UniqueIdentifier as a Primary Key

  • Is it ever considered a best practice to use a UniqueIdentifier(GUID) as a primary key?

    Is there ever a case where you would want to use a GUID as a PK?

    I watched Paul Randals excellent Myth #6 on his Pluralsight's Myths and Misconceptions course and he talks about using the NewSequentialID() function to assign a sequential GUID for clustered indexes, to avoid page framentation and maximize page density.

    There are alot of opinions, I just wondered what you all thought of GUIDs as PK.

  • There is NO case where I would ever want to use a GUID as a clustered key column.

    It would have to be THE ONLY WAY POSSIBLE for me to consider it.

    The performance damage is just way too severe.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • What prompted me to question this was my coworker said it was a best practice when showing me a database design. I remembered Paul's video from an earlier viewing but I didn't remember the details. So, I viewed it again and did the demo myself so I saw his point.

    Is there a white paper or some reference material that states this as fact? Not that I don't believe you, and given what I saw in Paul's video and other blogs, no, I would not use a GUID as a PK. I just need to help someone understand this at work.

  • alanspeckman (9/12/2012)


    What prompted me to question this was my coworker said it was a best practice when showing me a database design. I remembered Paul's video from an earlier viewing but I didn't remember the details. So, I viewed it again and did the demo myself so I saw his point.

    Is there a white paper or some reference material that states this as fact? Not that I don't believe you, and given what I saw in Paul's video and other blogs, no, I would not use a GUID as a PK. I just need to help someone understand this at work.

    Books Online says the following about NEWSEQUENTIALID, which might make it tempting to use...

    Creates a GUID that is greater than any GUID previously generated by this function on a specified computer since Windows was started. After restarting Windows, the GUID can start again from a lower range, but is still globally unique. When a GUID column is used as a row identifier, using NEWSEQUENTIALID can be faster than using the NEWID function. This is because the NEWID function causes random activity and uses fewer cached data pages. Using NEWSEQUENTIALID also helps to completely fill the data and index pages.

    ... but look at that again. If you ever need to bounce the machine, "the GUID can start againn from a lower range". That's not a good thing to do with a clustered index which is what most PK's end up being.

    Also, something to be aware of... Books Online is actually incorrect about GUIDs being globally unique (and MS has admitted that fact although I'm on the wrong machine right now to be able to provide the link). While the probability of running across duplicate GUIDs across multiple machines is very slim (and I do mean incedibly slim), GUID's in SQL Server are no longer guaranteed to be unique if more than one machine is involved. SQL Server now uses Type 4 GUIDs which are nothing more than pseudo-random numbers and there's no guarantee that two machines can't generate the same number. The old Type 1 GUIDs that SQL Server used to generate were guaranteed to be unique even between machines provided that things like the MAC address (IIRC) for the machine was different.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • alanspeckman (9/12/2012)


    Is it ever considered a best practice to use a UniqueIdentifier(GUID) as a primary key?

    ...

    Nowadays, it's not considered as the best practice. However, there are cases where the use of GUID's for PK is justified. Please note, usually in these cases other columns/keys would be selected for table clustered index (PK is not clustered).

    Old days, back when for insert operation Sybase/SQLServer used to lock whole pages, it was a practice sometime to use GUID's (or other random numbers) for PK and having clustered index on it for tables which were subject for often simultaneous inserts from concurrent users (lets say Call Centre systems for example).

    Also, note the Jeff M post, GUID's generated by SQL have higher probability of re-occurrence when generated by two different machines as they now Type-2 instead of Type-1. That will give you an idea where is justifiable to have GUID's as PK - it's only now viable option if you really need to generate key in application layer. There you can still generate type-1 GUIDs.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • I appreciate your experience and knowledgable answers on this somewhat minor subject I raised here. In scanning blogs/forums, I see on other topics, like using special characters in your naming standards or not, that opinions can vary.

    I will look into BOL for the text. I'm just wondering if there is a more athoritative source that mainly says what not to do, from acedemia, the SQL Team, or even Oracle.

  • ScottPletcher (9/12/2012)


    There is NO case where I would ever want to use a GUID as a clustered key column.

    It would have to be THE ONLY WAY POSSIBLE for me to consider it.

    The performance damage is just way too severe.

    Quite right too - but would you use a GUID as a PK?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Jeff Moden (9/12/2012)


    alanspeckman (9/12/2012)


    What prompted me to question this was my coworker said it was a best practice when showing me a database design. I remembered Paul's video from an earlier viewing but I didn't remember the details. So, I viewed it again and did the demo myself so I saw his point.

    Is there a white paper or some reference material that states this as fact? Not that I don't believe you, and given what I saw in Paul's video and other blogs, no, I would not use a GUID as a PK. I just need to help someone understand this at work.

    Books Online says the following about NEWSEQUENTIALID, which might make it tempting to use...

    Creates a GUID that is greater than any GUID previously generated by this function on a specified computer since Windows was started. After restarting Windows, the GUID can start again from a lower range, but is still globally unique. When a GUID column is used as a row identifier, using NEWSEQUENTIALID can be faster than using the NEWID function. This is because the NEWID function causes random activity and uses fewer cached data pages. Using NEWSEQUENTIALID also helps to completely fill the data and index pages.

    ... but look at that again. If you ever need to bounce the machine, "the GUID can start againn from a lower range". That's not a good thing to do with a clustered index which is what most PK's end up being.

    Also, something to be aware of... Books Online is actually incorrect about GUIDs being globally unique (and MS has admitted that fact although I'm on the wrong machine right now to be able to provide the link). While the probability of running across duplicate GUIDs across multiple machines is very slim (and I do mean incedibly slim), GUID's in SQL Server are no longer guaranteed to be unique if more than one machine is involved. SQL Server now uses Type 4 GUIDs which are nothing more than pseudo-random numbers and there's no guarantee that two machines can't generate the same number. The old Type 1 GUIDs that SQL Server used to generate were guaranteed to be unique even between machines provided that things like the MAC address (IIRC) for the machine was different.

    A little more information regarding NEWSEQUENTIALID() from BOL (MSDN version):

    You can use NEWSEQUENTIALID() to generate GUIDs to reduce page contention at the leaf level of indexes.

    Each GUID generated by using NEWSEQUENTIALID() is unique on that computer. GUIDs generated by using NEWSEQUENTIALID() are unique across multiple computers only if the source computer has a network card.

  • ChrisM@Work (9/13/2012)


    ScottPletcher (9/12/2012)


    There is NO case where I would ever want to use a GUID as a clustered key column.

    It would have to be THE ONLY WAY POSSIBLE for me to consider it.

    The performance damage is just way too severe.

    Quite right too - but would you use a GUID as a PK?

    Not if I can avoid it.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (9/13/2012)


    ChrisM@Work (9/13/2012)


    ScottPletcher (9/12/2012)


    There is NO case where I would ever want to use a GUID as a clustered key column.

    It would have to be THE ONLY WAY POSSIBLE for me to consider it.

    The performance damage is just way too severe.

    Quite right too - but would you use a GUID as a PK?

    Not if I can avoid it.

    Exactly.

    Almost always avoidable. For example, assign each site a unique code. Then the unique site code + unique sequence from that site will always be unique. Just remember that the site code in the value IS JUST TO MAKE THE VALUE UNIQUE, NOT to tell you what site it originated in or resides at; that type of information should be additional columns, just like all the other data.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Eugene Elutin (9/13/2012)


    alanspeckman (9/12/2012)


    Is it ever considered a best practice to use a UniqueIdentifier(GUID) as a primary key?

    ...

    Nowadays, it's not considered as the best practice. However, there are cases where the use of GUID's for PK is justified. Please note, usually in these cases other columns/keys would be selected for table clustered index (PK is not clustered).

    Old days, back when for insert operation Sybase/SQLServer used to lock whole pages, it was a practice sometime to use GUID's (or other random numbers) for PK and having clustered index on it for tables which were subject for often simultaneous inserts from concurrent users (lets say Call Centre systems for example).

    Also, note the Jeff M post, GUID's generated by SQL have higher probability of re-occurrence when generated by two different machines as they now Type-2 instead of Type-1. That will give you an idea where is justifiable to have GUID's as PK - it's only now viable option if you really need to generate key in application layer. There you can still generate type-1 GUIDs.

    Possible small correction... You meant "Type-4", not "Type-2" right?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (9/12/2012)

    Also, something to be aware of... Books Online is actually incorrect about GUIDs being globally unique (and MS has admitted that fact although I'm on the wrong machine right now to be able to provide the link). While the probability of running across duplicate GUIDs across multiple machines is very slim (and I do mean incedibly slim), GUID's in SQL Server are no longer guaranteed to be unique if more than one machine is involved. SQL Server now uses Type 4 GUIDs which are nothing more than pseudo-random numbers and there's no guarantee that two machines can't generate the same number. The old Type 1 GUIDs that SQL Server used to generate were guaranteed to be unique even between machines provided that things like the MAC address (IIRC) for the machine was different.

    I remember doing a bit of research on this and the odds are just so silly, it's not really a consideration that anyone working with SQL Server or computer programming in general should sanely be worried about. Similar or higher rates of failure exist in memory, CPU, disk and network checksum/error correction routines - nothing is guaranteed to such stratospheric levels of probability in computing when you look at the system as a whole.

  • HowardW (9/13/2012)


    Jeff Moden (9/12/2012)

    Also, something to be aware of... Books Online is actually incorrect about GUIDs being globally unique (and MS has admitted that fact although I'm on the wrong machine right now to be able to provide the link). While the probability of running across duplicate GUIDs across multiple machines is very slim (and I do mean incedibly slim), GUID's in SQL Server are no longer guaranteed to be unique if more than one machine is involved. SQL Server now uses Type 4 GUIDs which are nothing more than pseudo-random numbers and there's no guarantee that two machines can't generate the same number. The old Type 1 GUIDs that SQL Server used to generate were guaranteed to be unique even between machines provided that things like the MAC address (IIRC) for the machine was different.

    I remember doing a bit of research on this and the odds are just so silly, it's not really a consideration that anyone working with SQL Server or computer programming in general should sanely be worried about. Similar or higher rates of failure exist in memory, CPU, disk and network checksum/error correction routines - nothing is guaranteed to such stratospheric levels of probability in computing when you look at the system as a whole.

    I agree. There are enough distinct values where if one of thhe values was a mile, the toal number of values would be about the same as the number of miles across 14 quadrillion Milky Way galaxies. That notwithstanding and I can't prove it because there's no documentation on hand but an old DBA friend of mine had a friend that worked at UPS and described when such a thing supposedly really happened.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (9/12/2012)


    alanspeckman (9/12/2012)


    What prompted me to question this was my coworker said it was a best practice when showing me a database design. I remembered Paul's video from an earlier viewing but I didn't remember the details. So, I viewed it again and did the demo myself so I saw his point.

    Is there a white paper or some reference material that states this as fact? Not that I don't believe you, and given what I saw in Paul's video and other blogs, no, I would not use a GUID as a PK. I just need to help someone understand this at work.

    Books Online says the following about NEWSEQUENTIALID, which might make it tempting to use...

    Creates a GUID that is greater than any GUID previously generated by this function on a specified computer since Windows was started. After restarting Windows, the GUID can start again from a lower range, but is still globally unique. When a GUID column is used as a row identifier, using NEWSEQUENTIALID can be faster than using the NEWID function. This is because the NEWID function causes random activity and uses fewer cached data pages. Using NEWSEQUENTIALID also helps to completely fill the data and index pages.

    ... but look at that again. If you ever need to bounce the machine, "the GUID can start againn from a lower range". That's not a good thing to do with a clustered index which is what most PK's end up being.

    Also, something to be aware of... Books Online is actually incorrect about GUIDs being globally unique (and MS has admitted that fact although I'm on the wrong machine right now to be able to provide the link). While the probability of running across duplicate GUIDs across multiple machines is very slim (and I do mean incedibly slim), GUID's in SQL Server are no longer guaranteed to be unique if more than one machine is involved. SQL Server now uses Type 4 GUIDs which are nothing more than pseudo-random numbers and there's no guarantee that two machines can't generate the same number. The old Type 1 GUIDs that SQL Server used to generate were guaranteed to be unique even between machines provided that things like the MAC address (IIRC) for the machine was different.

    Hi All,

    The above from BOL has proved true in a test i did some weeks ago. I wanted to see the fragmentation, page counts, index sizes for int, bigint, newid() and newsequentialid() keys.

    Here are the results:

    Table Ind_level External Fragm(%)Avg Frag Size Page_count Internal Fragm(%) Ind_size (KB)

    identity_table00.369160869242.862069704399.9647022556344

    newid_table099.271373061.01235268.9911045298816

    newsequentialid_table00.678521292140.1311475854899.7042747768384

    identity_bigint_table00.475184794199.3684211757699.4542006460608

    You can see which key has best characteristics.

    Regards

    IgorMi

    Igor Micev,My blog: www.igormicev.com

  • Jeff Moden (9/13/2012)


    I agree. There are enough distinct values where if one of thhe values was a mile, the toal number of values would be about the same as the number of miles across 14 quadrillion Milky Way galaxies. That notwithstanding and I can't prove it because there's no documentation on hand but an old DBA friend of mine had a friend that worked at UPS and described when such a thing supposedly really happened.

    There are 3.4 × 10^38 possible outputs, so even that's understated. If one distinct value was 1 centimetre, all the distinct values would equal 3.6 quadrillion (million billion) Milky Way Galaxies (assuming 100,000 light years for the size of the Milky Way).

    The maths behind the probability of collision is interesting. It's actually much more likely than intuition would tell you for collision to occur for a random distribution of a range of numbers. It's generally called the birthday paradox/problem due to the most common analogy of only needing to have 23 people in a room to have a 50% probability of two sharing the same birthday (based on 365 distinct values). That's why you have to have such a gratuitously large number in the first place.

    There's a grid in that article which shows the probability of collision by hash size. You can see for 128 bit (same as Uniqueidentifier), if you were conservative and said the average company would collect 26 billion rows of data with GUIDs throughout an IT development's lifetime, the chance of any one company hitting a single row of collision throughout the lifetime of the table would be 1 in 1000 quadrillion.

    Obviously other factors can massively skew this - e.g. the algorithm used to generate them, code errors, transmission/storage errors and most importantly uncorrectable error rates between chair and keyboard 😉

Viewing 15 posts - 1 through 15 (of 32 total)

You must be logged in to reply to this topic. Login to reply