A Look at GUIDs

  • If you are writing cross platform code then using GUID's could certainly be useful. Different RDBMS's implement Identity columns in different ways. Returning Indentity values has always been a bit of a pain. I can see advantages of using GUID's particularly if you don't like using stored procedures and keep your logic in the middle tier.

  • Not a bad article Andy. I learnt some stuff I didn't know before.

    Having said that, the article was a "How" to use them. I would really likemto see a "Why" I should use them Specifically, why would i use them rather than generate sequential IDs (which is what I do currently - I don't use IDENTITY).

    -Jamie

     

  • I never use GUID's as primary keys.  However, I have found them very useful in the following scenario.  Because the generated numbers are essentially random and unique, whenever I need to randomly select a number of records from a group, I assign a GUID to each row.  Assume a table with 1000 records and I want to select 100 at random.  I can assign a GUID to each row, and then select the top 100 ordering by the GUID.  Because the generated GUID's are random, I know that the rows that get selected will be sufficiently randomized.  This has worked well for me in the past.

    Jonathan

  • There always is the risk that GUIDs are only unique if the network cards generating them have unique ids. They're supposed to but not if they're clones!

    The scenario of generating ids on the client without the need for round trips is addressed more easily by giving each client an unique integer id (I assume that you know who the clients are - not just anonymous access) and concatenate that with the client's internally generated unique integer ids for each object. That gives, at worst, 8 bytes instead of 16, provides an audit trail identifying the originating client and is human readable.

    I see human readability as a major disadvantage of GUIDs - they make debugging and issue investigation a nightmare!

  • GUIDs also add a level of security, as was noted in the article, which is especially important in government and military developed applications. I develop a number of security-related and classified web and client/server applications, and using a GUID as the primary key helps prevent users from seeing records that are not theirs. If we exposed an Identity PK that had values 1 and 4 for a specific user, it doesn't take a rocket scientist to figure out that there are probably records with IDs of 2 and 3 too. Depending on how securely written the app is, it may or may not be easy to get at those records. Very difficult to "guess" like this when using GUIDs. Despite the performance hits a GUID may come with, we find that the benefits of GUIDs from a security point are worth it. This logic probably has merits in other arenas too, especially e-commerce and financial-related systems. My 2 cents... Steve

  • I take your point but surely this is the very last line of defence. All keys should be internal to the app / db and invisible to the user who should not have any direct access to the tables anyway. All comms, of course, should be encrypted for virtually any system.

  • Excellent, factual article Andy.

    I started using GUIDs as Primary Keys for the very reasons mentioned in your article several years ago (SQL 7 also has NEWID as a function).  My criteria was

    1) Being able to generate the PK from the client and passing it to associated recordsets long before uploading the batch to the server.  This was really big for us, our clients are typically on a slow connection, a long ways from the server.

    2) Designing the database ready for replication, where a GUID is going to get added to the table anyway (This really kills the 16 byte versus 4 byte argument, if you use replication you'll end up with both if you start with the Identity field).

    3) Deciding that the performance issue with page splits was pretty bogus for our particular usage patterns.  Few folks ever mention that using GUIDs for PKs solves another problem at the same time.  If you use Identity as your PK and have a lot of inserts, you end up with one hot spot on the hard disk where all the activity is taking place, with everyone needing to update the same page.  With GUIDs, the activity gets spread around a bit.  In my experiments (six years ago), I got higher insert rates with GUIDs than with Identity values.

    The purists (you know who you are) always trot out the "a good primary key should be a natural key" argument.  I just don't care.  It is really nice to be able to do every join on a 1 to 1 FK to PK basis.  It is really nice to know that no user, no manager, no third party is EVER going to need to change a PK because somewhere, someone allowed a PK with a value that has human readable significance.

     


    Student of SQL and Golf, Master of Neither

  • One reason not to use integer PKs and instead use GUIDs is if the table is potentially going to exceed 4 billion records  (for instance, the history table to your primary transaction table).  A 4 byte integer holds values between -2,147,483,648 to +2,147,483,647, so you are in trouble if any table exceeds 4.2 billion records. 

  • Not if you use bigint, which is 8 bytes.  Then you can have more than 18 billion billion records in your table.

    John

  • Probably u don't need to add GUID column for each table to get random sets.

     

    u a can achieve it by adding NEWID() in the ORDER clause.

    . . .

  • Thanks for a great article, Andy. I am in the process of designing a client/server system (.NET and SQL Server 2000/2005) in which GUIDs are used as clustered primary keys throughout the database in order to reduce the number of round trips between client and server. But then I found an article by Zach Nichter, which gave me second thoughts:

    http://www.sql-server-performance.com/zn_guid_performance.asp

    It seems that the two of you basically agree on the downsides of using uniqueidentifiers as clustered primary keys. But it seems that while Zach's conclusion is that this should be avoided, you are much more positive. Is this correct? And does this mean that I shouldn't redesign my system after all? I guess the answer is "it depends", but that's OK

  • I don't add to each real table.  I only add them to table variables when I need to do additional work on the data.  The ORDER BY clause would work too.

  • Nice artical. We had an argue about using GUID or int as a key, and this artical clarifies some concerns.

    I found a confusing part, in the artical you say "as noted in BOL because of the way they are calculated, it is possible someone could guess the next key." I guess you intended to mean "it is NOT possible someone could guess the next key.", correct?

     

  • NewSequentialId was interesting, but not being able to call it directly was a non-starter for us.

    I found 2 implementations for NewSequentialID in an extended stored procedure, so you could use it in SQL 2000: xpguid and yaflaguid (the later includes source code, it also has an implementation of an alternate algorithm)

    This is an email I originally sent to some coworkers, with some links for more information.

    MSDN NewSequential ID Documentation

    I came across this new SQL 2005 function - NewSequentialID() -, and I thought you might be interested.

    Microsoft’s first implementation of Universally Unique IDs (GUIDs) used the network card MAC address, which is unique by itself, plus some randomized and counter data. However, there was some uproar about it infringing on privacy: it was first used in Office as internal document structure identifiers, and it was possible to determine the originating computer based on the ID (since it embedded the 6 byte MAC address). MS changed the default algorithm so you wouldn’t use the MAC address (I think it still uses it, but only as an aid, and you couldn’t reconstruct it from it). NewSequentialID()/UuidCreateSequential go back to using the MAC address for a simplified algorithm with some advantages on a DB design (ie, clustering on the primary key)

    http://blogs.msdn.com/sqlprogrammability/archive/2006/03/23/559061.aspx

    Blog entry describing the SQL developer’s logic and some notes on the implementation of NewSequentialID() by calling UuidCreateSequential. It’s interesting that they saw duplicate GUID generation on AMD 64 systems (and that they fixed it, albeit using a mutex). This is probably the reason why you can't call it directly.

    http://scrappydog.com/blogs/blog/archive/2005/11/14/9380.aspx

    A blog entry indicating that duplicates are being generated. However, his interpretation of the manual kind of makes me think that he might have some other issues. One of the posts kind of implies that there might be an issue with uniqueness on a cluster. You probably want to research it further if you want to use this function.

    http://www.sqlmag.com/Article/ArticleID/50164/sql_server_50164.html

    SQL Server Magazine article about it (if you have a subscription).

    http://www.sqljunkies.com/WebLog/odds_and_ends/archive/2005/09/06/16664.aspx#comments

    Someone that sat down to test performance of a GUID vs integers. The conclusion was that if you design properly, there is little or no difference in performance.

    What no one seems to have stated is that (I think) the reason why a GUID is so close to an integer in performance is because the CPU word size has gotten bigger, and because the CPU is routinely waiting for other subsystems, and because the extra overhead involved in using data types smaller than the CPU word size more than compensates for the GUIDs large size. This is even truer now with 64 bits CPU (which is kind of why I initially justified GUIDs to myself when the first details on the Itanium where available).

    http://www.informit.com/articles/article.asp?p=25862&redir=1&rl=1

    More performance testing, and another alternative to random GUIDs, COMBs – using dates combined with a guid to generate an ordered GUID (same/similar thing the one above does in one of the tests).

    http://bloggingabout.net/blogs/wellink/archive/2004/03/15/598.aspx

    A long thread on the good and bad of GUIDs (I found some of the links above here).

    http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=149&messageid=275248

    Also, it seems that the table designer is not aware of NewSequentialId() as a valid default value, and it displays some warnings when you create/modify a table using it. I don't know if SQL 2005 SP1 fixed this.

Viewing 15 posts - 1 through 15 (of 37 total)

You must be logged in to reply to this topic. Login to reply