Are There That Many GUIDs?

  • I've needed GUIDs exactly once, when I was creating an application that was intended for distributed authorship of items to be stored in (author local) databases.

    My solution for the ID problem was simple: Each object had a GUID for use in trading items between authors (each author had a local database). In the local database there was a table with an identity column as the primary key and the GUID with a (non-clustered) index.

    All associated data for the item was then linked using the identity column. The GUID was thereafter only used when exporting the data to another author.

    While (thankfully) I don't deal with multi-terabyte databases, or databases with 2+ billion records 🙂 I'm perfectly happy using Int32 identity fields for links. They perform far better than GUIDs for grunt-work linkages!

  • The only time I opt for a GUID over an identity column as a PK, are in data applications where data is transmitted to other systems for processing and then returned to the source. This was done more so to guarantee (I use that word lightly) uniqueness across systems.

    But in any self enclosed system, identity columns are just easier to use due to their size and indexing capabilities.

  • I have to agree that using a Globally Unique Identifier (GUID or UUID) data type for a normal sized database table row primary key is a waste of memory, space, and cpu cycles.

    It is just a mis-use of the data type and a good way to make your database environment slower and in need of an upgrade.

    One little known fact about the GUID generation algorithm is the user's network card MAC address is used as a base for the last group of GUID digits so that data can be tracked back to the computer that created it.

    If you need an identifier to relate or ID data in a globally networked environment so that it can be tracked back to the computer that created it, you should use a GUID. I am sure that is also why MS uses it in the new interconnected enterprise class CRM application.

    This is a data type that was originally created to uniquely identify and track software applications, objects, and classes in large interconnected systems so that there would not be system level issues with identifying and calling the correct device drivers, low level system objects, and software objects in the COM based world called windows networked computers.

    Here are the common uses for a GUID or UUID:

    * Database servers can use GUIDs to create unique row identifiers, solving the chicken and egg problem inherent with sequential row IDs in data that resides on more than one server.

    * Microsoft Windows uses GUIDs internally to identify the classes and interfaces of COM objects. A script can activate a specific class or object without having to know the name or location of the dynamic linked library that contains it.

    * Intel's GUID Partition Table, a system for partitioning hard drives, uses it to cross the 2.2TB barrier for partition sizes of 9.4 ZB.

    * ActiveX, a system for downloading and installing controls in a web browser, uses GUIDs to uniquely identify each control similar to COM objects.

    * Many systems use a GUID to identify and track the cookies issued to web browsers visiting web sites.

  • I use them frequently. As Grant noted, it's easy enough to use them in a way that minimizes the performance impact. In my experience most of the pain comes from not understanding page splits, fill factor, and sequential id's. That's not to say I use them every time, but there are plenty of cases where I think they work well. Strangely perhaps, I believe that performance isn't everything. It's important, but sometimes I think we try too hard, the equivalent of taking out the seats in our car to get .05 more miles per gallon.

  • What a weird coincidence; only a few nights ago, I had a dream that our precious reserve of globally unique identifiers was becomming depleted by poor database architecture design and clickstream data warehousing. The encoding of a GUID basically consists of a timestamp and the id of the network card, so it should be theoretically unique across space and time. If the computer has no network card, then a randomly generated number is substituted, so it's possible, but highly improbable, for two computers without a network id to generate the same GUID. It seems a lot more probable that two computers with the same network id, inserting millions of records per day, will inevitably generate duplicate GUIDs. Perhaps with hardware virtualization, it's entirely possible for two instances of SQL Server to be operating with the same network id in the same organization.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric Russell 13013 (10/12/2010)


    Perhaps with hardware virtualization, it's entirely possible for two instances of SQL Server to be operating with the same network id in the same organization.

    What's also a weird coincidence is that I was just wondering what my VMWare hosted SQL servers would produce in terms of GUIDs!

  • Andy Warren (10/12/2010)


    ...Strangely perhaps, I believe that performance isn't everything. It's important, but sometimes I think we try too hard, the equivalent of taking out the seats in our car to get .05 more miles per gallon.

    Blasphemy! 😀

    Joking aside, I wish I had clients who felt that way. They are downright boring about squeezing every ounce of performance out of their databases and hardware. Sometimes, that insignificant little change today that saves 20 ms can save billions of milliseconds downstream as the data volume ramps up.

    I do agree, however, that sometimes we may try a little too hard. That's why I've developed certain practices over the years and one of them is to avoid GUID's like the plague. They don't add enough (any?) value to be worth the hassle.

    "Beliefs" get in the way of learning.

  • Andy Warren (10/12/2010)


    Strangely perhaps, I believe that performance isn't everything. It's important, but sometimes I think we try too hard, the equivalent of taking out the seats in our car to get .05 more miles per gallon.

    I actually see this using a GUID for a table row primary key like driving by yourself in a fully loaded semi truck tractor trailer instead of a prius.

    Simple math say that a GUID uses 100% more everything than a BigInt just becuase of the size of the data type.

    That is hardly a .05 differance.

    GUID may not the largest most memory and cpu intensive datatype that you can use for table Primary Key, but it has to be in the top five.

    I worked on upgrading a JDE system upgrade that was over ten years old and had tracked the parts ordering and use for over 5 million computers sytems and accessories.

    It had not even begun to use the amount of values possible with a BigInt Primary Key value with Identity enabled.

    It had, however maxed out thier SQL servers memory, cpu, and disk space.

    The DB was so large they had a system constantly running to move the backups to tape.

    I wonder how much more GAS they would have purchased if all the tables used a GUID instead of a BigInt for a primary key?

  • André Lozeau (10/12/2010)


    fyi, Microsoft CRM (MS Dynamics) uses GUIDs as clustered primary keys for all tables, at least the way it has been implemted here

    I seem to recall that SharePoint also uses GUID all through its database as well.

    Beer's Law: Absolutum obsoletum
    "if it works it's out-of-date"

  • One overlooked consideration here, but touched on by Steve, is whether or not user's are allowed to view and use the primary key to search and/or locate records. If developers typically provide a primary key in gridviews or other controls that the user's employ down the line, then an int32 or int64 identity field is mandated. On the other hand if, primary keys are only used in background, and hidden from the user, then guid's are a perfectly viable PK candidate (as long as it is not a clustered PK).

    Ron K.

    "Any fool can write code that a computer can understand. Good programmers write code that humans can understand." -- Martin Fowler

  • The whole idea of encoding your server's network id and date/time stamp inside the primary key is practically a form of denormalization. For example, if you have a point of sale application, then at least the register id and date/time of the transaction are already inserted in the header record, and there is no reason why those two columns can't be a unique key for transactions within the retail company. Even when the company merges with another retailer, there is no reason why two legacy transactions should have the same store id, register id, and date/time stamp. If you get key violations in your datawarehouse, then that would evidence that there is a bug in your ETL process. Primary keys based on a GUID may actually mask duplicate data.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • <SmartA$$ ON>I LOVE it when clients use GUIDs in their database schemas - more tuning work for me!! 😀 <SmartA$$ OFF>

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • In the past I reflected on the advantage of INT (IDENTITY) vs. GUID, typically deferring to the INT (IDENTITY) in my database work due to size and speed.

    Now, I’m a full believer in creating a deterministic key (via a HASH) compared to the non-deterministic key via NEWID() or IDENTITY. I’ve been running tests on it for several months, and its working quite nicely.

    I’ve created a function to produce a key value (INT, BIGINT, or composite key combinations) for the key based on my requirement (data size vs odds of collision). I’m still “testing” it, but from my general observations its working great.

    Beer's Law: Absolutum obsoletum
    "if it works it's out-of-date"

  • TheSQLGuru (10/12/2010)


    <SmartA$$ ON>I LOVE it when clients use GUIDs in their database schemas - more tuning work for me!! 😀 <SmartA$$ OFF>

    😉 Yeah, it's really a globaly unique revenue genrating tool. Let's hope that they never learn how to add one to IDENT_CURRENT or use DBCC CHECKIDENT :w00t:

  • Robert Frasca wrote:

    It just goes to show you that even Microsoft sometimes allows the developers to dictate data architecture rather than the data guys. (It's been my experience that object oriented developers LOVE guids for some reason.)

    Is it a known fact that the developers at Microsoft 'dictated' the data architecture? Or are we just taking a cheap shot at developers in general, and at the object-oriented kind in particular?

Viewing 15 posts - 16 through 30 (of 169 total)

You must be logged in to reply to this topic. Login to reply