Are There That Many GUIDs?

  • I originally didn't like GUIDs because I thought they used too much "space". They really don't. Disk space is cheap, and GUIDs are only 16 bytes long.

    The nice thing about them is, if you think you have a customer ID, but due to a subtle bug somewhere, what you really have is an order ID, it is much easier to find that error -- you'll realize that there are NO customers with this "ID" since it's an order ID and not a customer ID. The non-overlapping nature of GUIDs is great.

    We use sequential GUIDs, and we DO create clustered indexes on them. Customers that are added later than other customers, and orders that are created later than other orders, etc., will naturally come later in the GUID ordering sequence. So, the indexes don't get split due to using GUIDs, if you use sequential ones.

    And when you're joining tables with these indexes, they work just as well as numbers from 1 to n.

    You don't need to remember them. You can easily copy and paste them, write them into logs, etc. If you want to assign a customer number to your customers, you can do that also, while still using a GUID as the primary key on your tables.

    As for the "are there that many" question, it's interesting to use an identifier that is essentially infinite. There are enough GUIDs to enumerate every atom in the known universe (at least THIS universe).

  • David Walker-278941 (10/12/2010)


    I originally didn't like GUIDs because I thought they used too much "space". They really don't. Disk space is cheap, and GUIDs are only 16 bytes long.

    The nice thing about them is, if you think you have a customer ID, but due to a subtle bug somewhere, what you really have is an order ID, it is much easier to find that error -- you'll realize that there are NO customers with this "ID" since it's an order ID and not a customer ID. The non-overlapping nature of GUIDs is great.

    We use sequential GUIDs, and we DO create clustered indexes on them. Customers that are added later than other customers, and orders that are created later than other orders, etc., will naturally come later in the GUID ordering sequence. So, the indexes don't get split due to using GUIDs, if you use sequential ones.

    And when you're joining tables with these indexes, they work just as well as numbers from 1 to n.

    You don't need to remember them. You can easily copy and paste them, write them into logs, etc. If you want to assign a customer number to your customers, you can do that also, while still using a GUID as the primary key on your tables.

    As for the "are there that many" question, it's interesting to use an identifier that is essentially infinite. There are enough GUIDs to enumerate every atom in the known universe (at least THIS universe).

    You are incorrect on many levels here:

    1) DISK SPACE isn't what is important, it is THROUGHPUT, and that is dramatically affected by the quadruple-int-sized GUIDs. MEMORY is a precious commodity (on most systems) for the buffer pool - also hobbled by GUIDs.

    2) GUIDs do NOT work just as well in joining tables. They are 4X the size of ints, and yes, this DOES matter.

    3) for 9x% of data spaces out there, integer is "essentially infinite" - even for those that just use 1,1 identity and thus give up half the range of values. Big INT (still HALF the size of a GUID) goes into the bajillions if you need it.

    4) Your use of some extremely esoteric programming bug as a validation for GUIDs is just that - extremely esoteric.

    5) FRAGMENTATION (not just clustered here - ANY index) leads to NON-SEQUENTIAL IO, which destroys disk throughput and IOPs. Even sequential GUIDs fragment unless you ALWAYS use a SINGLE machine to generate them. Most do not in my experience.

    6) IF you cluster on them, then 4X bytes stored on every row in every NC index.

    7) Larger size also leads to a) larger/slower backups and restores, b) more work to update stats c) more work to do index maintenance d) more network bandwidth, etc

    Sorry, but I will stick by my guns on this one - I LOVE it when clients use GUIDs because it DOES create more work for tuners. Hardware vendors gotta love it too.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • OK, well, I see that we disagree. People who are smarter than I am (who write books on the guts of SQL, for example) also like GUIDs. I'm not going to get into a fight, so I'm done.

  • Chiming in with my $0.02 here ...

    I have had experience with developers wanting to use GUIDs as PKs, and my argument against is very simple - the intent of a PK is not to "uniqueify" a row, it's to provide a unique identifier for related data: i.e. Parent --> Child. SQL Server does this best with numbers, not GUIDs. In our testing (SQL2005, roughly 2 years ago), attempts to use GUIDs as join criteria from parent to child tables was a tiny increase in resources for single row retrieval, but for ranges of data the resources consumed - and therefore performance - grew by a significant amount; in our tests the GUID performance consumed roughly 1600% more resources than the corresponding tests using bigints.

    The better way we came up with is to create an int (or bigint) surrogate key for the PK and add a GUID as a UK (indexed), and join to child tables using the numeric PK. This will allow the application to search for the specific record it needs, the record to become globally unique, and the database to relate any child records internally on the numeric key.

  • GUIDs -

    It is amazing the level of misunderstanding around using GUIDs in a database. Especially for keys.

    As a consultant I find myself constantly coming in behind folks and fixing designs that try to only work with identity, int, or char keys when a GUID would fix the problem of providing a "unique" key value.

    I used to engage in long discussions with people trying to convince them to use GUIDs in rows to provide uniqueness. But then I realized that my efforts were being wasted and costing me MCD.

    So if you don't want to use GUID because they are hard to type or remember .... thanks for MCD!

    Cheers,

    Dave Winters

    Def: MCD = More Consulting Dollars

  • TheSQLGuru (10/12/2010)


    <SmartA$$ ON>I LOVE it when clients use GUIDs in their database schemas - more tuning work for me!! 😀 <SmartA$$ OFF>

    Oh, come on! That's like a personal trainer saying:

    "I love it when McDonalds puts double cheezeburgers on their dollar menu."

    🙂

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • David Walker-278941 (10/12/2010)


    I originally didn't like GUIDs because I thought they used too much "space". They really don't. Disk space is cheap, and GUIDs are only 16 bytes long.

    As for the "are there that many" question, it's interesting to use an identifier that is essentially infinite. There are enough GUIDs to enumerate every atom in the known universe (at least THIS universe).

    David,

    Spoken like a true developer. My hat goes off to you for being so brave and bold in the conviction of your statements.

    Can I get the contact information for business owners or companies of your last ten projects that you used GUID's for table column row id's? I have a daughter in college learning knowledge and I need the extra cash.

    David Walker-278941 (10/12/2010)


    There are enough GUIDs to enumerate every atom in the known universe (at least THIS universe).

    I am afraid that most of the physicists in this known universe might disagree with your statment. If you would have kept it to just planets and stars maybe you would be closer to facts.

  • dwinters 67194 (10/12/2010)


    As a consultant I find myself constantly coming in behind folks and fixing designs that try to only work with identity, int, or char keys when a GUID would fix the problem of providing a "unique" key value.

    You ever "consult" on a JDE implimentation? If so please forward me the organizations contact information so I can get my MCD also. 😀

  • David Walker-278941 (10/12/2010)


    OK, well, I see that we disagree. People who are smarter than I am (who write books on the guts of SQL, for example) also like GUIDs. I'm not going to get into a fight, so I'm done.

    Please provide us with this reference material so we can also read and understand your point.

    All I have to go by is the SQL Online books information that I have cut and pasted bellow:


    If an application must generate an identifier column that is unique across the database, or every database on every networked computer in the world, use the uniqueidentifier data type and the NEWID or NEWSEQUENTIALID() function.


  • Other than for merge and updatable transactional replication, I've yet to hear an argument as to why an identifier has to be globallyunique. Sure, it sounds grand but could someone explain to me how ProductID '6F9619FF-8B86-D011-B42D-00C04FC964FF' is somehow "more" unique or better than ProductID 27? It's not as if my products are part of some global product master table that every business shares. I work in the healthcare industry right now and the ubiquitous "they" are in the process of assigning every single physician in the US with a national provider identifier aka NPI. (Interestingly, this is going to replace the Uniform Provider Identification Number (UPIN) currently used by Medicare.) You'd think that would be a pretty good business reason for assigning a "globally unique ID" yet, it's not a guid. It's an integer. My social security number is an integer. In fact, my SSN is so unique my identity can be stolen!

    Can someone provide a legitimate business reason for using a GUID? (Excluding replication scenarios)

    "Beliefs" get in the way of learning.

  • Robert Frasca wrote:

    Can someone provide a legitimate business reason for using a GUID?

    UUIDs (universally unique identifiers) are valuable in distributed computing environments. Any one computing device may create a truly unique identifier. Other devices may then reference the identifier with assurance it is unique.

    The need, and thus the related cost, for centralised provisioning of that identifer are accordingly obviated.

    The GUID is simply Microsoft's implementation of the OSF UUID standard. Note that UUIDs may be generated via any of large number of software implementations, MS SQLServer being but one of them. The widespread implementation of the UUID standard - on Linux, Apple, IBM, Microsoft, and other operating systems, in a large number of programming language libraries - attests to the broad use of this standard.

  • Craig-315134 (10/12/2010)


    wrote:

    The need, and thus the related cost, for centralised provisioning of that identifer are accordingly obviated.

    This explains why a person should use a UUID in application development.

    That is not an answet to the question.

    The question is:

    What is the valid business reason for using unniqueidentifer SQL data type as the ROWGUID and Primary Key for a table in a MS SQL server database?

  • SanDroid wrote:

    This explains why a person should use a UUID in application development ... not an answer to the question.

    You are mistaken. The question asked was 'what is the business purpose for using GUIDs?' I believe I answered the question. The question you are asking is an entirely different one.

  • Craig-315134 (10/12/2010)


    SanDroid wrote:

    You are mistaken. The question asked was 'what is the business purpose for using GUIDs?'

    How could I have been so stupid to think we were talking about SQL on this web site. My apologies.

    BTW: you forgot "(Excluding Replication Scenarios)" in your quote.

  • Craig-315134 (10/12/2010)


    Robert Frasca wrote:

    Can someone provide a legitimate business reason for using a GUID?

    UUIDs (universally unique identifiers) are valuable in distributed computing environments. Any one computing device may create a truly unique identifier. Other devices may then reference the identifier with assurance it is unique.

    The need, and thus the related cost, for centralised provisioning of that identifer are accordingly obviated.

    The GUID is simply Microsoft's implementation of the OSF UUID standard. Note that UUIDs may be generated via any of large number of software implementations, MS SQLServer being but one of them. The widespread implementation of the UUID standard - on Linux, Apple, IBM, Microsoft, and other operating systems, in a large number of programming language libraries - attests to the broad use of this standard.

    You answered the question but your answer was essentially merge or updatable transactional replication. Additionally, truly distributed computing environments are pretty rare, at least in the context of ithe world wide installed base of SQL Server instances.

    I guess I should have been more succinct. I never meant to imply that there is no use for GUID's. Obviously, distributed computing is a legitimate use case. What I fail to see are other business applications where a guid is neccessary for data integrity. The fact that there is a datatype called uniqueidentifier doesn't make it appropriate or necessary to use it. Microsoft, in BOL, appears to actively discourage it's use for run-of-the-mill data implementations but it remains a popular choice despite it's inherent shortcomings.

    "Beliefs" get in the way of learning.

Viewing 15 posts - 31 through 45 (of 169 total)

You must be logged in to reply to this topic. Login to reply