GUIDs GUIDs everywhere, but how is my data unique?

  • Comments posted to this topic are about the item "GUIDs GUIDs everywhere, but how is my data unique?"

    Tom Romeo
    Practical Compliance Solutions Corp.
    One Orient Way - Suite F156.
    Rutherford, NJ 07070-2524
    --------------------------------
    www.complianceabc.com
    email: tromeo@complianceabc.com
    Phone: 201-728-8809

  • My motto: DO NOT use GUIDs as a primary key, and do not use them at all (unless they are required for replication/synchronisation scenarios between two OCCASIONALLY CONNECTED databases). Have you ever tried to compare two GUIDs during debugging?

    If you need GUIDs, use sequential ones (they are more index-friendly than random ones).

    BTW, NEWSEQUENTIALID() was introduced in SQL Server 2005 😛
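
    Something like this, if you want it as a column default (a minimal sketch; the table and column names are made up):

        -- NEWSEQUENTIALID() may only appear in a DEFAULT constraint
        -- on a uniqueidentifier column (SQL Server 2005+).
        CREATE TABLE dbo.Orders
        (
            OrderID uniqueidentifier NOT NULL
                CONSTRAINT DF_Orders_OrderID DEFAULT NEWSEQUENTIALID()
                CONSTRAINT PK_Orders PRIMARY KEY,
            OrderDate datetime2 NOT NULL
        );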

  • I never respond to these articles, but I am tired of "analysis with clear bias" BS.

    A lot of what you say is just not true and the rest is clearly biased against Guids.

    I have no money invested in Guids, but they are the only kind of primary key guaranteed never to be a problem in any situation.

    The most legitimate argument, the clustered index issue, can easily be overcome with a traditional auto-increment column used for the clustered index (see the sketch at the end of this post). Yes, it wastes space, but it makes inserts (practically) independent of the Guid.

    Some tables need Guids for keys, some don't, but it is much easier to work with a DB with consistent design patterns.

    The traditional primary key concepts are from a time when storage cost 10,000 times more than it does today. It is time to leave the horse and buggy behind...
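
    A rough sketch of that pattern (untested; table and column names are made up):

        -- The Guid is the logical primary key, but the clustered index
        -- lives on a narrow, ever-increasing identity column, so new
        -- rows always land at the end of the table.
        CREATE TABLE dbo.Customers
        (
            RowID      int IDENTITY(1,1) NOT NULL,
            CustomerID uniqueidentifier NOT NULL
                CONSTRAINT DF_Customers_CustomerID DEFAULT NEWID(),
            Name       nvarchar(100) NOT NULL,
            CONSTRAINT PK_Customers PRIMARY KEY NONCLUSTERED (CustomerID)
        );

        CREATE UNIQUE CLUSTERED INDEX CIX_Customers_RowID
            ON dbo.Customers (RowID);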

  • NEWSEQUENTIALID() is just a poor band-aid around a problem that needs to be addressed in a different way.

    Depends a bit (a lot, I guess) on your environment, but if you're coding with MS EDM at the application side, you benefit a great deal if you don't have to go to the database for each new ID.

  • I agree - art for art's sake, because web designers would like to generate identities in JavaScript (!)

    Plus, the Microsoft table designer in SSMS does not like a default value of a new sequential GUID - the bug has remained unfixed since 2005.

  • I have heard the same lazy stereotype applied to DBAs. It is not laziness that makes me use GUIDs all the time. It is a consistent design: they are guaranteed unique, and they carry no data significance, which is a positive, not a negative. Debugging with GUIDs is not the burden that some claim it is. Is it as quick to read a GUID as an integer? Of course not, but if you aren't looking at the data anyway, what the heck are you debugging? Using something other than GUIDs does not guarantee that you won't have duplicate rows on a merge; it only means you won't have duplicate primary keys. Finally, earlier comments have noted that fragmentation can easily be addressed if necessary.

  • The fragmentation problem was worked around years ago with the invention of COMB GUIDs, long before NewSequentialId existed. They allow sequential Guids to be created on multiple clients without hitting the database - very handy for saving complex object graphs. I have had a piece of C# to do this for at least six years (the idea is sketched in T-SQL after this post)... I believe NHibernate has a similar feature, though I haven't used it.

    This article is really just an argument against surrogate keys. Having used both natural key designs and surrogate key designs, I gotta tell you, surrogates every time 😀
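
    For anyone curious, the comb idea fits in one line of T-SQL (my C# version differs, but the principle is the same; treat this as a sketch, not production code):

        -- Replace the last 6 bytes of a random GUID with the current time.
        -- SQL Server compares uniqueidentifiers on those bytes first,
        -- so the values sort (roughly) in creation order.
        SELECT CAST(
                   CAST(NEWID() AS binary(10)) + CAST(GETUTCDATE() AS binary(6))
               AS uniqueidentifier) AS CombGuid;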

  • Guys, what is wrong with the auto-incrementing identity that has existed for ages? Or with the Sequence object, which originated in Oracle and has now been ported to SQL Server? Why the itch to reinvent the wheel?

    Do you realize what price you are paying in performance terms when a GUID is made the clustered primary key? The fact that it works with a few thousand records does not make it an industrial-strength solution. Progress in storage hardware does not mean that you should ignore BASIC FUNDAMENTALS.

    With implementations like this, it is no wonder that we are seeing the rapid rise of the NoSQL movement!
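
    If you doubt the price, measure the fragmentation on one of your GUID-clustered tables yourself (the table name here is hypothetical):

        SELECT index_id, avg_fragmentation_in_percent, page_count
        FROM sys.dm_db_index_physical_stats(
                 DB_ID(), OBJECT_ID('dbo.Orders'), NULL, NULL, 'LIMITED');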

  • The downsides of auto increment columns are well discussed and documented.

    I have tables with > 5,000,000 records with Guids as primary keys. These tables contain static data, so the clustered index is not an issue.

    Even if they were written to frequently, there are a few remedies to the clustered index issue in this thread alone.

    => A Guid as a primary key is not a problem with millions of records; you just have to know how to design and use it the right way.
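
    One of those remedies, for what it's worth: rebuild the clustered index with a lower fill factor, so random GUID inserts find free space instead of splitting pages (index and table names are made up, and 70 is only illustrative):

        ALTER INDEX PK_MyTable ON dbo.MyTable
        REBUILD WITH (FILLFACTOR = 70);  -- leaves ~30% free per leaf page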

  • Great article. This particular issue is a huge problem and was in place for many years in my current environment, which we now have to go back and fix (clustered GUID indexes).

    The idea that disk space is in any way less expensive is true for home users or very small businesses, but at the enterprise level, especially for SQL Server, storage is still the largest IT expense; easily in the multiple seven figures even for a medium-size business running a small number of servers (fewer than 100). Generic space is less expensive; IOPS per GB are damned expensive. 🙂

  • jlahdenpera (12/18/2014)


    The downsides of auto increment columns are well discussed and documented.

    I have tables with > 5,000,000 records with Guids as primary keys. These tables contain static data, so the clustered index is not an issue.

    Even if they were written to frequently, there are a few remedies to the clustered index issue in this thread alone.

    => A Guid as a primary key is not a problem with millions of records; you just have to know how to design and use it the right way.

    Guids as primary keys are not a problem; Guids as clustered indexes are a major problem. If your primary key is a Guid and your clustered index is something sequential, it's a non-issue.

  • The bigint datatype occupies 8 bytes of storage;

    uniqueidentifier takes twice as much: 16 bytes.

    Therefore, for every 1 million records you get an overhead of roughly 7.6 MB of additional storage (8 extra bytes per row).

    It seems a small value, but - how many tables do you have? And in a cloud-based scenario you are paying for storage PER MONTH. Good luck!
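
    You can check the sizes yourself:

        SELECT DATALENGTH(CAST(1 AS bigint)) AS bigint_bytes,  -- 8
               DATALENGTH(NEWID())           AS guid_bytes;    -- 16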

  • jlahdenpera (12/18/2014)


    The downsides of auto increment columns are well discussed and documented.

    I have tables with > 5,000,000 records with Guids as primary keys. These tables contain static data, so the clustered index is not an issue.

    Even if they were written to frequently, there are a few remedies to the clustered index issue in this thread alone.

    => A Guid as a primary key is not a problem with millions of records; you just have to know how to design and use it the right way.

    That's a lot of static data! But if it works for you on those tables, that's fine. However, you're looking at 12 extra bytes per row compared to using an int identity. That's 60MB of extra disk space (perhaps more, perhaps less) in the table itself, not to mention the extra space in any referencing tables. That may not be much in dollar terms, but it's about 7000 extra data pages, which means more IO to satisfy a SELECT query. It also uses more memory to cache once it's been read - hence more data being flushed from cache and therefore more IO on other tables.

    As for consistent design patterns, I disagree. I don't doubt that GUIDs are appropriate in some situations. But just because you have a hammer, it doesn't mean everything is a nail. Use the right tool for the right job.

    John
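
    PS A quick way to see the actual page counts for a table (the table name is hypothetical):

        SELECT OBJECT_NAME(object_id) AS table_name,
               in_row_data_page_count, row_count
        FROM sys.dm_db_partition_stats
        WHERE object_id = OBJECT_ID('dbo.MyTable')
          AND index_id IN (0, 1);  -- heap or clustered index only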

  • fregatepallada (12/18/2014)


    The bigint datatype occupies 8 bytes of storage;

    uniqueidentifier takes twice as much: 16 bytes.

    Therefore, for every 1 million records you get an overhead of roughly 7.6 MB of additional storage (8 extra bytes per row).

    It seems a small value, but - how many tables do you have? And in a cloud-based scenario you are paying for storage PER MONTH. Good luck!

    Actually, I am currently in the process of getting the numbers for moving to either AWS or maybe Azure (if they can really support the size and load).

    From a purely physical on-site architecture standpoint, I will give an example of a single database in our system, which you would then multiply by the number of databases of this type in our system.

    Let's say we have 1TB of actual data in the data file, roughly 60GB of transactions in the tlogs at any one time, 80GB of locked-down tempdb files, and 20-odd GB of system stuff.

    The data files pull 6,000 IOPS at 80% read / 20% write.

    The log files pull nearly 1,000 IOPS at 99% write / 1% read (we have 1.5TB of RAM on the servers).

    The tempdb files pull about 3,000 IOPS at 50% read / 50% write.

    Now, we do some magic with the data files using various forms of SAN cache and SSD, but overall we are looking at a 10,000 IOPS system for 1TB of data.

    Now let's say we have 10 of these (which we do, and more) and you want to buy storage. This is a made-up figure, but not far off the actual one: to achieve nearly 100,000 IOPS for approximately 10TB on a SAN system, you will spend about 1.5 million dollars, or approximately $150,000 per TB, for a mostly spinning-disk system. That is capital cost. For the operating figure for cloud numbers, you just plug these into whatever calculator your potential cloud provider offers and see what you come up with.

  • Running in MS Azure... not expensive at all.

    Check out Azure pricing. Ridiculously cheap compared to many in-house IT depts.

    A 250GB database costs anywhere between $15 and $75 per month.

    A 500GB database costs between $465 and $3,720 per month, depending on I/O.

    Whether that is a lot of money depends on who you are, but even a small business can easily run a several hundred GB DB in Azure.

    (I do not get paid to advertise MS or Azure :))

    Thirty years ago, efficient storage use and coding were critical (yes, I was doing this back then), but they are less and less critical nowadays.
