Are There That Many GUIDs?

  • GSquared (10/29/2010)


    Jeff Moden (10/29/2010)


    Paul White NZ (10/27/2010)


    Jeff Moden (10/27/2010)


    Steve Jones - SSC Editor (10/27/2010)


    Some of the smaller manufacturers aren't careful. In the 90s we had two separate NICs have duplicate MACs when we bought an off brand.

    I know I'm simple to please but you have no idea how happy that paragraph just made me. 🙂

    Why? The network card address is just one component of a version 1 GUID - what conclusions are you drawing here?

    I won a $100 bet on it. 😉

    Let's see if I can get this right:

    BWWAAAAAAAAHAAAAAAAAA!!!!!

    Heh... Z'actly what I was thinking when Steve made his post. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • For what its worth...

    We have a redevelopment going on at the moment which is using GUIDs for all the clustered keys. Some of the data is generated through a distributed computing network, and some of it is generated centrally.

    Having a GUID as the Identity (and clustered key) for tables like Rep, Product and Customer is just frustrating. There's no reason to have a GUID as there's no possibility of running out of INT let alone BIGINT.

    For other data (e.g. Order line) which may be generated on any device, this makes more sense. Not having collision detection on it is a rather more worrying case. I'll let you know when we detect the first collision!

    One area which no-one has mentioned (that I saw) was that using GUIDs as the clustered key can avoid hot-spots when adding a lot of data. The random nature of a GUID will mean that the data is being added to most of the blocks in that table rather than all at the end. Obviously we are talking an edge case, but one not that far from most people. Slow down I/Os by 10% due to resource contention and watch how quickly that I/O queue grows.

    Edit - Now I feel a fool - posting on a 5 year old topic! I was wondering why no-one had mentioned Sequences

  • For a short time a long time ago I had used GUIDs as keys but due to the flaws/cons highlighted in the numerous pages of posts I have moved away from it, however, I reserve the right to use them when I consider them an appropriate solution.

    I just wish I had known then what I know now.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • I didnt use to use GUID's /UUID's as primary key and I still do not. However, I sometimes use it as an alternative key or temporary id.

    Imagine you have hundreds or thousands of mobile devices. They are sometimes offline but should still be able to work. Guid's here offers the only reliable solution.

  • IceDread (3/24/2015)


    I didnt use to use GUID's /UUID's as primary key and I still do not. However, I sometimes use it as an alternative key or temporary id.

    Imagine you have hundreds or thousands of mobile devices. They are sometimes offline but should still be able to work. Guid's here offers the only reliable solution.

    They are often a good solution for a sometimes disconnected distributed solution, but the only one? Them's big words...

  • call.copse (3/24/2015)


    IceDread (3/24/2015)


    I didnt use to use GUID's /UUID's as primary key and I still do not. However, I sometimes use it as an alternative key or temporary id.

    Imagine you have hundreds or thousands of mobile devices. They are sometimes offline but should still be able to work. Guid's here offers the only reliable solution.

    They are often a good solution for a sometimes disconnected distributed solution, but the only one? Them's big words...

    Fine, not the only solutions. Keys based on multiple columns is an option for instance but then complexity increases and there are still situations where problems here can arise. A Guid solution is here much more manageable.

  • Toby Harman (3/23/2015)


    ...

    One area which no-one has mentioned (that I saw) was that using GUIDs as the clustered key can avoid hot-spots when adding a lot of data. The random nature of a GUID will mean that the data is being added to most of the blocks in that table rather than all at the end. Obviously we are talking an edge case, but one not that far from most people. Slow down I/Os by 10% due to resource contention and watch how quickly that I/O queue grows.

    Edit - Now I feel a fool - posting on a 5 year old topic! I was wondering why no-one had mentioned Sequences

    Randomly inserting records in high volume effectively turns the entire table into a hot spot. Read up on page splitting. What we want is to append rows sequentially to the end of the table. Also, we don't want something like an e-commerce website or telemetry device performing millions of individual inserts daily into a table.

    Whenever I need to insert a large number of rows into a table, what I do is to bulk load it, even if it's in an OLTP environment. 10,000 rows can be bulk inserted in less time than 10 rows inserted in seperate batches. If the records are arriving in a constant stream, then first stage them to something like a tab delimited text file, and then schedule a process to periodically load every hour or 5 minutes. I also use integer based identity columns when surrogate keys are required.

    That's totally the way to go. Never look back.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I've never used a GUID this way.

  • In my more than 3 decades of database design, development, and administration I have never used GUIDs in any of my systems. I did have this debate with a vendor who wanted to use them for one of our international implementations.

    It seems that some people believe a GUID is a magical set of characters that can never be duplicated. My position is that any set of 32 hexadecimal characters can not possibly be guaranteed unique because the possibilities are finite.

    {00000000-0000-0000-0000-000000000000} to {FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF}

    As others have mentioned already, there are large global companies who have discovered the hard way that this is a finite set with no guarantee of uniqueness.

  • Toby Harman (3/23/2015)


    For what its worth...

    We have a redevelopment going on at the moment which is using GUIDs for all the clustered keys. Some of the data is generated through a distributed computing network, and some of it is generated centrally.

    Having a GUID as the Identity (and clustered key) for tables like Rep, Product and Customer is just frustrating. There's no reason to have a GUID as there's no possibility of running out of INT let alone BIGINT.

    For other data (e.g. Order line) which may be generated on any device, this makes more sense. Not having collision detection on it is a rather more worrying case. I'll let you know when we detect the first collision!

    One area which no-one has mentioned (that I saw) was that using GUIDs as the clustered key can avoid hot-spots when adding a lot of data. The random nature of a GUID will mean that the data is being added to most of the blocks in that table rather than all at the end. Obviously we are talking an edge case, but one not that far from most people. Slow down I/Os by 10% due to resource contention and watch how quickly that I/O queue grows.

    Edit - Now I feel a fool - posting on a 5 year old topic! I was wondering why no-one had mentioned Sequences

    This editorial was reposted today, so responses are valid. Don't feel like a fool for that.

    In your reply, the implication (emphasized by me above) is that a GUID has a larger capacity than a bigint. The actual number of unique values of a GUID is 2^122; for a bigint data type it is 2^126 (using the full range from the most negative value to the most positive value).

    I have a demo of using a GUID as the clustered index - the index is almost instantly fragmented with just 10 rows (> 80% fragmentation). At 100 rows it's > 95% fragmentation. At 100k rows, it's "perfectly fragmented" with > 99.5% fragmentation.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Steve Jones - SSC Editor (10/12/2010)


    One Orange Chip is the color of the 25k chip at the Bellagio in Las Vegas, or at least it was when I was there.

    Working the tables with a 25K chip! Nice!

    The more you are prepared, the less you need it.

  • "There have been some reported cases of duplicates..."

    Where?

  • Great topic, and discussion.

    On a related note, wish Microsoft didn't require using GUIDs in In-Memory OLTP (Heckaton) tables...

    Hakim Ali
    www.sqlzen.com

  • In your reply, the implication (emphasized by me above) is that a GUID has a larger capacity than a bigint. The actual number of unique values of a GUID is 2^122; for a bigint data type it is 2^126 (using the full range from the most negative value to the most positive value).

    Actually, a bigint is a signed 8 byte or 64-bit value so the range is only -2^63 to 2^63 - 1 and, not 2^126, whereas a GUID is a 16 byte or 128 bit value.

    The nice thing about a GUID is that internally it can be stored as 4 32-bit words or 2 64-bit words, so actual comparisons of GUID values can be done extremely fast by a processor. A CPU can compare an array of 4 32-bit words of 2 64-bit words far faster than you could ever compare two separate int or bigint columns in your database table. If we ever get 128 bit processors, that will become a single CPU operation to compare values.

    The only time a GUID becomes slow is when it has to be converted from human readable format into the 16 byte internal format during parsing of your SQL query.

  • GUID versus Integer, in terms of how efficiently the CPU can process them, is practically irrelevent in the databse realm. The vast majority of database are page crunchers, not number crunchers, so it's total number of disk pages read/written, memory usage, and b-tree traversal that matters. For that reason, a sequential integer will generally prove more efficient than a GUID.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Viewing 15 posts - 136 through 150 (of 169 total)

You must be logged in to reply to this topic. Login to reply