A Look at GUIDs

  • Mike, as mentioned in other posts (and w/o trying to convert you, just trying to help with your legacy headache), one common mistake when using GUIDs is to cluster using only the GUID PK (ie, the cluster and PK are the same index). An easy solution to avoid many of the problems you mention is to just add a creation datetime field before the GUID field, and cluster on it (while the PK is still the GUID by itself). Mosttimes, this gives you the best of both worlds. The main drawbacks are that your need a creation datetime field (which is typically a non-issue when you're compliance minded) and that your cluster index key will require more space, and all indexes will be slighlty larger (which is not that much when you consider that you're already using a GUID). However, IMO the benefits are well worth it...

  • There is one additional complication on the database side. While the uniqueidentifiers are guaranteed unique, they are not guaranteed to be sequential. That means as you generate new uniqueidentifiers as primary keys and insert them into a table there is a much higher chance you'll cause a page split. SQL stores data on 8k pages, a page split occurs when there is not enough room on the page for the new row, so the page is basically split into two pages so that the insert can success. Page splits in general are not bad, it's the frequency of page splits that can have an impact on performance as each split requires additional locking and disk IO. In my view the worry about page splits is overblown, but it's something you'll have to access in your environment to be sure. Fast drives do much to alleviate this potential issue.


    Although I think your article was good and I'd love to see more people taking advantage of GUIDs in their database designs, I have one issue with your article.  The above excerpt.  I'd love to know your source for this information and I'd love for you to further explain how page splitting will be increased due to the nature of GUIDs.  I've never found any Microsoft documentation warning of such a scenario and what you claim goes against my understanding of how b-trees work.

    My concern is that your statement is the opposite of the reality.  Assuming that we can agree that SQL Server's indexes are all b-trees, based on my understanding of b-trees, if you insert a preordered set of records into a b-tree more page splits will occur than if you had inserted those same records in a random order.  Additionally, page splits isn’t what's kills performance rather rebalancing of the b-tree is what is so detrimental to performance.  Inserting rows into a b-tree in order causes the tree to become unbalance far more frequently than if rows were inserted into the tree in a random order.

    In any case I'd love to know where you got that information or have you explain it more thoroughly.

  • Dear friends, the whole discussion was interesting to learn, but for a small developer like me, its cumbersome to use guid as a single query run to look for a particular id in the table having 3 lacks record takes more considerable amount of time than using integer.

    Also when I use the lookup in MS BIDS and the column is guid, i suspect it takes too much time.

    Thanks,

    Jwalant Natavarlal Soneji

  • Since GUID has turned into chicken-egg arguments, wonder if anyone can really advise me on this issue:

    Here's my scenario, a purchase order (PO) application:

    We want to have a centralized database with remote sites connected to it.

    Some of the sites are without connection, they will have their own servers with scheduled replication to the centralized database.

    The schema design is something like this:

    Each PO will have many revisions

    Each revision will have many PO line items

    Each PO line item will have many Delivery Schedules

    In the past i used int IDENTITY as transaction ID in revision and line item tables.

    transaction ID in revision table is FK to line item table, and transaction ID line item table is FK to Delivery Schedules table.

    This work well in standalone database.

    Now that we need to merge replicates, int IDENTITY produced in remote DB will conflict with IDENTITY produced in central DB.

    I'm thinking of using GUID to replace int IDENTITY.

    Question:

    What kind of mess i'll be seeing in the future?

    Can't GUID size indexing problem be solved with partitioning?

    Can you suggest other alternatives to GUID, based on the above scenario?

    Thanks in advance

  • I think GUID's are made for that kind of scenario. All the same pros and cons apply! As long as the hardware isn't fully utilized already I suspect it will absorb the overhead of the CPU/space without issue. Partitioning works best when you really have a good partitioning criteria, not something I'd suspect would match a guid. More likely is an order date or location id.

  • Thanks andy, it's a relief to know that using GUID is not really a recipe for disaster.

    Most probably i will partition by projectID.

  • Kay Mohd (12/12/2007)


    Question:

    What kind of mess i'll be seeing in the future?

    Can't GUID size indexing problem be solved with partitioning?

    Can you suggest other alternatives to GUID, based on the above scenario?

    Thanks in advance

    - I'm not sure what you mean by the first question. You might be asking if using GUIDs will make your data more difficult to deal with and the answer is, a little. It's a little easier for a human to look up record number 82342 in two or more different tables than it is to look up record 7823F847-EA50-4D3B-BBE2-6DD2B093C8B8 in two or more different tables. So when you are researching a problem INTEGER ID's make things slightly easier. You might ask yourself how hard is it to join two or more tables though.

    - In your situation a GUIDs size poses less of a problem because some of the other options require mutiple keys which causes the alternative indexes to begin to approach the size of the GUID if not exceed it.

    - The most obvious alternative is to use a compound key in your situation. Include two columns in your tables, one a server identifier and the other a record identifier. This compound key can be used to uniquely identify each record regardless of which database it's replicated to. Another alternative is to partition your IDENTITY values into ranges and assign each server a range, the problem with that approach is the ranges are finite and eventually you might have to do some maintenance to avoid a key collision. I really despise the partition/range approach as there are some many pitfalls with it. One location might do more business than another yet you might not know that initially and assign the same size range to each location which will cause you to have to do far more maintenance to keep the high volume location up and running than is required to keep the low volume location up and running.

    By the way, I would never recommned the IDENTITY partition/range approach in you situation. I've seen it implemented a number of times and it was always ugly and very costly.

  • Posted an article about GUID's in my blog - see it here

Viewing 8 posts - 31 through 37 (of 37 total)

You must be logged in to reply to this topic. Login to reply