SQL Clone
SQLServerCentral is supported by Redgate
Log in  ::  Register  ::  Not logged in

A Look at GUIDs

By Andy Warren,

Uniqueidentifiers are a useful tool, but not one that is universally loved. I'm going to start by covering some background, then talk about how they can be used in both SQL 2000 & 2005, and finish with some thoughts about why uniqueidentifiers are of special interest to developers doing data access.

Let's start with some terminology. What we SQL people call uniqueidentifiers most people call Globally Unique Identifiers (GUID's). It's a sixteen byte number generated using a combination of different data elements (commonly the system time and a portion of the network card MAC address, but there are variations) in a way that is supposed to guarantee it will be unique. You can read a slightly more detailed overview on Wikipedia or get into some serious detail with this never quite finished Internet Draft (which notes that the generation algorithm supports key generation rates as high as ten million per second!).

Most you are familiar with identity (auto number) columns. If you're in a situation where you need to easily generate a unique key they are very handy, but can be troublesome in a couple situations. One is where as a developer you find it useful to be able to generate a new key on the client without having to make a round trip to the server. Another is when using replication, inserting rows into an updatable subscriber require some extra work to manage identity key ranges. Uniqueidentifiers easily address both of those issues, with the corresponding tradeoff that they are harder to read (they look like this: 5da3c440-0a40-11db-9cd8-0800200c9a66) and they require sixteen bytes of storage, where identities are usually stored as four byte integers.

In SQL 2000 & 2005, it's easy to generate a uniqueidentifier, just call the NewID() function. For example, this would return a new id:

select newid()

Nothing complicated about that. It's also possible to use it as a default on a column so that any row inserted will have a key generated automatically.

There is one additional complication on the database side. While the uniqueidentifiers are guaranteed unique, they are not guaranteed to be sequential. That means as you generate new uniqueidentifiers as primary keys and insert them into a table there is a much higher chance you'll cause a page split. SQL stores data on 8k pages, a page split occurs when there is not enough room on the page for the new row, so the page is basically split into two pages so that the insert can success. Page splits in general are not bad, it's the frequency of page splits that can have an impact on performance as each split requires additional locking and disk IO. In my view the worry about page splits is overblown, but it's something you'll have to access in your environment to be sure. Fast drives do much to alleviate this potential issue.

In SQL 2000 there was an additional option, but you had to look for it. If you visit SQLDev.Net you can download an extended stored procedure that will generate sequential (but still unique) uniqueidentifiers.

SQL 2005 offers a new option, a functional called NewSequentialID() that is guaranteed to generate a higher uniqueidentifier than previously generated on that machine. This effectively neutralizes the page splitting concern, but at a small cost - as noted in BOL because of the way they are calculated, it is possible someone could guess the next key. That point is worth remembering if you're exposing those ID's at all, for instance as a member id on a web site where you might be passing the ID via a URL parameter. The tradeoff is that this function ONLY works as a default on a column, you cannot call it directly.

There is another part of the performance story. Each non clustered index contains the clustered key. That's an additional four bytes per row if you're using an integer, but an additional sixteen bytes per row with a uniqueidentifier. That means you need more disk space and will be able to fit fewer rows on each index page, which equates to more page reads to use an index.

 Developers often benefit from being able to create a new key without making a round trip to a server. One scenario might be where you are letting someone build up a set of order details on the client and them submit them all at once. If you're using an identity column you have to insert the rows and then return the keys, if you're using a uniqueidentifier you can generate them on the client by calling the Windows CoCreateGuid API in VB6, or the System.Guid.NewGuid method of the .Net framework.

It's easy to get caught up in the potential performance negatives of uniqueidentifiers. Performance is an important part of the story to be sure, but not the entire story. Typically the two groups that will benefit most from using them are DBA's making heavy use of replication and developers that are trying to avoid round trips (which also impacts performance!).

I hope all of this gets you thinking about the use of uniqueidentifiers in general, and if page splitting was the reason for not using them before, maybe it's time to revisit that decision.

Total article views: 15744 | Views in the last 30 days: 5
Related Articles

Generate Uniqueidentifier with SSIS

If you are trying to generate Uniqueidentifier/Newid() in SSIS data flow, you will soon realize that...



Joining on Uniqueidentifiers


Clustered Index - UniqueIdentifiers

Changing UniqueIdentifier -->VarChar


Table Splitting

need better way for Splitting large table


Page split with Clustered index

Page split with Clustered index