A Look at GUIDs

, 2006-07-27

Uniqueidentifiers are a useful tool, but not one that is universally loved.

I'm going to start by covering some background, then talk about how they can be

used in both SQL 2000 & 2005, and finish with some thoughts about why

uniqueidentifiers are of special interest to developers doing data access.

Let's start with some terminology. What we SQL people call uniqueidentifiers

most people call Globally Unique Identifiers (GUID's). It's a sixteen byte

number generated using a combination of different data elements (commonly the

system time and a portion of the network card MAC address, but there are

variations) in a way that is supposed to guarantee it will be unique. You can

read a slightly more detailed overview on

Wikipedia or get into some

serious detail with this never quite finished

Internet Draft (which notes that the generation algorithm supports key

generation rates as high as ten million per second!).

Most you are familiar with identity (auto number) columns. If you're in a

situation where you need to easily generate a unique key they are very handy,

but can be troublesome in a couple situations. One is where as a developer you

find it useful to be able to generate a new key on the client without having to

make a round trip to the server. Another is when using replication, inserting

rows into an updatable subscriber require some extra work to manage identity key

ranges. Uniqueidentifiers easily address both of those issues, with the

corresponding tradeoff that they are harder to read (they look like this:

5da3c440-0a40-11db-9cd8-0800200c9a66) and they require sixteen bytes of storage,

where identities are usually stored as four byte integers.

In SQL 2000 & 2005, it's easy to generate a uniqueidentifier, just call the

NewID() function. For example, this would return a new id:

select newid()

Nothing complicated about that. It's also possible to use it as a default on

a column so that any row inserted will have a key generated automatically.

There is one additional complication on the database side. While the

uniqueidentifiers are guaranteed unique, they are not guaranteed to be

sequential. That means as you generate new uniqueidentifiers as primary keys and

insert them into a table there is a much higher chance you'll cause a page

split. SQL stores data on 8k pages, a page split occurs when there is not enough

room on the page for the new row, so the page is basically split into two pages

so that the insert can success. Page splits in general are not bad, it's the

frequency of page splits that can have an impact on performance as each split

requires additional locking and disk IO. In my view the worry about page splits

is overblown, but it's something you'll have to access in your environment to be

sure. Fast drives do much to alleviate this potential issue.

In SQL 2000 there was an additional option, but you had to look for it. If

you visit SQLDev.Net you can

download an extended stored procedure that will generate sequential (but still

unique) uniqueidentifiers.

SQL 2005 offers a new option, a functional called NewSequentialID() that is

guaranteed to generate a higher uniqueidentifier than previously generated on

that machine. This effectively neutralizes the page splitting concern, but at a

small cost - as noted in BOL because of the way they are calculated, it is

possible someone could guess the next key. That point is worth remembering if

you're exposing those ID's at all, for instance as a member id on a web site

where you might be passing the ID via a URL parameter. The tradeoff is that this

function ONLY works as a default on a column, you cannot call it directly.

There is another part of the performance story. Each non clustered index

contains the clustered key. That's an additional four bytes per row if you're

using an integer, but an additional sixteen bytes per row with a

uniqueidentifier. That means you need more disk space and will be able to fit

fewer rows on each index page, which equates to more page reads to use an index.

 Developers often benefit from being able to create a new key without

making a round trip to a server. One scenario might be where you are letting

someone build up a set of order details on the client and them submit them all

at once. If you're using an identity column you have to insert the rows and then

return the keys, if you're using a uniqueidentifier you can generate them on the

client by calling the Windows CoCreateGuid API in VB6, or the

System.Guid.NewGuid method of the .Net framework.

It's easy to get caught up in the potential performance negatives of

uniqueidentifiers. Performance is an important part of the story to be sure, but

not the entire story. Typically the two groups that will benefit most from using

them are DBA's making heavy use of replication and developers that are trying to

avoid round trips (which also impacts performance!).

I hope all of this gets you thinking about the use of uniqueidentifiers in

general, and if page splitting was the reason for not using them before, maybe

it's time to revisit that decision.

Rate

4.75 (4)

Share

Share

Rate

4.75 (4)

Related content

New MDX Book Published

Fast Track to MDX gives readers all the necessary background information needed to write useful, powerful MDX expressions and introduces the most frequently used MDX functions and constructs. No prior knowledge is assumed and examples are used throughout the book to rapidly develop MDX skills to the point where a reader can solve real business problems. A CD containing examples from within the book, and a time-limited version of ProClarity, is included

2003-05-01

2,698 reads

Default Values and Named Parameters for Stored Procs

Are you using default values for your parameters? Using named parameters when you call the proc or passing the values by ordinal? Should you be? Andy thinks 6 out of 10 of our readers will agree with his point of view, we'll be a little more conservative and guess that 5 of out 10 will be closer.

5 (1)

2003-01-08

8,170 reads