RE: Lots of Key Lookups vs. UniqueIdentifier Clustered Index

SSC Guru

Points: 1003861

November 24, 2012 at 10:28 am

#1561983

Bhuvnesh (11/22/2012)
Jeff Moden (11/22/2012)
Either put the CI on the GUID with a decent FILL FACTOR to prevent rampant page splits during inserts
what would be that decent valure ? in our environment we use 75% , is it ok ? OR are there other deciding fator too for this ?

I don't know if 75% for your environment is OK or not because "It Depends" on the environment. For example, it depends on things like how many INSERTs there are vs how often you rebuild or reoganize the index. It also depends on how many pages of data you have especially for indexes that have a UUID for a leading column. Since they are so very random, it may be that you can make the FILL FACTOR larger because INSERTs will be distributed quite randomly and evenly if there are a lot of pages of data. The same page probably won't be hit multiple times in many cases. Having a larger FILL FACTOR is better for performance of SELECTs because fewer pages will need to be read.

Also, just to be sure, I wouldn't set the defaults for the server environment to 75%. I would only change an index from the default of "0/100" in situations like this or where a table may suffer a lot of updates to variable length columns.

Last but not least, I believe that sequential UUIDs would be a mistake here because it will hammer pages that are not at the logical "end" of the table. Here's the explanation from Books Online.

Creates a GUID that is greater than any GUID previously generated by this function on a specified computer since Windows was started. After restarting Windows, the GUID can start again from a lower range, but is still globally unique. When a GUID column is used as a row identifier, using NEWSEQUENTIALID can be faster than using the NEWID function. This is because the NEWID function causes random activity and uses fewer cached data pages. Using NEWSEQUENTIALID also helps to completely fill the data and index pages.

Notice the part where it says that a GUID sequence can start at a lower range after a Windows restart. That means that new GUIDs could go to the logical "middle" of a table which would cause severe page splitting either on a non-clustered index or on the data itself if it's a clustered index.

I also want to note that this entry in Books Online has a mistake on it. I can't but my hads on the URL right now, but even Microsoft has stated that, although the likely hood of it happening is quite small, GUIDs produced by most Microsoft Products are no longer guaranteed to be "globally unique" because they're no longer based on the Mac Address of the computer like the old Type 1 GUIDs where. They are now Type 4 GUIDs (for security reasons) and Type 4 GUIDs are nothing more than a fairly unpredictable, incredibly large domain, pseudo-random number (one machine won't repeat it until all of the values have been used). It is actually possible (though highly improbable) that two or more machines could end up producing a pair of matching GUIDs. Despite the low probability, I always put a UNIQUE index on GUID columns that will be used as key values (which you should do with any key column, BTW).

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)