Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««12345»»

Lots of Key Lookups vs. UniqueIdentifier Clustered Index Expand / Collapse
Author
Message
Posted Thursday, November 22, 2012 7:37 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 3:39 PM
Points: 37,101, Visits: 31,653
dave-L (11/22/2012)
Hi Jeff. That's my whole question really. Is there a good way to fix this and is it worth trying to fix.

True using a UDF to generate the UUIDs will have more over head than native NEWID(), but my thought was I would save so much time on the scores of thousands of key lookups that occur because my CI is on a useless identity that this would be worth it.

Dave


I'm still confused about it all. I thought you said that the GUID from the GUI was absolutely required and has been used in thousands of places throughout the code. The only way that you'd get rid of the Key Lookups is to do one of two things. Either put the CI on the GUID with a decent FILL FACTOR to prevent rampant page splits during inserts or build a covering index with the GUID as the leading column. All of this other work will do nothing to get rid of the Key Lookups.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1388006
Posted Thursday, November 22, 2012 11:38 PM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Tuesday, August 19, 2014 12:26 AM
Points: 2,840, Visits: 3,963
Jeff Moden (11/22/2012)
Either put the CI on the GUID with a decent FILL FACTOR to prevent rampant page splits during inserts
what would be that decent valure ? in our environment we use 75% , is it ok ? OR are there other deciding fator too for this ?


-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done
Post #1388049
Posted Saturday, November 24, 2012 3:18 AM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 8:30 AM
Points: 8,835, Visits: 9,393
CELKO (11/22/2012)
My question is: is it better to have all of these key lookups happening or would it be better to just build the clustered indexes on the UUIDs?


Neither. This nightmare is how non-RDBMS people design schemas to fake their familar pointer chains. Neither UUID (intented for web apps only) or IDENTITY (physical record insertion count for one file on one machine) are keys by definition.

Stuff and nonsense! You clearly haven't read all the information in this thread. There is a clear statement of a requirement for unpredictability in the key (and a key it most certainly is, because it is what an end user uses to identify the row containing it, conforming exactly to Codd's original definition of a key - and it's because it identifies a row that unpredictability is required).

This unpredictability (randomness) requirement is extremely common in applications that use an RDBMS to store personal information (whether they are web applications or not - unpredictable keys are somewhat older than the web). An index like this one is a component of the security wall that maintains privacy. Of course usually isn't the only such component, but for security and privacy each small addition to an attacker's problems is valuable.


Tom
Post #1388296
Posted Saturday, November 24, 2012 9:04 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Monday, September 15, 2014 7:19 PM
Points: 7,127, Visits: 12,655
Jeff Moden (11/22/2012)
dave-L (11/22/2012)
Hi Jeff. That's my whole question really. Is there a good way to fix this and is it worth trying to fix.

True using a UDF to generate the UUIDs will have more over head than native NEWID(), but my thought was I would save so much time on the scores of thousands of key lookups that occur because my CI is on a useless identity that this would be worth it.

Dave


I'm still confused about it all. I thought you said that the GUID from the GUI was absolutely required and has been used in thousands of places throughout the code. The only way that you'd get rid of the Key Lookups is to do one of two things. Either put the CI on the GUID with a decent FILL FACTOR to prevent rampant page splits during inserts or build a covering index with the GUID as the leading column. All of this other work will do nothing to get rid of the Key Lookups.

I think where things are going is that if the GUID is made to be sequential then the CI can be again redefined using the GUID resulting in the key lookups going away and fragmentation due to mid-page page splits also going away meaning the FILLFACTOR does not need to be modified solely to ease things up for a non-sequential GUID.


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1388313
Posted Saturday, November 24, 2012 10:28 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 3:39 PM
Points: 37,101, Visits: 31,653
Bhuvnesh (11/22/2012)
Jeff Moden (11/22/2012)
Either put the CI on the GUID with a decent FILL FACTOR to prevent rampant page splits during inserts
what would be that decent valure ? in our environment we use 75% , is it ok ? OR are there other deciding fator too for this ?


I don't know if 75% for your environment is OK or not because "It Depends" on the environment. For example, it depends on things like how many INSERTs there are vs how often you rebuild or reoganize the index. It also depends on how many pages of data you have especially for indexes that have a UUID for a leading column. Since they are so very random, it may be that you can make the FILL FACTOR larger because INSERTs will be distributed quite randomly and evenly if there are a lot of pages of data. The same page probably won't be hit multiple times in many cases. Having a larger FILL FACTOR is better for performance of SELECTs because fewer pages will need to be read.

Also, just to be sure, I wouldn't set the defaults for the server environment to 75%. I would only change an index from the default of "0/100" in situations like this or where a table may suffer a lot of updates to variable length columns.

Last but not least, I believe that sequential UUIDs would be a mistake here because it will hammer pages that are not at the logical "end" of the table. Here's the explanation from Books Online.

Creates a GUID that is greater than any GUID previously generated by this function on a specified computer since Windows was started. After restarting Windows, the GUID can start again from a lower range, but is still globally unique. When a GUID column is used as a row identifier, using NEWSEQUENTIALID can be faster than using the NEWID function. This is because the NEWID function causes random activity and uses fewer cached data pages. Using NEWSEQUENTIALID also helps to completely fill the data and index pages.


Notice the part where it says that a GUID sequence can start at a lower range after a Windows restart. That means that new GUIDs could go to the logical "middle" of a table which would cause severe page splitting either on a non-clustered index or on the data itself if it's a clustered index.

I also want to note that this entry in Books Online has a mistake on it. I can't but my hads on the URL right now, but even Microsoft has stated that, although the likely hood of it happening is quite small, GUIDs produced by most Microsoft Products are no longer guaranteed to be "globally unique" because they're no longer based on the Mac Address of the computer like the old Type 1 GUIDs where. They are now Type 4 GUIDs (for security reasons) and Type 4 GUIDs are nothing more than a fairly unpredictable, incredibly large domain, pseudo-random number (one machine won't repeat it until all of the values have been used). It is actually possible (though highly improbable) that two or more machines could end up producing a pair of matching GUIDs. Despite the low probability, I always put a UNIQUE index on GUID columns that will be used as key values (which you should do with any key column, BTW).


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1388316
Posted Saturday, November 24, 2012 11:08 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 3:39 PM
Points: 37,101, Visits: 31,653
opc.three (11/24/2012)
Jeff Moden (11/22/2012)
dave-L (11/22/2012)
Hi Jeff. That's my whole question really. Is there a good way to fix this and is it worth trying to fix.

True using a UDF to generate the UUIDs will have more over head than native NEWID(), but my thought was I would save so much time on the scores of thousands of key lookups that occur because my CI is on a useless identity that this would be worth it.

Dave


I'm still confused about it all. I thought you said that the GUID from the GUI was absolutely required and has been used in thousands of places throughout the code. The only way that you'd get rid of the Key Lookups is to do one of two things. Either put the CI on the GUID with a decent FILL FACTOR to prevent rampant page splits during inserts or build a covering index with the GUID as the leading column. All of this other work will do nothing to get rid of the Key Lookups.

I think where things are going is that if the GUID is made to be sequential then the CI can be again redefined using the GUID resulting in the key lookups going away and fragmentation due to mid-page page splits also going away meaning the FILLFACTOR does not need to be modified solely to ease things up for a non-sequential GUID.


I think that using sequetial GUIDs would be a very bad idea for this because of what I said in my previous post above. Restarting Windows can cause the sequence to restart at a lower value than the previous sequence which would cause new rows to be inserted into the logical "middle" ("begining" on the first restart) of the table if the CI were on that column. Since they would again be sequential, page splits would be rampant right after the restart. I don't believe they'd settle down much after that, either. I would be an interesting test over time.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1388320
Posted Saturday, November 24, 2012 2:15 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Monday, September 15, 2014 7:19 PM
Points: 7,127, Visits: 12,655
Jeff Moden (11/24/2012)
opc.three (11/24/2012)
Jeff Moden (11/22/2012)
dave-L (11/22/2012)
Hi Jeff. That's my whole question really. Is there a good way to fix this and is it worth trying to fix.

True using a UDF to generate the UUIDs will have more over head than native NEWID(), but my thought was I would save so much time on the scores of thousands of key lookups that occur because my CI is on a useless identity that this would be worth it.

Dave


I'm still confused about it all. I thought you said that the GUID from the GUI was absolutely required and has been used in thousands of places throughout the code. The only way that you'd get rid of the Key Lookups is to do one of two things. Either put the CI on the GUID with a decent FILL FACTOR to prevent rampant page splits during inserts or build a covering index with the GUID as the leading column. All of this other work will do nothing to get rid of the Key Lookups.

I think where things are going is that if the GUID is made to be sequential then the CI can be again redefined using the GUID resulting in the key lookups going away and fragmentation due to mid-page page splits also going away meaning the FILLFACTOR does not need to be modified solely to ease things up for a non-sequential GUID.


I think that using sequetial GUIDs would be a very bad idea for this because of what I said in my previous post above. Restarting Windows can cause the sequence to restart at a lower value than the previous sequence which would cause new rows to be inserted into the logical "middle" ("begining" on the first restart) of the table if the CI were on that column. Since they would again be sequential, page splits would be rampant right after the restart. I don't believe they'd settle down much after that, either. I would be an interesting test over time.

I agree 100% with regard to NEWSEQUENTIALID(). It is not a good option.

However the OP is talking about producing their own function to generate sequential GUIDs that would not be affected by a restart. The tradeoff of course is that all GUID-generation would require a roundtrip to the database which has its own set of baggage.


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1388331
Posted Saturday, November 24, 2012 2:30 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 3:39 PM
Points: 37,101, Visits: 31,653
opc.three (11/24/2012)

However the OP is talking about producing their own function to generate sequential GUIDs that would not be affected by a restart. The tradeoff of course is that all GUID-generation would require a roundtrip to the database which has its own set of baggage.


Considering that the OP has also identified that the existing GUIDs have been used in "thousands of lines of code", I'm hoping that he won't consider such a thing. It would even break GUI code unless they write a wad of "Instead Of" triggers to covert the UUIDs the GUI code is already passing.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1388332
Posted Saturday, November 24, 2012 2:59 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Monday, September 15, 2014 7:19 PM
Points: 7,127, Visits: 12,655
Jeff Moden (11/24/2012)
opc.three (11/24/2012)

However the OP is talking about producing their own function to generate sequential GUIDs that would not be affected by a restart. The tradeoff of course is that all GUID-generation would require a roundtrip to the database which has its own set of baggage.


Considering that the OP has also identified that the existing GUIDs have been used in "thousands of lines of code", I'm hoping that he won't consider such a thing. It would even break GUI code unless they write a wad of "Instead Of" triggers to covert the UUIDs the GUI code is already passing.

Obviously I cannot see this particular codebase but for the ones I have worked with where this technique was employed the GUID generation was typically centralized so rewriting that piece (or those pieces) of code would have been trivial. Getting a new GUID for purposes of adding new rows to a database is usually a small percentage of the work the application would need to do. The 'get' logic, i.e. the joining of existing GUID-columns, usually comprises the vast majority of application logic but hopefully none of that will need to change.


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1388338
Posted Saturday, November 24, 2012 4:32 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 3:39 PM
Points: 37,101, Visits: 31,653
Wouldn't it just be easier and safer to set the FILL FACTOR appropriately and make sure that reindexing was occuring on a regular basis? That would't require any code changes and, therefor, no regression testing.

--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1388345
« Prev Topic | Next Topic »

Add to briefcase «««12345»»

Permissions Expand / Collapse