Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12345»»»

Lots of Key Lookups vs. UniqueIdentifier Clustered Index Expand / Collapse
Author
Message
Posted Sunday, November 18, 2012 2:07 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 12:09 AM
Points: 35,348, Visits: 31,886
Ok... Tom is correct. The "hack" is no good (I should have checked more deeply before I suggested it ). It looks mighty fine on the Execution Plan because it does return as a Clustered Index Seek but it only seeks to the logical "first" row in the table and then continues as a scan.

It's better to put indexes on the two columns and make sure that you have a good FILL FACTOR on the index for the GUID. The "Key Lookups" for queries against the GUID column are much more efficient than the false seek that the hack produces.

If you want to see for yourself, here's the code that I should have used before saying anything about the possible hack. Apologies again for suggesting the hack without further testing.

--===== Conditionally drop the test table to make reruns easier in SSMS
IF OBJECT_ID('tempdb..#TestTable','U') IS NOT NULL
DROP TABLE #TestTable
;
--===== Create and populate the test table on-the-fly
SELECT TOP 1000000
SomeID = IDENTITY(INT,1,1),
SomeGUID = ISNULL(NEWID(),NEWID()), --ISNULL just to make the resulting column NOT NULL
SomeDate = ABS(CHECKSUM(NEWID()))%DATEDIFF(dd,'2000','2010')+CAST('2000' AS DATETIME),
SomeInt = ABS(CHECKSUM(NEWID()))%100000
INTO #TestTable
FROM master.sys.all_columns ac1
CROSS JOIN master.sys.all_columns ac2
;
--===== Create the clustered index featured in the "hack".
-- Used alone, this creates a very misleading Clustered Index Seek
-- which is really a full table scan in disguise.
CREATE UNIQUE CLUSTERED INDEX IX_XRef ON #TestTable (SomeID, SomeGUID)
;
--===== If you add this index, you get seeks with Key Lookups. If you
-- look at the IO statistics it produces, it's much more effecient
-- than the hack. Apologies for suggesting the hack without testing it.
--CREATE NONCLUSTERED INDEX [IX_SomeGuid]
--ON [dbo].[#TestTable] ([SomeGUID])
--;
--===== Select a GUID from somewhere in the table.
-- We're just getting a GUID that's know to exist in the table.
DECLARE @SomeGUID UNIQUEIDENTIFIER;
SELECT @SomeGUID = SomeGUID
FROM #TestTable
WHERE SomeID = 1
;
SET STATISTICS IO ON;
--===== Now, demonstrate that we can get a CI seek when looking
-- for a GUID using the ">" hack... BUT IT'S A FALSE SEEK
-- BECAUSE IT FINDS THE FIRST ROW WHICH HAS A SomeID > 0
-- AND THEN GOES INTO A SCAN MODE AS ATTESTED TO BY THE
-- IO STATISTICS.
SELECT SomeDate, SomeINT
FROM #TestTable
WHERE SomeID > 0
AND SomeGUID = @SomeGUID
;
--===== Now, demonstrate that we get a CI scan if we don't use
-- the ">" hack.
SELECT SomeDate, SomeINT
FROM #TestTable
WHERE SomeGUID = @SomeGUID
;
SET STATISTICS IO OFF;



--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1386100
Posted Monday, November 19, 2012 5:10 PM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Yesterday @ 3:15 PM
Points: 53, Visits: 175
Thanks for all the comments. I have one more idea for addressing this. I'm thinking about partitioning my table. Everything that has already been created with the random UUID on one partition and all new, sequential UUIDs on a new partition.

The partition with the old UUIDs will get sorted when I convert it to a CI and there will never be any inserts into it again. All new inserts will go into the new partition where I will ensure the UUIDs are sequential.
Post #1386638
Posted Monday, November 19, 2012 10:04 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 7:06 PM
Points: 7,125, Visits: 12,720
dave-L (11/19/2012)
Thanks for all the comments. I have one more idea for addressing this. I'm thinking about partitioning my table. Everything that has already been created with the random UUID on one partition and all new, sequential UUIDs on a new partition.

The partition with the old UUIDs will get sorted when I convert it to a CI and there will never be any inserts into it again. All new inserts will go into the new partition where I will ensure the UUIDs are sequential.

Just curious, which column(s) are you evaluating as a partition key?

You mentioned the application sometimes generates UUIDs outside the database, how would you manage that piece of it?


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1386676
Posted Tuesday, November 20, 2012 3:44 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Monday, October 6, 2014 5:46 AM
Points: 2,840, Visits: 3,969
Jeff Moden (11/18/2012)
make sure that you have a good FILL FACTOR on the index for the GUID.
Please explain


-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done
Post #1386801
Posted Tuesday, November 20, 2012 9:20 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 12:09 AM
Points: 35,348, Visits: 31,886
Bhuvnesh (11/20/2012)
Jeff Moden (11/18/2012)
make sure that you have a good FILL FACTOR on the index for the GUID.
Please explain


Sure... it's a huge subject with lots of caveats that you need to read about in Books Online but here's a quick overview concerning GUIDs and FILL FACTORs.

GUIDs in SQL Server are really just huge random numbers (Type 4 GUIDs are used. Type 1 was used in version 7 but were still mostly random in nature). That means that if you have an index with a GUID in it (especially if it's the first column), new data will usually be inserted somewhere in the middle of a page on an index. If the page is full because of a 0 or 100 FILL FACTOR (or other reason), you get a page split on the index. Page splits use a lot of time and a lot of resources and can actually cause some pretty serious "timeouts" on frequently inserted tables. Lookup "page splits" in Books Online for more information.

Having a lower FILL FACTOR means that some free space is left on every page after and index rebuild or reorg. That free space is used by inserts instead of doing page splits (until the page is filled, of course).

An 80% FILL FACTOR (for example) will leave approximated 20% free space (depending on row size) on every page after an index rebuild or reorg. The rowsize thing is another good reason to keep columns narrow by using the correct datatype. More rows per page means faster SELECTs.

Fill factors other than 0 or 100% (0 is technically the same as 100), make SELECTs take longer because you have to read more pages to get the same amount of data so you need to be careful as to which indexes you assign a fill factor to and how big they are. Again, all of that is much better explained in Books Online under "Page Splits".

And, no... the new SEQUENTIAL GUID datatype won't help for individual inserts. You can read about that in Books Online, as well.



--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1386996
Posted Tuesday, November 20, 2012 11:09 AM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Yesterday @ 3:15 PM
Points: 53, Visits: 175
opc.three:

Currently we have an integer identity, so I imagine I could draw a line in the sand and say all records with an identity of < X go to partition 1, otherwise partition 2.

Right now we do generate some of the UUIDs in the application layer. I imagine I'll have the application reach out to the dB for the UUID... not ideal, but I can't think of a better way to do this.

For generating sequential GUIDs I'm using a sequence and newID() and mashing them together. I do this instead of using newSequentialID() because I need some randomness in my UUID.

This is what I've come up with for getting the randomized, sequential UUID:

CREATE FUNCTION makeGUID
(
@guid AS UNIQUEIDENTIFIER,
@number AS BIGINT
)
RETURNS UNIQUEIDENTIFIER
AS
BEGIN
RETURN CAST(LEFT(@guid, 24) + RIGHT(CONVERT(VARCHAR(16), CAST(@number AS VARBINARY(50)), 2), 12) AS UNIQUEIDENTIFIER)
END
GO

A couple of things I don't like about it. One, I'd rather use bitwise operators to mash these values up rather than string functions but the data is too wide (128 bits).

Also, I have to pass newID() and NEXT VALUE FOR in as parameters because I can't seem to find a tricky way of using them in a UDF.
Post #1387071
Posted Tuesday, November 20, 2012 7:27 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 7:06 PM
Points: 7,125, Visits: 12,720
Why do you need randomness in your UUID?

__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1387206
Posted Wednesday, November 21, 2012 11:22 AM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Yesterday @ 3:15 PM
Points: 53, Visits: 175
Our web-based app exposes these IDs in the URL. Although their is additional security checking when executing queries based on the passed in URL UUID, in our shop it is generally viewed as additional security to have these randomized to some extent to prevent the end user from making simple changes and getting the "next" record.

After reading further on Books Online (thanks @Jeff Moden), it appears that NEWSEQUENTIALID() doesn't provide sequential UUIDs between restarts, so I have to build some sort of function for sequential UUIDs, whether or not I require them to be random-sequential or not.

I did clean up my function a bit, removing variable length types, implicit casting, tightening up the lengths and removing some of the string manipulation:

RETURN CAST(LEFT(CAST(@guid AS CHAR(36)), 24) + CONVERT(CHAR(16), CAST(@Number AS BINARY(6)), 2) AS UNIQUEIDENTIFIER)
Post #1387537
Posted Wednesday, November 21, 2012 11:32 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 7:06 PM
Points: 7,125, Visits: 12,720
dave-L (11/21/2012)
Our web-based app exposes these IDs in the URL. Although their is additional security checking when executing queries based on the passed in URL UUID, in our shop it is generally viewed as additional security to have these randomized to some extent to prevent the end user from making simple changes and getting the "next" record.

After reading further on Books Online (thanks @Jeff Moden), it appears that NEWSEQUENTIALID() doesn't provide sequential UUIDs between restarts, so I have to build some sort of function for sequential UUIDs, whether or not I require them to be random-sequential or not.

I did clean up my function a bit, removing variable length types, implicit casting, tightening up the lengths and removing some of the string manipulation:

RETURN CAST(LEFT(CAST(@guid AS CHAR(36)), 24) + CONVERT(CHAR(16), CAST(@Number AS BINARY(6)), 2) AS UNIQUEIDENTIFIER)

Uggh, you have been dealt a tough hand. You are spot on with regard to the "prediction" security issue, and on the "restart issue" with NEWSEQUENTIALID() which was the motivation behind my question about randomness and which partitioning key you would choose. Even if you employed partitioning there is a chance that after a restart NEWSEQUENTIALID() could theoretically spill into your "frozen partition." I think you are on the right track developing your own sequential (yet random ) UUID generator.


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1387546
Posted Wednesday, November 21, 2012 11:40 AM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Yesterday @ 3:15 PM
Points: 53, Visits: 175
Thanks for your help opc.three.

One last thing: I hate to leave less than optimal code out there for posterity so here is my last, optimized function with all string manipulation removed

CREATE FUNCTION makeGUID
(
@guid AS UNIQUEIDENTIFIER,
@number AS BIGINT
)
RETURNS UNIQUEIDENTIFIER
AS
BEGIN
RETURN CAST(CAST(@guid AS BINARY(10)) + CAST(@Number AS BINARY(6)) AS UNIQUEIDENTIFIER)
END
GO
Post #1387554
« Prev Topic | Next Topic »

Add to briefcase ««12345»»»

Permissions Expand / Collapse