Question about Clusterd indexes

Question

Post reply

Question about Clusterd indexes

well0549

SSCrazy

Points: 2259
More actions
January 20, 2003 at 5:49 am

#58230

I Know you shouldn't do it, but can anybody tell me why you shouldnt use a UniqueIdentifier as a CLustered key.
Personally I always thought the clusterd index is the way the records are physically orderd on disk.
So if you have to insert a record on the first page every thing has to shift. ( I know depending on the fill factor)
But does everything really shift or is there just a page split for the first page.

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Antares686 SSC Guru Points: 125444 More actions · Answer 1

Because the value produced for a uniqueidentifier is not sequential and data would be randomly inserted in the table.

For example run

SELECT NEWID()

several times and you will see the order will cause data to disperse throughout the table and thus cause a lot of overhead in a clustered index.

Rodrigo Acosta SSC-Insane Points: 21261 More actions · Answer 2

Remember that the page split will only occured once the page is full not with every insert. To delayed this rememeber the fillfactor thing and also consider with pad index, to leave free space in the non leafe pages also.

Antares686 SSC Guru Points: 125444 More actions · Answer 3

But again there is no logical order to the data. The key to a cluster should always be the logical order. A datetime is good because you want the data to be logically order by occurrance. An identity is good because it is order of insert occurance. A varchar column for last name is good especially when combined with a varchar column for firstname when building a telphone book for example. But uniqueindentifiers will not create any logical ordering of the data whatsoever, it will just have a unique identity for finding the record.

well0549 SSCrazy Points: 2259 More actions · Answer 4

I know this all.

But look at the following code....

DECLARE @STARTTIME DATETIME

DECLARE @ENDTIME DATETIME

DECLARE @LOOP AS INTEGER

DECLARE @MAXINSERTS AS INTEGER

DROP TABLE TESTTABLE

CREATE TABLE TESTTABLE

( --ID UNIQUEIDENTIFIER DEFAULT NEWID(),

ID INT IDENTITY,

FIELD1 CHAR(255),

FIELD2 CHAR(255),

FIELD3 CHAR(255),

FIELD4 CHAR(255),

FIELD5 CHAR(255),

FIELD6 CHAR(255),

FIELD7 CHAR(255),

FIELD8 CHAR(255),

FIELD9 CHAR(255),

FIELD10 CHAR(255))

CREATE UNIQUE CLUSTERED INDEX CLUST ON TESTTABLE(ID)

CREATE INDEX IND1 ON TESTTABLE (FIELD1)

CREATE INDEX IND2 ON TESTTABLE (FIELD2)

CREATE INDEX IND3 ON TESTTABLE (FIELD3)

CREATE INDEX IND4 ON TESTTABLE (FIELD4)

CREATE INDEX IND5 ON TESTTABLE (FIELD5)

CREATE INDEX IND6 ON TESTTABLE (FIELD6)

CREATE INDEX IND7 ON TESTTABLE (FIELD7)

CREATE INDEX IND8 ON TESTTABLE (FIELD8)

CREATE INDEX IND9 ON TESTTABLE (FIELD9)

CREATE INDEX IND10 ON TESTTABLE (FIELD10)

SET @MAXINSERTS = 1000

SET @STARTTIME = GETDATE()

SET @LOOP = 1

WHILE @LOOP < @MAXINSERTS

BEGIN

INSERT TESTTABLE DEFAULT VALUES

SET @LOOP = @LOOP + 1

END

SET @ENDTIME= GETDATE()

SELECT DATEDIFF(MS,@STARTTIME,@ENDTIME)

And my question is ...........

Why does the GUID is 4 times slower than the int ...

Is this due to DISK IO or page SPLITS or really due to the overhead in a 4 to 16 byte variable.

Can anybody tell me if the table is physically reorganised every time ?

crosspatch Hall of Fame Points: 3168 More actions · Answer 5

From BOL

There is no way to determine the sequence in which uniqueidentifier values were generated. They are not suited for existing applications that depend on serially incrementing key values.

At 16 bytes, the uniqueidentifier data type is relatively large compared to other data types such as 4-byte integers. This means indexes built using uniqueidentifier keys may be relatively slower than implementing the indexes using an int key.

In my experience clustered indexes work best when you are likely to have multiple ideentical values (for example dates) and gives a nice way of logical data-ordering

I have found it is possible to please all of the people all of the time if you do exactly what they want. Harold Macmillan 1961

well0549 SSCrazy Points: 2259 More actions · Answer 6

Thanks crosspatch i know that one from the BOL. ( I wished every programming book would start with that phrase )

But the real question is WHY is it slower,

does the complete table shift when a record is inserted at the beginning ????

Or is just the first page split ????

what is happening in such a situation.