Question about Clusterd indexes

  • I Know you shouldn't do it, but can anybody tell me why you shouldnt use a UniqueIdentifier as a CLustered key.

    Personally I always thought the clusterd index is the way the records are physically orderd on disk.

    So if you have to insert a record on the first page every thing has to shift. ( I know depending on the fill factor)

    But does everything really shift or is there just a page split for the first page.

  • Because the value produced for a uniqueidentifier is not sequential and data would be randomly inserted in the table.

    For example run

    SELECT NEWID()

    several times and you will see the order will cause data to disperse throughout the table and thus cause a lot of overhead in a clustered index.

  • Remember that the page split will only occured once the page is full not with every insert. To delayed this rememeber the fillfactor thing and also consider with pad index, to leave free space in the non leafe pages also.

  • But again there is no logical order to the data. The key to a cluster should always be the logical order. A datetime is good because you want the data to be logically order by occurrance. An identity is good because it is order of insert occurance. A varchar column for last name is good especially when combined with a varchar column for firstname when building a telphone book for example. But uniqueindentifiers will not create any logical ordering of the data whatsoever, it will just have a unique identity for finding the record.

  • I know this all.

    But look at the following code....

    DECLARE @STARTTIME DATETIME

    DECLARE @ENDTIME DATETIME

    DECLARE @LOOP AS INTEGER

    DECLARE @MAXINSERTS AS INTEGER

    DROP TABLE TESTTABLE

    CREATE TABLE TESTTABLE

    ( --ID UNIQUEIDENTIFIER DEFAULT NEWID(),

    ID INT IDENTITY,

    FIELD1 CHAR(255),

    FIELD2 CHAR(255),

    FIELD3 CHAR(255),

    FIELD4 CHAR(255),

    FIELD5 CHAR(255),

    FIELD6 CHAR(255),

    FIELD7 CHAR(255),

    FIELD8 CHAR(255),

    FIELD9 CHAR(255),

    FIELD10 CHAR(255))

    CREATE UNIQUE CLUSTERED INDEX CLUST ON TESTTABLE(ID)

    CREATE INDEX IND1 ON TESTTABLE (FIELD1)

    CREATE INDEX IND2 ON TESTTABLE (FIELD2)

    CREATE INDEX IND3 ON TESTTABLE (FIELD3)

    CREATE INDEX IND4 ON TESTTABLE (FIELD4)

    CREATE INDEX IND5 ON TESTTABLE (FIELD5)

    CREATE INDEX IND6 ON TESTTABLE (FIELD6)

    CREATE INDEX IND7 ON TESTTABLE (FIELD7)

    CREATE INDEX IND8 ON TESTTABLE (FIELD8)

    CREATE INDEX IND9 ON TESTTABLE (FIELD9)

    CREATE INDEX IND10 ON TESTTABLE (FIELD10)

    SET @MAXINSERTS = 1000

    SET @STARTTIME = GETDATE()

    SET @LOOP = 1

    WHILE @LOOP < @MAXINSERTS

    BEGIN

    INSERT TESTTABLE DEFAULT VALUES

    SET @LOOP = @LOOP + 1

    END

    SET @ENDTIME= GETDATE()

    SELECT DATEDIFF(MS,@STARTTIME,@ENDTIME)

    And my question is ...........

    Why does the GUID is 4 times slower than the int ...

    Is this due to DISK IO or page SPLITS or really due to the overhead in a 4 to 16 byte variable.

    Can anybody tell me if the table is physically reorganised every time ?

  • From BOL

    There is no way to determine the sequence in which uniqueidentifier values were generated. They are not suited for existing applications that depend on serially incrementing key values.

    At 16 bytes, the uniqueidentifier data type is relatively large compared to other data types such as 4-byte integers. This means indexes built using uniqueidentifier keys may be relatively slower than implementing the indexes using an int key.

    In my experience clustered indexes work best when you are likely to have multiple ideentical values (for example dates) and gives a nice way of logical data-ordering

    I have found it is possible to please all of the people all of the time if you do exactly what they want. Harold Macmillan 1961


    I have found it is possible to please all of the people all of the time if you do exactly what they want. Harold Macmillan 1961

  • Thanks crosspatch i know that one from the BOL. ( I wished every programming book would start with that phrase )

    But the real question is WHY is it slower,

    does the complete table shift when a record is inserted at the beginning ????

    Or is just the first page split ????

    what is happening in such a situation.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply