Clustered indexes question

  • Hi,

    I've a table whose primary key is a GUID column... Is it worth having a clustered index over that primary key or, since guids are "random", will that only make the data change "move" a lot on disk? This table has a lot o writes and reads!

    Thanks in advance,

    Pedro



    If you need to work better, try working less...

  • if you need you may stay play with this index - but you should set a proper fillfactor and maybe padindex;

    if you dont need clustered index there - you should use it as a nonclustered index

  • Hi Pedro

    Why do you need a GUID as primary key? SQL Server works much better with INT primary keys.

    If you really need a GUID primary key, you should avoid to define it as clustered index. The clustered index defines the structure of table and is used by other indexes. Since a GUID is not sequential it causes many page splits. As Marcin wrote, you can use a less FILLFACTOR which reduces the problem but doesn't eliminate it. A better solution would be another IDENTITY column which represents the clustered index.

    SQL Server 2005 introduced the NEWSEQUENTIALID() function as default value for your UNIQUEIDENTIFIER column. The NEWSEQUENTIALID() method creates an sequential, valid GUID which works much better as clustered index. Anyway, it's still less performing than a INT column and (afaik) there is no function like SCOPE_IDENTITY() to get the last generated id. You have to use the OUTPUT clause to get the last generated ids into a temporary table or any other destination.

    Greets

    Flo

  • Hi,

    We use GUIDs so that records have diferent ids on all databases we have.

    Some records can by synchronized across servers (like replication) and we needed an unique key.

    Yeah, I know we could have used a combined key (Id - bigint; ServerId - bigint) but the project was already in development when I can here and know GUID's are a cross I have to carry and deal with :S.

    Is it a good idea not to have a CLUSTERED INDEX, since the PK uses GUIDs, or should I create one using an Id column, BIGINT - IDENTITY(1,1), (used only for the clustered index)?

    Thanks,

    Pedro



    If you need to work better, try working less...

  • PiMané (9/8/2009)


    Is it a good idea not to have a CLUSTERED INDEX, since the PK uses GUIDs, or should I create one using an Id column, BIGINT - IDENTITY(1,1), (used only for the clustered index)?

    There are a few reasons for tables without a clustered index (e.g. some special kinds of staging tables) but in most cases a table should have a clustered index. If your table has no clustered index other indexes will run into problems when your data become changed.

    My suggestion is to use a (BIG)INT IDENTITY column as clustered index.

  • Thanks,

    Since I'm creating this column now, when the table is created and already has data, is there a problem for this column to be the last one on the list, or should I create a temp table, move the data into it, drop the old table and rename the new table (like Management Studio does when you insert a column in the middle of others)?

    I've read somewhere that "fixed" size types (int, char, bigint, ...) should always be the first columns and nvarchar, image, ... should be the last ones. Is this true or SQL Server manages this problem (nvarchar growing causing page "breaks" and data having to move to another page)?

    Thanks once again,

    Pedro



    If you need to work better, try working less...

  • I don't think that this is a problem since the index tree is maintained without the data structure and points to the row, not to the column.

    Probably here are storage engine gurus with a better knowledge than me for this question.

  • All the testing I've done with GUID's as a clustered index have shown some serious performance problems. The one thing that alleviates this somewhat is to use the SEQUENTIALID mechanism. There are valid reasons for using these things as PK's, but it really is a bit of a balancing act to get them to work well.

    Keep an eye on fragmentation. It's likely to be a bit higher. You might want to look into using index compression to reduce the number of pages you use. I think that'll help (of course, I think that helps in general).

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Oh, and yes, you should have a clustered index on most of your tables (most being 99.9999999999%) or have a very good reason why not, with testing and good documentation. SQL Server is built and optimized around storing data in a clustered index. It's just a good practice to follow. Yes, there are exceptions, but the exceptions should be exceptional (I love saying that).

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Should I have the clustered index even if it's on a "ghost" column, like in my case the column won't be used for anything but the clustered index?

    Right now the table looks like this:

    Id - GUID

    ClientId - GUID

    Data - XML

    The primary key is on Id and ClientId, beeing a clustered index.

    Should I just create the column RowId - BIGINT IDENTITY(1,1) and create a clustered index on that column and create a non clustered index on Id and ClientId to keep the "connections" to the other tables?

    PS: I've got a vertical database, each table has 3 other tables associated to store strings, numbers, datetime values, "connected" by the Id and ClientId.

    Thanks,

    Pedro



    If you need to work better, try working less...

  • PiMané (9/8/2009)


    Should I have the clustered index even if it's on a "ghost" column, like in my case the column won't be used for anything but the clustered index?

    Right now the table looks like this:

    Id - GUID

    ClientId - GUID

    Data - XML

    The primary key is on Id and ClientId, beeing a clustered index.

    Should I just create the column RowId - BIGINT IDENTITY(1,1) and create a clustered index on that column and create a non clustered index on Id and ClientId to keep the "connections" to the other tables?

    PS: I've got a vertical database, each table has 3 other tables associated to store strings, numbers, datetime values, "connected" by the Id and ClientId.

    Thanks,

    Pedro

    No, I wouldn't go that far. You'll end up with lots of key lookups which can be very bad. There has to be a common access path, a column or set of columns that are the most frequently used for pulling the data. That's generally where the cluster should go. Best for performance is a monotonically increasing integer value, an identity, but it's not the only option. It's just that GUID's are a bit large and you're better off keeping the cluster, or any other index, as small as you can. But note, that's as small as you can, not as small as possible. If the GUID is the single point of entry into the table, or the most common point of entry into the table, then it may be the best bet for your cluster. You just need to be aware that it comes with baggage.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Unfortunately the Guid's are the only link between the data tables.

    The big problem I've got with the Guid is that it's the application that sends it and I can't use NEWSEQUENTIALID(), and that will generate a lot of page breaks...

    But since the "link" between the tables are the 2 Guid columns I don't have other choice than use them as a Clustered Index... I could tell the programmers to change the program but they probably wouldn't like it :D...

    Thanks again,

    Pedro



    If you need to work better, try working less...

  • This was removed by the editor as SPAM

  • The data, since we have lots of customers, will be sorted by the ClientId and the Id (record Id), both GUID's (and I can't get away from this, I wish but the application is made from to to bottom using this structure). The isn't much knowledge on how to sort the records since the users can sort the data by any column (click on top of the grid on the column name)... but since this is the "link" between the tables, it seems to be the best option.

    Pedro



    If you need to work better, try working less...

  • This was removed by the editor as SPAM

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply