Clustered index, mandatory or optional?!

  • Hi,

    I'm reviewing our database tables and found some "bad" practices like Clustered index with Uniqueidentifier columns using NewId - support for SQL2005 is mandatory.

    I was woundering, and already have a post on natural keys vs surrogate keys, since there's no column on the table that can be sequential, customerCode is string and starts with the state followed by a number, name is also no sequential, .... There's no real good column for clustering and the one that is currently used is GUID.

    I can't get rid of the column since it's used as FK and the tables are replicated/synchronized between similar databases. Is it best to remove the clustered index and create it as not clustered or adding an identity column clould be an option but knowing it wouldn't be used for anything else but clustered index?

    Thanks,

    Pedro



    If you need to work better, try working less...

  • PiMané (10/6/2012)


    Hi,

    I'm reviewing our database tables and found some "bad" practices like Clustered index with Uniqueidentifier columns using NewId - support for SQL2005 is mandatory.

    I was woundering, and already have a post on natural keys vs surrogate keys, since there's no column on the table that can be sequential, customerCode is string and starts with the state followed by a number, name is also no sequential, .... There's no real good column for clustering and the one that is currently used is GUID.

    I can't get rid of the column since it's used as FK and the tables are replicated/synchronized between similar databases. Is it best to remove the clustered index and create it as not clustered or adding an identity column clould be an option but knowing it wouldn't be used for anything else but clustered index?

    Thanks,

    Pedro

    I would almost rather have just about anything besides a UNIQUEIDENTIFIER as a clustering key. How is the data in the table mostly queried? How wide is that customerCode and is that an oft-used column to increase selectivity? Even if it is not unique, if it is narrow (enough) and static it may be a better choice. It's impossible to say without knowing the access pattern for the data.

    Some guidance: Ever-increasing clustering key - the Clustered Index Debate..........again! by Kim Tripp

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Basically the PK cloud be an IDENTITY used for lookups (it's unique, small, fast in joins).

    But the clustered index should be on the column(s) most used in queries (WHERE statements).

    In the example I gave, CustomerCode is used a lot in searches (lookup text boxes in the application) but not used in joins (for that we have the Id).

    But CustomerCode is not sequential and can generate a lot of fragmentation on the clustered index.

    Is it worth creating the cluster index on CustomerCode and have maintenance plan to rebuild/reorganize the indexes over creating the cluster on the Id (the only advantage is not having RID Lookups but Key Lookups - not much of an advantage...)?

    Thanks,

    Pedro



    If you need to work better, try working less...

  • opc.three (10/8/2012)


    How wide is that customerCode and is that an oft-used column to increase selectivity?

    Pardon my ignorance, not used to some technical terms in English, but what do you mean by "How wide is that customerCode"?

    Thanks,

    Pedro



    If you need to work better, try working less...

  • PiMané (10/8/2012)


    opc.three (10/8/2012)


    How wide is that customerCode and is that an oft-used column to increase selectivity?

    Pardon my ignorance, not used to some technical terms in English, but what do you mean by "How wide is that customerCode"?

    Thanks,

    Pedro

    Sorry, I could have said that in a better way. I meant to ask what is the data type (CHAR, VARCHAR, NCHAR, NVARCHAR) and if it is of variable-length, how wide is the data on average?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • PiMané (10/8/2012)


    But CustomerCode is not sequential and can generate a lot of fragmentation on the clustered index.

    That might be OK. If you set your fill factor appropriately you can fend off fragmentation reasonably well and still control over-consumption of memory.

    Is it worth creating the cluster index on CustomerCode and have maintenance plan to rebuild/reorganize the indexes over creating the cluster on the Id (the only advantage is not having RID Lookups but Key Lookups - not much of an advantage...)?

    Again it will depend on the data access patterns. The only way to know is to test the various options with a representative workload on representative hardware. Another nice thing about clustered tables is you have some control over page splits whereas with heaps you have very little control over the creation of forwarding records.

    Regardless of which clustering key you choose, a good maintenance routine to keep fragmentation and page density at optimal levels will be critical in any scenario.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (10/8/2012)


    PiMané (10/8/2012)


    opc.three (10/8/2012)


    How wide is that customerCode and is that an oft-used column to increase selectivity?

    Pardon my ignorance, not used to some technical terms in English, but what do you mean by "How wide is that customerCode"?

    Thanks,

    Pedro

    Sorry, I could have said that in a better way. I meant to ask what is the data type (CHAR, VARCHAR, NCHAR, NVARCHAR) and if it is of variable-length, how wide is the data on average?

    🙂 the column is NVARCHAR(20) but mostly used always 10 chars = 2 letters + 8 numbers.

    But this is just a case study since other our clients can have only numbers or only chars without any pattern.

    That's why it's hard to determine if the column is sequential or not... probably on 80% is sequential, starts at 000001 and goes on... In this case the column would be a very good cluster index...

    When we don't know how the clustered index column(s) will be filled is there any FILLFACTOR value appropriate or every case is a different one?

    Thanks,

    Pedro



    If you need to work better, try working less...

  • Appropriate for all? No. Pick a spot, maybe 90, monitor fragmentation, and adjust up or down.

    I would be careful using a variable length column in a PK, especially if their values are not static. Test, test, test.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (10/9/2012)


    Appropriate for all? No. Pick a spot, maybe 90, monitor fragmentation, and adjust up or down.

    Will do that... the problem is that our SW is installed on several customers with different work methodology, we can have customers with 100 rows per table, 1.000 or 10.000 and even 1.000.000.

    But since the ones that complaint are the ones with large databases probably it's worth just looking into those cases..

    I would be careful using a variable length column in a PK, especially if their values are not static. Test, test, test.

    Our PK are usually always INT IDENTITY or GUID, never a NVARCHAR.

    We only have indexes over columns with NVARCHAR but not PKs. All lookups, FKs, are made usings INTs or GUIDs

    Thanks,

    Pedro



    If you need to work better, try working less...

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply