Including additional columns in a unique clustered index

  • I am creating an indexed view from a business unit table. The goal of the view is to pre-filter the unit list to only those supported by our division. The business hierarchy goes from unit to district to region, with unit # being unique on the table.

    However, we rarely ever use unit in our applications and instead focus on the district number (as per business requirements.) One of the most common queries performed is to get a distinct list of districts based upon region selections. Only occasionally would the unit # be included too.

    Should the region and district columns be included in the unique clustered index of the view? Or should they be limited to a separate nonclustered index? I'm leaning toward the latter, but if anyone has a good reason to include them in the clustered index, I'd love to hear it. 🙂

  • Hmm. In a normalized design, I wouldn't expect the region to be in the unit table, only the district.

    Instead, I'd expect the distinct list of districts w/i a region to come from a districts table. So you don't have a separate districts/district-level table?

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Good observation, however, the table design was not mine and probably existed before I was even hired. It is now part of our data warehouse and is not likely to be changed anytime soon (if ever.)

    Please just accept that the table is what it is, warts and all.

  • Ok.

    At any rate, typically you'll want to cluster the table on the most common queries that can't be easily satisified by a nonclustered index you need anyway.

    For example, if (nearly) every query specifies district, you should probably cluster the table on district first. Add the region if needed to make the clus key unique.

    You might still want a nonclus index on region and district, to speed up the specific query you mentioned. But you don't want to cluster the whole table on region if that's not typically specified for most queries.

    Btw, I'm still not clear on where "division" fits into this, so I've gone with more generic info.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • dembones79 (3/4/2013)


    Should the region and district columns be included in the unique clustered index of the view?

    Bear in mind that a clustered index implicitly includes all the columns in the table/view. For the key I'd specify the minimum of columns, just what is the unique portion. The rest of the columns are in the index too.

    If it were a nonclustered index I'd still put just the unqiue columns in the key, the others would be include columns.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (3/4/2013)


    dembones79 (3/4/2013)


    Should the region and district columns be included in the unique clustered index of the view?

    Bear in mind that a clustered index implicitly includes all the columns in the table/view. For the key I'd specify the minimum of columns, just what is the unique portion. The rest of the columns are in the index too.

    If it were a nonclustered index I'd still put just the unqiue columns in the key, the others would be include columns.

    If you (almost) always select by multiple columns, and depending on the selectivity of the columns, you might get significant gains from including another column in the clus index. Don't follow too simplistic rules of "(always) one column per clus index" at the expense of the genuine performance of your dbs.

    Look at some of MS's system tables: some of their internal replication tables have 5 clus keys. I'm sure some would sorely disapprove of that, but it might be needed for best performance.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher (3/4/2013)


    GilaMonster (3/4/2013)


    dembones79 (3/4/2013)


    Should the region and district columns be included in the unique clustered index of the view?

    Bear in mind that a clustered index implicitly includes all the columns in the table/view. For the key I'd specify the minimum of columns, just what is the unique portion. The rest of the columns are in the index too.

    If it were a nonclustered index I'd still put just the unqiue columns in the key, the others would be include columns.

    If you (almost) always select by multiple columns, and depending on the selectivity of the columns, you might get significant gains from including another column in the clus index. Don't follow too simplistic rules of "(always) one column per clus index" at the expense of the genuine performance of your dbs.

    Adding another column to a unique index will allow duplicates to the columns that are supposed to be unique. That's not a performance issue, that's a data integrity issue.

    If defining a set of columns as unique, that index should have just the unique columns defined in it, otherwise the unique constraint is not doing the job of enforcing uniqueness across a set of columns.

    If you have a business design that says that email address column must be unique, then defining the unique index on email address, surname, status and last order date may be great for performance, but so much for the unique email address.

    Look at some of MS's system tables: some of their internal replication tables have 5 clus keys.

    Yeah, and many of them have horrific performance.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (3/4/2013)


    ScottPletcher (3/4/2013)


    GilaMonster (3/4/2013)


    dembones79 (3/4/2013)


    Should the region and district columns be included in the unique clustered index of the view?

    Bear in mind that a clustered index implicitly includes all the columns in the table/view. For the key I'd specify the minimum of columns, just what is the unique portion. The rest of the columns are in the index too.

    If it were a nonclustered index I'd still put just the unqiue columns in the key, the others would be include columns.

    If you (almost) always select by multiple columns, and depending on the selectivity of the columns, you might get significant gains from including another column in the clus index. Don't follow too simplistic rules of "(always) one column per clus index" at the expense of the genuine performance of your dbs.

    Adding another column to a unique index will allow duplicates to the columns that are supposed to be unique. That's not a performance issue, that's a data integrity issue.

    If defining a set of columns as unique, that index should have just the unique columns defined in it, otherwise the unique constraint is not doing the job of enforcing uniqueness across a set of columns.

    If you have a business design that says that email address column must be unique, then defining the unique index on email address, surname, status and last order date may be great for performance, but so much for the unique email address.

    Look at some of MS's system tables: some of their internal replication tables have 5 clus keys.

    Yeah, and many of them have horrific performance.

    If you don't look up by the unique column, but by a non-unique column first and a unique column only sometimes, don't make the clus index key only the unique column. That will be a severe performance issue.

    Again, rigid overly-simplistic rules such as those will give you a poorly performing database with excessive nonclustered indexes to cover all the queries the clus index can't because its keys were so poorly chosen.

    MS has engineers with vastly more expertise at scale than any of us.

    Sometimes multiple clus keys are needed on a table.

    And since the clus index is the most important part of good query performance, you should consider it carefully and select the proper key column(s) to match your environment, not some pre-determined, overly-simplistic rule.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher (3/4/2013)


    GilaMonster (3/4/2013)


    ScottPletcher (3/4/2013)


    GilaMonster (3/4/2013)


    dembones79 (3/4/2013)


    Should the region and district columns be included in the unique clustered index of the view?

    Bear in mind that a clustered index implicitly includes all the columns in the table/view. For the key I'd specify the minimum of columns, just what is the unique portion. The rest of the columns are in the index too.

    If it were a nonclustered index I'd still put just the unqiue columns in the key, the others would be include columns.

    If you (almost) always select by multiple columns, and depending on the selectivity of the columns, you might get significant gains from including another column in the clus index. Don't follow too simplistic rules of "(always) one column per clus index" at the expense of the genuine performance of your dbs.

    Adding another column to a unique index will allow duplicates to the columns that are supposed to be unique. That's not a performance issue, that's a data integrity issue.

    If defining a set of columns as unique, that index should have just the unique columns defined in it, otherwise the unique constraint is not doing the job of enforcing uniqueness across a set of columns.

    If you have a business design that says that email address column must be unique, then defining the unique index on email address, surname, status and last order date may be great for performance, but so much for the unique email address.

    Look at some of MS's system tables: some of their internal replication tables have 5 clus keys.

    Yeah, and many of them have horrific performance.

    If you don't look up by the unique column, but by a non-unique column first and a unique column only sometimes, don't make the clus index key only the unique column. That will be a severe performance issue.

    Again, rigid overly-simplistic rules such as those will give you a poorly performing database with excessive nonclustered indexes to cover all the queries the clus index can't because its keys were so poorly chosen.

    MS has engineers with vastly more expertise at scale than any of us.

    Sometimes multiple clus keys are needed on a table.

    And since the clus index is the most important part of good query performance, you should consider it carefully and select the proper key column(s) to match your environment, not some pre-determined, overly-simplistic rule.

    I don't see Gail setting forth any overly-simplistic, predetermined rules for creating indexes either clustered or nonclustered. All I see is a basic recommendation about a unique index (clustered or nonclustered).

  • ScottPletcher (3/4/2013)


    Again, rigid overly-simplistic rules such as those will give you a poorly performing database with excessive nonclustered indexes to cover all the queries the clus index can't because its keys were so poorly chosen.

    So 'keep the keys for a unique index to the columns that need to be enforced unique' is a 'rigid overly-simplistic rule'?

    Maybe you want to read what I said again, that for a unique index (one that's designed for enforce uniqueness) you should keep the keys to the columns that need to be unique, otherwise you have a constraint that isn't constraining anything in a useful way.

    Oh, and since I've never once seen a system where all queries against a table query on a single column or set of columns, no matter how you select the clustered index you are going to need nonclustered indexes (not that I said anything about how to select the clustered index keys in this thread). Unless of course you have that idealistic and perfect system where each and every table is only ever filtered or joined on a single column or set of columns. Never seen one of those in reality though.

    MS has engineers with vastly more expertise at scale than any of us.

    True, the CAT team (Mark, Lubor, Cindy and the rest are awesome). Pity those skilled engineers didn't work on the replication tables. Or on Sharepoint. Or on several other examples of terrible database design that came out of MS.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Uniqueness should not be the primary consideration for a clus key. Uniqueness should be a factor, but it's not the most important factor. The proper clus key depends still depends more on how the table is used.

    For example, for a typical log table, there will be a datetime logged. It is not unique, since several INSERTs can easily occur within the 3 ms between datetime intervals.

    Lookup is 100% by datetime for typical log uses: what happened with <x> activity between 11:30AM and 12:30PM yesterday.

    Also, say you've chosen to add an identity column to the table; by definition, that column will be unique (whether constrained as such or not, under careful use).

    Even so, the clus key should be the non-unique datetime, not the unique identity.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • GilaMonster (3/4/2013)


    ScottPletcher (3/4/2013)


    Again, rigid overly-simplistic rules such as those will give you a poorly performing database with excessive nonclustered indexes to cover all the queries the clus index can't because its keys were so poorly chosen.

    So 'keep the keys for a unique index to the columns that need to be enforced unique' is a 'rigid overly-simplistic rule'?

    Maybe you want to read what I said again, that for a unique index (one that's designed for enforce uniqueness) you should keep the keys to the columns that need to be unique, otherwise you have a constraint that isn't constraining anything in a useful way.

    Oh, and since I've never once seen a system where all queries against a table query on a single column or set of columns, no matter how you select the clustered index you are going to need nonclustered indexes (not that I said anything about how to select the clustered index keys in this thread). Unless of course you have that idealistic and perfect system where each and every table is only ever filtered or joined on a single column or set of columns. Never seen one of those in reality though.

    MS has engineers with vastly more expertise at scale than any of us.

    True, the CAT team (Mark, Lubor, Cindy and the rest are awesome). Pity those skilled engineers didn't work on the replication tables. Or on Sharepoint. Or on several other examples of terrible database design that came out of MS.

    You may or may not need nonclus indexes on a table. I've got plenty of tables with no nonclus indexes. With the proper clus index, one often doesn't need the mass of nonclus "covering indexes" that duplicate most of the original table and slow down all modifications to the base table.

    The discussion was on the clustered index, not unique indexes in general. A clus index key needn't be unique on its own, and sometimes that's just fine. Again, it depends on the uses of that specific table.

    General rules are nice, but they're not a substitute for carefully and properly selecting the specific clus index best for each table.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • If your goal in indexing is to try to make the clustered index the index used by most of your queries, you are doing it backwards. Clustered indexes are larger. and your queries will perform better, in general, when they use smaller non-clustered indexes.

    Also, let's suppose you base your clustering key on what certain queries are querying. Now fast forward a couple of years and you see that the queries hitting your system are completely different. Are you going to change your clustering keys? No, you may drop or add new nonclustered indexes but you don't change the clustering keys because a smart person doesn't base their clustering keys on the queries. The smart person bases it on the data. This is the general practice, there are always exceptions.

    Poor database design and indexing schemes like you are advocating are the reasons why applications like SharePoint, SCOM, and BizTalk have wild requirements like MaxDOP = 1 ... to cover up for their horrendous design.


    My blog: SQL Soldier[/url]
    SQL Server Best Practices:
    SQL Server Best Practices
    Twitter: @SQLSoldier
    My book: Pro SQL Server 2008 Mirroring[/url]
    Microsoft Certified Master: SQL Server, Data Platform MVP
    Database Engineer at BlueMountain Capital Management[/url]

  • GilaMonster (3/4/2013)


    So 'keep the keys for a unique index to the columns that need to be enforced unique' is a 'rigid overly-simplistic rule'?

    Bear in mind that this is for an indexed view. Uniqueness is already enforced on the table. I suppose the clustered index on the view does not necessarily need to be unique. Does that paint a slightly different picture for you?

  • ScottPletcher (3/4/2013)


    The discussion was on the clustered index, not unique indexes in general. A clus index key needn't be unique on its own, and sometimes that's just fine. Again, it depends on the uses of that specific table.

    The OP's question was about the unique clustered index created on an indexed view. That does have to be unique. From Books Online:

    Creating Indexed Views

    The first index created on a view must be a unique clustered index. After the unique clustered index has been created, you can create additional nonclustered indexes. The naming conventions for indexes on views are the same as for indexes on tables. The only difference is that the table name is replaced with a view name. For more information, see CREATE INDEX (Transact-SQL).

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply