Clustered Indexes on Identity columns

  • Hello,

    I had a quick question for everyone...

    I often wondered why Clustered Indexes are usually placed on PK Identity(1,1) columns (1,2,3,4,5,6,7,8...) instead of being placed on columns that would impact performance better. A Clustered Index on a date range column would be much more efficient than a Clustered Index on a basic "ID" column. I have seen this time and time again. Usually NON_Clustered indexes are placed on date ranges instead of Clustered Indexes. For the most part, I presume that this is due to that fact that a clustered index must be on a unique column and most datetime columns are not unique. Most articles that I have read lead you to the direction of placing indexes on JOINS, WHERE CLAUSES, and sometimes ORDER BY. But Clustered indexes on a join would be so much faster than NON-Clustered. What is the best practice?

    Thanks for your time.

    Dave

  • Clustered indexes don't have to be unique, though it is recommended.

    Discussion on clustered indexes

    http://www.sqlservercentral.com/articles/Indexing/68563/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks for the reply.

    So, would you say that it makes more sense to have a clustered index on a column besides an ID column? Seems like that would be the way to go.

  • Did you read the article I referenced?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • @DaveDB:

    You're quite right. There's no genuinely good reason for the "default" clustering key to be an identity column. Many people just do it by rote, without thinking about it, with the view that it reduces fragmentation and page splits in the table. But, as you noted, if you have a naturally ascending key anyway, such as a datetime, even that reason is not applicable. [And, of course, tables can be rebuilt online now (with Enterprise), so this should be much less of a concern than it used to be.]

    But there can be a very steep price to pay for the non-thinking "always identity" route. They then have to create multitudes of nonclustered, covering indexes to get any decent performance on major queries. I suppose that approach works, if one has the extra time to watch for the needs and create all those otherwise unnecessary indexes, as well as the extra disk space and CPU time to maintain and process them.

    Therefore, best practice is to choose the correct clustering key for your specific table based on its actual needs and usage. Ignore simplistic "rules" and give serious thought to what is best here.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • DaveDB (9/4/2013)


    Thanks for the reply.

    So, would you say that it makes more sense to have a clustered index on a column besides an ID column? Seems like that would be the way to go.

    There is no one right answer for this.

    My own general approach is to put the clustered index on the most common access path to the data. Frequently, that might be an identity column. Other times, it's not. But, that said, you still want to take into account a number of factors. For example whatever the size of the key on your clustered index, that's going to be included on each and every non-clustered index you create, will seriously affect performance because a wider key means fewer entries per page (and the same thing on all the non-clustered indexes). Also, if you have a key that is not unique, you do need to take into account that SQL Server will be adding additional information to make the cluster unique, which again, adds to the size of the key, possibly affecting down stream indexes. It all comes back to the fact that the cluster defines the storage of the data, so you need that to be efficient.

    Over the years, I've come to the conclusion that with modern disks being as they are, fragmentation is not the problem it used to be. I worry more about ensuring that I've got a good access path to the data that's going to get used and that I've got VERY good statistics, again, to ensure that the indexes I have get used.

    I would never advocate for an identity only approach (well, I did when I was a lot younger, but not any more). Instead I've noticed that you're probably going to have an "identity more likely than not" approach on most systems just because it makes sense. Just don't fall into the "clustered indexes work best with ranges" trap. That one has been pretty thoroughly debunked.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (9/4/2013)

    Instead I've noticed that you're probably going to have an "identity more likely than not" approach on most systems just because it makes sense.

    I'd disagree even with that; I don't see many tables where an identity is the best clustering key -- although certainly there are times when it is: I'm not saying an identity should never be the clustering key.

    Even log/audit tables almost always work much better with a datetime as the clustering key vs an identity, since lookups are (virtually) always done by datetime on those types of table anyway.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher (9/4/2013)


    Grant Fritchey (9/4/2013)

    Instead I've noticed that you're probably going to have an "identity more likely than not" approach on most systems just because it makes sense.

    I'd disagree even with that; I don't see many tables where an identity is the best clustering key -- although certainly there are times when it is: I'm not saying an identity should never be the clustering key.

    Even log/audit tables almost always work much better with a datetime as the clustering key vs an identity, since lookups are (virtually) always done by datetime on those types of table anyway.

    Really? How many weasel words would make that an OK sentence. We have "probably" "more" "likely" "most". More than enough places to go other directions as needed. You can disagree, but I think it's a pretty safe statement.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (9/4/2013)


    ScottPletcher (9/4/2013)


    Grant Fritchey (9/4/2013)

    Instead I've noticed that you're probably going to have an "identity more likely than not" approach on most systems just because it makes sense.

    I'd disagree even with that; I don't see many tables where an identity is the best clustering key -- although certainly there are times when it is: I'm not saying an identity should never be the clustering key.

    Even log/audit tables almost always work much better with a datetime as the clustering key vs an identity, since lookups are (virtually) always done by datetime on those types of table anyway.

    Really? How many weasel words would make that an OK sentence. We have "probably" "more" "likely" "most". More than enough places to go other directions as needed. You can disagree, but I think it's a pretty safe statement.

    I admit, this is my pet peeve in physical table design. Identity as clus key is ridiculously over-, and, thus improperly, used. So I get nervous about any wording that invites people to fall back on "always" making an identity the clus key. That's already tempting to many people, since they don't have to think about it, but without careful consideration it's just a very bad idea.

    Correcting the clustering key has helped me massively reduce I/O on many queries on many very large tables. It's also allowed me to remove thousands of indexes while gaining better performance everywhere. So I'm really loathe to do anything that in any way even mildly suggests that the myth of "by default, just use identity as the clus key" is reliable, given its really nasty consequences in the real world.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher (9/4/2013)


    I admit, this is my pet peeve in physical table design. Identity as clus key is ridiculously over-, and, thus improperly, used. So I get nervous about any wording that invites people to fall back on "always" making an identity the clus key. That's already tempting to many people, since they don't have to think about it, but without careful consideration it's just a very bad idea.

    Correcting the clustering key has helped me massively reduce I/O on many queries on many very large tables. It's also allowed me to remove thousands of indexes while gaining better performance everywhere. So I'm really loathe to do anything that in any way even mildly suggests that the myth of "by default, just use identity as the clus key" is reliable, given its really nasty consequences in the real world.

    And I've tuned systems that didn't have a single identity column but instead had enormous, wide, natural keys that quite literally destroyed performance. But, I would not say I would disagree with a statement that said "Designing with natural keys is generally a safer bet for most systems because you need to enforce the business definitions that make data unique" because I wouldn't have to then worry about introducing a nice, tight, integer in place of a varchar(256) for the clustered key where it was necessary and proper.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (9/4/2013)


    ScottPletcher (9/4/2013)


    I admit, this is my pet peeve in physical table design. Identity as clus key is ridiculously over-, and, thus improperly, used. So I get nervous about any wording that invites people to fall back on "always" making an identity the clus key. That's already tempting to many people, since they don't have to think about it, but without careful consideration it's just a very bad idea.

    Correcting the clustering key has helped me massively reduce I/O on many queries on many very large tables. It's also allowed me to remove thousands of indexes while gaining better performance everywhere. So I'm really loathe to do anything that in any way even mildly suggests that the myth of "by default, just use identity as the clus key" is reliable, given its really nasty consequences in the real world.

    And I've tuned systems that didn't have a single identity column but instead had enormous, wide, natural keys that quite literally destroyed performance. But, I would not say I would disagree with a statement that said "Designing with natural keys is generally a safer bet for most systems because you need to enforce the business definitions that make data unique" because I wouldn't have to then worry about introducing a nice, tight, integer in place of a varchar(256) for the clustered key where it was necessary and proper.

    I guess I've been lucky. Even as long as I've been doing DBA work, I've never seen a SQL clustering key that included anything 256 bytes(!) long.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply