Question about Clustering & Includes

  • I had always had the hazy impression that the clustered index on a table is inherently a 'part' of all the other indexes in the table, but recently sql server exec plan tool recommended an index on a field that 'includes' the clustered index.  Here's a simplified example of what I mean:

    create table _testphone
    (
        personguid varchar(36) not null        
    ,    phonetype tinyint not null
    ,    phonenumber varchar(25)
    )
    create clustered index idxc_testphone_persguid on _testphone (personguid) /* this one I had already) */
    create nonclustered index idxn_testphone_phonetype on _testphone (phonetype) include (personguid) /* this was the index recommended by exec plan tool  */
    ;

    The personguid is what you think it is, and phonetype is an integer between 1 and 10, standing for 'mobile', 'business', etc.

    Leaving aside the questionable wisdom of the clustered index being on a guid (this table is completely remade nightly, and changes very little during the day, and the perf for those changes is not critical, so ... all in all it's not as bad as many other 'guid' scenarios).

    My questions are ...

    1. Is that 'include' really necessary?  
    2. What specific type of query would benefit from this?
    3. Under what scenarios (generally) should I 'include' my clustered index in other nonclustered indexes?  
    4. If I included the clustered field in like 4 or 5 other indexes on a table (esp. if it's a guid) is it really going to slam my i/u/d performance? Assuming about a 300K row table, 60K guids each with 4-6 phone types.
    5. If the clustered index were able to be unique, would that obviate the 'need' for this 'includes'

    TIA!

  • bvaljalo-1000038 - Wednesday, April 5, 2017 5:12 PM

    I had always had the hazy impression that the clustered index on a table is inherently a 'part' of all the other indexes in the table, but recently sql server exec plan tool recommended an index on a field that 'includes' the clustered index.  Here's a simplified example of what I mean:

    create table _testphone
    (
        personguid varchar(36) not null        
    ,    phonetype tinyint not null
    ,    phonenumber varchar(25)
    )
    create clustered index idxc_testphone_persguid on _testphone (personguid) /* this one I had already) */
    create nonclustered index idxn_testphone_phonetype on _testphone (phonetype) include (personguid) /* this was the index recommended by exec plan tool  */
    ;

    The personguid is what you think it is, and phonetype is an integer between 1 and 10, standing for 'mobile', 'business', etc.

    Leaving aside the questionable wisdom of the clustered index being on a guid (this table is completely remade nightly, and changes very little during the day, and the perf for those changes is not critical, so ... all in all it's not as bad as many other 'guid' scenarios).

    My questions are ...

    1. Is that 'include' really necessary?  
    2. What specific type of query would benefit from this?
    3. Under what scenarios (generally) should I 'include' my clustered index in other nonclustered indexes?  
    4. If I included the clustered field in like 4 or 5 other indexes on a table (esp. if it's a guid) is it really going to slam my i/u/d performance? Assuming about a 300K row table, 60K guids each with 4-6 phone types.
    5. If the clustered index were able to be unique, would that obviate the 'need' for this 'includes'

    TIA!

    The issue is that the suggested indexes aren't always the best. The optimizer is an amazingly wonderful and capable piece of software that was built by human beings, so it has flaws.

    Yes, if a table has a clustered index, then the key for the clustered index (or the key  + uniquifier if it's a non-unique index) will be included in each non-clustered index as a mechanism for supporting the lookup back to the data.

    1) Completely unnecessary
    2) Nothing. Test it to validate what I'm saying, but you'd be just as well served by a non-clustered index on the suggested column.
    3) I wouldn't. Further, even if you do put it in the INCLUDE list, the engine ignores that since the column is already there. You can test this by looking at index sizes with & without the INCLUDE. Now, you can put the clustered key into the key of another index, and there are situations (no, no easily defined rule, just spots where this may make sense) where this helps performance. In that case, the key values are included in the storage of the index because they define the key.
    4) It actually won't because it's already there and as I said, the engine will ignore the INCLUDE
    5) Again, this is a suggested index. Don't take it at face value. Unique or non-unique, the key (or key + uniquifier, which is the key), will be stored with the non-clustered index. This is already happening on your system. It's one of the reasons why people do shy away from really wide data types like GUID for the clustered key. However, that doesn't freak me out like it does others. If that column is the most used direct path to the data, then make it the clustered key. You can deal with index size, fragmentation and the rest as secondary effects. The principal is to get to the data as fast as possible.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • 1) Technically, no, since clus key column(s) are automatically included in all non-clus indexes. But I like to explicitly code it when it's needed to make this index a covering index for a query (not including any lookup back to the clus index itself). That way, if the clus index ever changes, I know where the clus key(s) are still needed in non-clus indexes. Edit: I don't explicitly code it if it's not directly needed for a query.
    2) Primarily one where only the columns in that specific non-clus index were needed for the query, i.e., it's a "covering index". Or, much more rarely, when the non-clus index allowed a (very) small number of rows to be filtered based on the columns in the index, and a lookup back to the main table was used to get other columns.
    3) See #1.
    4) Yes, a 36-char guid will bloat all the indexes. But all clus keys are always in all non-clus indexes no matter what you do.
    5) No.

    Rather than covert the guid to varchar, you could leave it binary ("uniqueidentifier").  That would reduce the size back to 16 bytes, less than half.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Grant Fritchey - Thursday, April 6, 2017 12:07 AM

    1) Completely unnecessary

    I strongly disagree. It;s future-proofing code and designs. If the clustered index key is needed in the include, and someone later goes and changes the clustered index, then suddenly queries go slower without obvious reason.

    If it's needed, specify it. SQL won't add it twice, so there's no overhead from explicitly specifying something that's implicitly there.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster - Thursday, April 6, 2017 10:50 AM

    Grant Fritchey - Thursday, April 6, 2017 12:07 AM

    1) Completely unnecessary

    I strongly disagree. It;s future-proofing code and designs. If the clustered index key is needed in the include, and someone later goes and changes the clustered index, then suddenly queries go slower without obvious reason.

    If it's needed, specify it. SQL won't add it twice, so there's no overhead from explicitly specifying something that's implicitly there.

    Agree to disagree. I see lots and lots of changes to structure & code, but usually (not always) few changes to the clustered index. Feels like premature optimization.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply