Indexes with Include

  • I am performance tuning a database for a client. I have used several methods to identify tables that could use some indexes, but also want to keep the indexing workload to a minimum.

    SQL and Profiler's recommendations include several new indexes on some of the tables that recommend include columns. My question is does SQL look at the order of the include columns when making an index selection.

    Here is an example:

    Index 1 recommendation

    Index Key: [SCHOOL_YEAR]

    Include Key: [COURSE_KEY], [STUDENT_KEY], [DISTRICT_KEY], [LOCATION_KEY]

    Index 2 Recommendation

    Index key: [SCHOOL_YEAR]

    Include Key: [COURSE_KEY], [STUDENT_KEY], [PERIOD_KEY], [SECTION_CODE], [REQST_SEQUENCE], [ENROLL_KEY], [EFFECTIVE_DATE], [DISTRICT_KEY], [LOCATION_KEY], [DELIVERY_MODEL_KEY]

    Index 3 Recommendation

    Index Key: [SCHOOL_YEAR]

    Include Key: [COURSE_KEY], [STUDENT_KEY], [PERIOD_KEY], [SECTION_CODE], [REQST_SEQUENCE], [ENROLL_KEY], [EFFECTIVE_DATE], [DISTRICT_KEY], [LOCATION_KEY]

    Index 4 Recommendation

    Index Key: [SCHOOL_YEAR]

    Include Key: [COURSE_KEY], [STUDENT_KEY], [SECTION_CODE], [DISTRICT_KEY], [LOCATION_KEY]

    Index 5 Recommendation

    Index Key: [SCHOOL_YEAR]

    Include Key: [COURSE_KEY], [STUDENT_KEY], [SECTION_CODE], [REQST_SEQUENCE], [ENROLL_KEY], [EFFECTIVE_DATE], [DISTRICT_KEY], [LOCATION_KEY], [DELIVERY_MODEL_KEY]

    Index 6 Recommendation

    Index Key: [SCHOOL_YEAR]

    Include Key: [DISTRICT_KEY], [LOCATION_KEY]

    All of these values are either DATETIME or BIGINT.

    Goal is to keep the number of indexes to a minimum while providing the most benefit for reports and queries.

    My thoughts are recommendations 2 & 5. But if the order or number of non used included columns is a factor in the index selection, then I would have to re-evaluate based on percentage of use/improvement and Index recommendation 6 would be the only one created.

    I could not find the selection criteria on how SQL chooses an index with include columns other than it would use one the had ALL of the columns required to fulfill a query.

    Unfortunately, I do not have the access nor the time left on the contract to test each of these scenarios out.

    Thanks for your inputs

    Raymond Laubert
    Exceptional DBA of 2009 Finalist
    MCT, MCDBA, MCITP:SQL 2005 Admin,
    MCSE, OCP:10g

  • My understanding of Included columns in an index is that they do not affect whether or not an index is selected. Index use is based on the key column(s) in the index from left to right (See the blog posts by Gail Shaw (GilaMonster here on SSC) here[/url] and here[/url]). Included columns allow you to create covering indexes without creating a huge composite key.

  • You are correct on the selecting an index as a 'candidate index' to be used. Only the index key values are used. However, after an candidate index is identified, the included columns are used to determine if it is a covering index or not. If it is determined that it is a covering index than that index will be used in the execution of the query.

    The question then becomes does SQL treat the selection of covering indexes the same way it does the selection of candidate indexes. left to right and stopping at the first mismatch or does it just look for all the columns required before determining if it is a covering index.

    Raymond Laubert
    Exceptional DBA of 2009 Finalist
    MCT, MCDBA, MCITP:SQL 2005 Admin,
    MCSE, OCP:10g

  • Good question, I guess I misunderstood exactly what you were asking the first time. One thing I think I'd do in this instance is run a query like this:

    SELECT

    DDMID.index_handle,

    DDMID.database_id,

    DDMID.[object_id],

    DDMID.equality_columns,

    DDMID.inequality_columns,

    DDMID.included_columns,

    DDMID.statement,

    DDMIGS.unique_compiles,

    DDMIGS.user_seeks,

    DDMIGS.user_scans,

    DDMIGS.last_user_seek,

    DDMIGS.last_user_scan,

    DDMIGS.avg_total_user_cost,

    DDMIGS.avg_user_impact,

    DDMIGS.system_seeks,

    DDMIGS.system_scans,

    DDMIGS.last_system_seek,

    DDMIGS.last_system_scan,

    DDMIGS.avg_total_system_cost,

    DDMIGS.avg_system_impact

    FROM

    sys.dm_db_missing_index_groups AS DDMIG JOIN

    sys.dm_db_missing_index_group_stats AS DDMIGS

    ON DDMIG.index_group_handle = DDMIGS.group_handle JOIN

    sys.dm_db_missing_index_details AS DDMID

    ON DDMIG.index_handle = DDMID.index_handle

    Where

    DDMID.database_id = DB_ID('yourdbname')

    ORDER BY

    DDMIGS.user_seeks DESC,

    DDMIGS.avg_user_impact desc

  • You really only need one of those indexes from what I can see. That would be the one that includes all the columns listed in the INCLUDED column lists. The actual index portion is identical in all 6 recommended indexes.

  • I wanted to be sure of the answer before I posted it, but it seems that SQL Server will most frequently pick the index with the most columns. I tried disabling them, changing the order of columns, varying the number of columns included in the query, but it generally went for whichever index had the most columns in the include clause at the time the query was run.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Ray Laubert (6/26/2009)


    The question then becomes does SQL treat the selection of covering indexes the same way it does the selection of candidate indexes. left to right and stopping at the first mismatch or does it just look for all the columns required before determining if it is a covering index.

    The order of include columns does not matter. Only the order of key columns matters and then only for seek predicates.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks, I was pretty sure that the order and number of columns for the include columns didn't matter, but I wanted to get some second opinions.

    Have a great weekend. Just found out that as of Tues, this assignment is done. So I guess I start vacation on the 1st instead of the 6th hehe.

    Raymond Laubert
    Exceptional DBA of 2009 Finalist
    MCT, MCDBA, MCITP:SQL 2005 Admin,
    MCSE, OCP:10g

  • Grant Fritchey (6/26/2009)


    I wanted to be sure of the answer before I posted it, but it seems that SQL Server will most frequently pick the index with the most columns. I tried disabling them, changing the order of columns, varying the number of columns included in the query, but it generally went for whichever index had the most columns in the include clause at the time the query was run.

    That's odd, I would have thought it would go for the narrower index because it would incur the fewest IOs.

    Time for some experimentation and maybe a blog post?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (6/26/2009)


    Grant Fritchey (6/26/2009)


    I wanted to be sure of the answer before I posted it, but it seems that SQL Server will most frequently pick the index with the most columns. I tried disabling them, changing the order of columns, varying the number of columns included in the query, but it generally went for whichever index had the most columns in the include clause at the time the query was run.

    That's odd, I would have thought it would go for the narrower index because it would incur the fewest IOs.

    Time for some experimentation and maybe a blog post?

    Not a bad idea. I'd need to do more experiments before I would try to claim I understood what I saw.

    Every time I ran the query, it picked the widest set of includes, regardless of the order in which the indexes were created. I didn't check reads or timings.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (6/28/2009)


    Every time I ran the query, it picked the widest set of includes, regardless of the order in which the indexes were created. I didn't check reads or timings.

    I ran some tests on 2005 and 2008 with AdventureWorks and about the only constant was that if an exactly-covering index existed, it was chosen. (These were additional indexes I created - not the shipped ones).

    Other than that, even with twenty indexes to choose from, with varying numbers of INCLUDEd columns, in different orders, no clear pattern emerged. I thought for a moment that indexes with the required column *last* in the creation order were preferred; but no.

    Fascinating question, but I ran out of time with it.

    Paul

  • I'll do some tests during the week and write up a blog post if I can come to any conclusions.

    Actually, I'll blog even if I can't come to a conclusion.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Paul White (6/28/2009)


    Grant Fritchey (6/28/2009)


    Every time I ran the query, it picked the widest set of includes, regardless of the order in which the indexes were created. I didn't check reads or timings.

    I ran some tests on 2005 and 2008 with AdventureWorks and about the only constant was that if an exactly-covering index existed, it was chosen. (These were additional indexes I created - not the shipped ones).

    Other than that, even with twenty indexes to choose from, with varying numbers of INCLUDEd columns, in different orders, no clear pattern emerged. I thought for a moment that indexes with the required column *last* in the creation order were preferred; but no.

    Fascinating question, but I ran out of time with it.

    Paul

    Huh. Odd. I did the same thing. I had about five different indexes, all with the same key, but varying include columns, but duplicated, like the OP. Every single time, it chose the index with the most columns. Couldn't tell you why. I just didn't dig much past that yet.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • One would think that using the index with most columns would minimize the risk of having to bookmark and use key lookup later in the execution plan.

    But on the other hand, SQL Server should be smart enough to see if the index with most columns does cover the query or not.

    If the index covers the query, use the index with less columns but still cover.

    If the index does not cover the query, then things get more complicated. Should the query engine then use the index with less columns (for faster processing) and make a key lookup at the clustered index?

    What if there is no clustered index?

    Well, you get the idea that there are many thing to consider. And for getting a Best Practice out of this, DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE must be used to create a steady baseline for testing.

    Not something you would want to do in a production environment...


    N 56°04'39.16"
    E 12°55'05.25"

  • Peso (6/29/2009)


    One would think that using the index with most columns would minimize the risk of having to bookmark and use key lookup later in the execution plan.

    But on the other hand, SQL Server should be smart enough to see if the index with most columns does cover the query or not.

    If the index covers the query, use the index with less columns but still cover.

    If the index does not cover the query, then things get more complicated. Should the query engine then use the index with less columns (for faster processing) and make a key lookup at the clustered index?

    What if there is no clustered index?

    Well, you get the idea that there are many thing to consider. And for getting a Best Practice out of this, DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE must be used to create a steady baseline for testing.

    Not something you would want to do in a production environment...

    I'm sorry if I caused confusion. The index was always covering, yes. It was just over-covered. It used a five column include when it could have used a one column include to meet the same needs.

    You're right about the DROPCLEANBUFFERS and FREEPROCCACHE.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 15 posts - 1 through 15 (of 27 total)

You must be logged in to reply to this topic. Login to reply