Indexes with Include

Ray Laubert (SSCrazy) · Question · June 26, 2009 at 8:07 am

I am performance tuning a database for a client. I have used several methods to identify tables that could use some indexes, but I also want to keep index maintenance overhead to a minimum.
The recommendations from SQL Server and Profiler include several new indexes, on some of the tables, with included columns. My question is: does SQL Server look at the order of the included columns when selecting an index?
Here is an example:
Index 1 recommendation
Index key: [SCHOOL_YEAR]
Included columns: [COURSE_KEY], [STUDENT_KEY], [DISTRICT_KEY], [LOCATION_KEY]
Index 2 recommendation
Index key: [SCHOOL_YEAR]
Included columns: [COURSE_KEY], [STUDENT_KEY], [PERIOD_KEY], [SECTION_CODE], [REQST_SEQUENCE], [ENROLL_KEY], [EFFECTIVE_DATE], [DISTRICT_KEY], [LOCATION_KEY], [DELIVERY_MODEL_KEY]
Index 3 recommendation
Index key: [SCHOOL_YEAR]
Included columns: [COURSE_KEY], [STUDENT_KEY], [PERIOD_KEY], [SECTION_CODE], [REQST_SEQUENCE], [ENROLL_KEY], [EFFECTIVE_DATE], [DISTRICT_KEY], [LOCATION_KEY]
Index 4 recommendation
Index key: [SCHOOL_YEAR]
Included columns: [COURSE_KEY], [STUDENT_KEY], [SECTION_CODE], [DISTRICT_KEY], [LOCATION_KEY]
Index 5 recommendation
Index key: [SCHOOL_YEAR]
Included columns: [COURSE_KEY], [STUDENT_KEY], [SECTION_CODE], [REQST_SEQUENCE], [ENROLL_KEY], [EFFECTIVE_DATE], [DISTRICT_KEY], [LOCATION_KEY], [DELIVERY_MODEL_KEY]
Index 6 recommendation
Index key: [SCHOOL_YEAR]
Included columns: [DISTRICT_KEY], [LOCATION_KEY]
All of these columns are either DATETIME or BIGINT.
The goal is to keep the number of indexes to a minimum while providing the most benefit for reports and queries.
My inclination is to create recommendations 2 and 5. But if the order of the included columns, or the number of unused included columns, is a factor in index selection, then I would have to re-evaluate based on percentage of use and improvement, and recommendation 6 would be the only one created.
I could not find documentation of how SQL Server chooses among indexes with included columns, other than that it will use one that has ALL of the columns required to satisfy a query.
Unfortunately, I have neither the access nor the time left on the contract to test each of these scenarios.
Thanks for your input.
Raymond Laubert
Exceptional DBA of 2009 Finalist
MCT, MCDBA, MCITP:SQL 2005 Admin,
MCSE, OCP:10g

Jack Corbett (SSC Guru) · Answer 1

My understanding of included columns in an index is that they do not affect whether or not an index is selected. Index use is based on the key column(s) of the index, from left to right (see the blog posts on this by Gail Shaw, GilaMonster here on SSC). Included columns allow you to create covering indexes without creating a huge composite key.
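To make the distinction concrete, here is a minimal T-SQL sketch. The table name dbo.StudentEnrollment, its surrogate ENROLLMENT_ID clustered key, and the index names are hypothetical, and the column types are assumed from the OP's note that every column is BIGINT or DATETIME. Both indexes cover the final query; the difference is that the composite index sorts on all three columns and stores them at every B-tree level, while the INCLUDE version keys only on SCHOOL_YEAR and carries DISTRICT_KEY and LOCATION_KEY in its leaf rows only.

```sql
-- Hypothetical table: column names from this thread, data types assumed.
IF OBJECT_ID('dbo.StudentEnrollment', 'U') IS NOT NULL
    DROP TABLE dbo.StudentEnrollment;

CREATE TABLE dbo.StudentEnrollment
(
    ENROLLMENT_ID  BIGINT IDENTITY(1, 1) NOT NULL
        CONSTRAINT PK_StudentEnrollment PRIMARY KEY CLUSTERED,
    SCHOOL_YEAR    BIGINT   NOT NULL,
    COURSE_KEY     BIGINT   NOT NULL,
    STUDENT_KEY    BIGINT   NOT NULL,
    DISTRICT_KEY   BIGINT   NOT NULL,
    LOCATION_KEY   BIGINT   NOT NULL,
    EFFECTIVE_DATE DATETIME NOT NULL
);

-- Composite key: all three columns define the sort order of the index
-- and are stored in the intermediate (non-leaf) levels as well.
CREATE NONCLUSTERED INDEX IX_Enroll_Composite
    ON dbo.StudentEnrollment (SCHOOL_YEAR, DISTRICT_KEY, LOCATION_KEY);

-- Narrow key plus included columns: only SCHOOL_YEAR is sorted;
-- DISTRICT_KEY and LOCATION_KEY are stored in the leaf level only.
CREATE NONCLUSTERED INDEX IX_Enroll_Include
    ON dbo.StudentEnrollment (SCHOOL_YEAR)
    INCLUDE (DISTRICT_KEY, LOCATION_KEY);

-- Either index covers this query: every column it references is in the index,
-- so no lookup into the clustered index is needed.
SELECT DISTRICT_KEY, LOCATION_KEY
FROM dbo.StudentEnrollment
WHERE SCHOOL_YEAR = 2009;
```

A query that also returned COURSE_KEY would be covered by neither index and would need a key lookup (or a different plan).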

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question

Ray Laubert (SSCrazy) · Answer 2

You are correct about how an index is selected as a 'candidate index': only the index key columns are used. However, once a candidate index is identified, the included columns are used to determine whether or not it is a covering index. If it is a covering index, then that index will be used in the execution of the query.

The question then becomes: does SQL Server treat the selection of covering indexes the same way it treats the selection of candidate indexes, reading left to right and stopping at the first mismatch? Or does it just check that all the required columns are present before determining that it is a covering index?

Jack Corbett (SSC Guru) · Answer 3

Good question; I guess I misunderstood exactly what you were asking the first time. One thing I'd do in this instance is run a query like this:

```sql
-- Missing-index suggestions recorded for one database, with their usage statistics,
-- most frequently sought first. Replace 'yourdbname' with the database to examine.
SELECT
    DDMID.index_handle,
    DDMID.database_id,
    DDMID.[object_id],
    DDMID.equality_columns,
    DDMID.inequality_columns,
    DDMID.included_columns,
    DDMID.statement,
    DDMIGS.unique_compiles,
    DDMIGS.user_seeks,
    DDMIGS.user_scans,
    DDMIGS.last_user_seek,
    DDMIGS.last_user_scan,
    DDMIGS.avg_total_user_cost,
    DDMIGS.avg_user_impact,
    DDMIGS.system_seeks,
    DDMIGS.system_scans,
    DDMIGS.last_system_seek,
    DDMIGS.last_system_scan,
    DDMIGS.avg_total_system_cost,
    DDMIGS.avg_system_impact
FROM
    sys.dm_db_missing_index_groups AS DDMIG
    JOIN sys.dm_db_missing_index_group_stats AS DDMIGS
        ON DDMIG.index_group_handle = DDMIGS.group_handle
    JOIN sys.dm_db_missing_index_details AS DDMID
        ON DDMIG.index_handle = DDMID.index_handle
WHERE
    DDMID.database_id = DB_ID('yourdbname')
ORDER BY
    DDMIGS.user_seeks DESC,
    DDMIGS.avg_user_impact DESC;
```
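A possible companion query, offered as a suggestion rather than part of the post above: sys.dm_db_missing_index_columns breaks each suggestion into individual columns tagged EQUALITY, INEQUALITY or INCLUDE, which makes it easier to line up suggestions like the six in the question that share a key and differ only in their INCLUDE lists. This sketch reads the current database (DB_ID()) rather than a named one, and, like the query above, it only sees suggestions gathered since the instance last started.

```sql
-- One row per column per missing-index suggestion in the current database.
SELECT
    DDMID.index_handle,
    DDMID.[statement]   AS table_name,
    DDMIC.column_usage, -- EQUALITY, INEQUALITY or INCLUDE
    DDMIC.column_name
FROM sys.dm_db_missing_index_details AS DDMID
CROSS APPLY sys.dm_db_missing_index_columns(DDMID.index_handle) AS DDMIC
WHERE DDMID.database_id = DB_ID()
ORDER BY DDMID.[statement], DDMID.index_handle, DDMIC.column_usage, DDMIC.column_name;
```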

Lynn Pettis (SSC Guru) · Answer 4

You really only need one of those indexes, from what I can see: the one that includes all the columns listed in the INCLUDE column lists. The key portion of the index is identical in all six recommendations.
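Checking the six lists against each other bears this out: recommendation 2's included columns already contain every column named in recommendations 1, 3, 4, 5 and 6, so it is the single index described here. A sketch of that one index follows; the table name dbo.StudentEnrollment, its surrogate clustered key and the column types are hypothetical, assumed from the OP's note that all the columns are BIGINT or DATETIME. With only this index present, each of the six query shapes would be covered by it, at the cost of reading wider leaf rows than the narrower recommendations would need.

```sql
-- Hypothetical table; columns from the six recommendations, types assumed.
IF OBJECT_ID('dbo.StudentEnrollment', 'U') IS NOT NULL
    DROP TABLE dbo.StudentEnrollment;

CREATE TABLE dbo.StudentEnrollment
(
    ENROLLMENT_ID      BIGINT IDENTITY(1, 1) NOT NULL
        CONSTRAINT PK_StudentEnrollment PRIMARY KEY CLUSTERED,
    SCHOOL_YEAR        BIGINT   NOT NULL,
    COURSE_KEY         BIGINT   NOT NULL,
    STUDENT_KEY        BIGINT   NOT NULL,
    PERIOD_KEY         BIGINT   NOT NULL,
    SECTION_CODE       BIGINT   NOT NULL,
    REQST_SEQUENCE     BIGINT   NOT NULL,
    ENROLL_KEY         BIGINT   NOT NULL,
    EFFECTIVE_DATE     DATETIME NOT NULL,
    DISTRICT_KEY       BIGINT   NOT NULL,
    LOCATION_KEY       BIGINT   NOT NULL,
    DELIVERY_MODEL_KEY BIGINT   NOT NULL
);

-- The shared key plus the union of all six INCLUDE lists
-- (the same column set as recommendation 2).
CREATE NONCLUSTERED INDEX IX_StudentEnrollment_SchoolYear
    ON dbo.StudentEnrollment (SCHOOL_YEAR)
    INCLUDE (COURSE_KEY, STUDENT_KEY, PERIOD_KEY, SECTION_CODE, REQST_SEQUENCE,
             ENROLL_KEY, EFFECTIVE_DATE, DISTRICT_KEY, LOCATION_KEY,
             DELIVERY_MODEL_KEY);
```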

Grant Fritchey (SSC Guru) · Answer 5

I wanted to be sure of the answer before I posted it, but it seems that SQL Server will most frequently pick the index with the most columns. I tried disabling indexes, changing the order of columns, and varying the number of columns referenced in the query, but it generally went for whichever index had the most columns in its INCLUDE clause at the time the query ran.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Gail Shaw (SSC Guru) · Answer 6

Ray Laubert (6/26/2009) wrote:
> The question then becomes does SQL treat the selection of covering indexes the same way it does the selection of candidate indexes. left to right and stopping at the first mismatch or does it just look for all the columns required before determining if it is a covering index.

The order of include columns does not matter. Only the order of key columns matters, and then only for seek predicates.
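The catalog reflects this. In sys.index_columns, an included column has is_included_column = 1 and key_ordinal = 0, meaning it has no position in the index's sort order. The sketch below (the table dbo.IncludeOrderDemo and both index names are hypothetical) creates two indexes that differ only in the order of their INCLUDE lists and lists their columns; the included columns of both come back with key_ordinal 0.

```sql
-- Hypothetical demo table; names and types are illustrative only.
IF OBJECT_ID('dbo.IncludeOrderDemo', 'U') IS NOT NULL
    DROP TABLE dbo.IncludeOrderDemo;

CREATE TABLE dbo.IncludeOrderDemo
(
    ID           BIGINT IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,
    SCHOOL_YEAR  BIGINT NOT NULL,
    DISTRICT_KEY BIGINT NOT NULL,
    LOCATION_KEY BIGINT NOT NULL
);

-- Same key, same included columns, listed in opposite orders.
CREATE NONCLUSTERED INDEX IX_Demo_DistrictFirst
    ON dbo.IncludeOrderDemo (SCHOOL_YEAR) INCLUDE (DISTRICT_KEY, LOCATION_KEY);
CREATE NONCLUSTERED INDEX IX_Demo_LocationFirst
    ON dbo.IncludeOrderDemo (SCHOOL_YEAR) INCLUDE (LOCATION_KEY, DISTRICT_KEY);

-- key_ordinal is 1 for the key column and 0 for every included column.
SELECT i.name AS index_name,
       c.name AS column_name,
       ic.key_ordinal,
       ic.is_included_column
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
    ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.IncludeOrderDemo')
  AND i.index_id > 1  -- skip the clustered index
ORDER BY i.name, ic.is_included_column, c.name;
```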

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Ray Laubert (SSCrazy) · Answer 7

Thanks. I was pretty sure that neither the order nor the number of included columns mattered, but I wanted to get some second opinions.

Have a great weekend. I just found out that as of Tuesday, this assignment is done, so I guess I start vacation on the 1st instead of the 6th, hehe.

Gail Shaw (SSC Guru) · Answer 8

Grant Fritchey (6/26/2009) wrote:
> I wanted to be sure of the answer before I posted it, but it seems that SQL Server will most frequently pick the index with the most columns. I tried disabling them, changing the order of columns, varying the number of columns included in the query, but it generally went for whichever index had the most columns in the include clause at the time the query was run.

That's odd; I would have thought it would go for the narrower index, because it would incur the fewest IOs.

Time for some experimentation and maybe a blog post?
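The narrower-index reasoning can be put in rough numbers. BIGINT and DATETIME are both 8 bytes, so a leaf row of recommendation 6 carries 3 × 8 = 24 bytes of column data and a leaf row of recommendation 2 carries 11 × 8 = 88 bytes, before per-row overhead and the clustering key that every nonclustered leaf row on a clustered table also holds. Fewer bytes per row means more rows per 8 KB page and so fewer leaf pages to read for the same range. The sketch below checks actual page counts; the table dbo.IndexWidthDemo, its 100,000 synthetic rows and the index names are hypothetical.

```sql
-- Hypothetical demo table modelled on the recommendations in the question.
IF OBJECT_ID('dbo.IndexWidthDemo', 'U') IS NOT NULL
    DROP TABLE dbo.IndexWidthDemo;

CREATE TABLE dbo.IndexWidthDemo
(
    ID                 BIGINT IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,
    SCHOOL_YEAR        BIGINT   NOT NULL,
    COURSE_KEY         BIGINT   NOT NULL,
    STUDENT_KEY        BIGINT   NOT NULL,
    PERIOD_KEY         BIGINT   NOT NULL,
    SECTION_CODE       BIGINT   NOT NULL,
    REQST_SEQUENCE     BIGINT   NOT NULL,
    ENROLL_KEY         BIGINT   NOT NULL,
    EFFECTIVE_DATE     DATETIME NOT NULL,
    DISTRICT_KEY       BIGINT   NOT NULL,
    LOCATION_KEY       BIGINT   NOT NULL,
    DELIVERY_MODEL_KEY BIGINT   NOT NULL
);

-- 100,000 synthetic rows generated from a cross join of a system view.
WITH n AS
(
    SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
INSERT INTO dbo.IndexWidthDemo
    (SCHOOL_YEAR, COURSE_KEY, STUDENT_KEY, PERIOD_KEY, SECTION_CODE,
     REQST_SEQUENCE, ENROLL_KEY, EFFECTIVE_DATE, DISTRICT_KEY,
     LOCATION_KEY, DELIVERY_MODEL_KEY)
SELECT 2000 + rn % 10, rn % 500, rn, rn % 8, rn % 40,
       rn % 5, rn, DATEADD(DAY, CAST(rn % 365 AS INT), CAST('20090101' AS DATETIME)),
       rn % 20, rn % 100, rn % 3
FROM n;

-- Recommendation 6: key plus two included columns.
CREATE NONCLUSTERED INDEX IX_Narrow
    ON dbo.IndexWidthDemo (SCHOOL_YEAR) INCLUDE (DISTRICT_KEY, LOCATION_KEY);

-- Recommendation 2: same key plus all ten included columns.
CREATE NONCLUSTERED INDEX IX_Wide
    ON dbo.IndexWidthDemo (SCHOOL_YEAR)
    INCLUDE (COURSE_KEY, STUDENT_KEY, PERIOD_KEY, SECTION_CODE, REQST_SEQUENCE,
             ENROLL_KEY, EFFECTIVE_DATE, DISTRICT_KEY, LOCATION_KEY,
             DELIVERY_MODEL_KEY);

-- Pages used by each nonclustered index.
SELECT i.name AS index_name, ps.row_count, ps.used_page_count
FROM sys.indexes AS i
JOIN sys.dm_db_partition_stats AS ps
    ON ps.object_id = i.object_id AND ps.index_id = i.index_id
WHERE i.object_id = OBJECT_ID('dbo.IndexWidthDemo')
  AND i.index_id > 1
ORDER BY ps.used_page_count;
```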

Grant Fritchey (SSC Guru) · Answer 9

GilaMonster (6/26/2009) wrote:
> That's odd, I would have thought it would go for the narrower index because it would incur the fewest IOs.
> Time for some experimentation and maybe a blog post?

Not a bad idea. I'd need to do more experiments before I would try to claim I understood what I saw.

Every time I ran the query, it picked the widest set of includes, regardless of the order in which the indexes were created. I didn't check reads or timings.
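One way to follow up on reads is to force each candidate index with a table hint and compare the logical reads reported by SET STATISTICS IO, alongside the actual plan of the unhinted query. This is a sketch under assumptions: the table dbo.IncludeChoiceDemo, its 50,000 synthetic rows and the index names are hypothetical, and which index the unhinted query picks may differ by version and data, as the posts in this thread show.

```sql
-- Hypothetical demo table: same key, one-column vs five-column INCLUDE.
IF OBJECT_ID('dbo.IncludeChoiceDemo', 'U') IS NOT NULL
    DROP TABLE dbo.IncludeChoiceDemo;

CREATE TABLE dbo.IncludeChoiceDemo
(
    ID           BIGINT IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,
    SCHOOL_YEAR  BIGINT NOT NULL,
    COURSE_KEY   BIGINT NOT NULL,
    STUDENT_KEY  BIGINT NOT NULL,
    SECTION_CODE BIGINT NOT NULL,
    DISTRICT_KEY BIGINT NOT NULL,
    LOCATION_KEY BIGINT NOT NULL
);

-- 50,000 synthetic rows; SCHOOL_YEAR cycles through 2000..2009.
WITH n AS
(
    SELECT TOP (50000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
INSERT INTO dbo.IncludeChoiceDemo
    (SCHOOL_YEAR, COURSE_KEY, STUDENT_KEY, SECTION_CODE, DISTRICT_KEY, LOCATION_KEY)
SELECT 2000 + rn % 10, rn % 500, rn, rn % 40, rn % 20, rn % 100
FROM n;

CREATE NONCLUSTERED INDEX IX_Include1
    ON dbo.IncludeChoiceDemo (SCHOOL_YEAR) INCLUDE (DISTRICT_KEY);
CREATE NONCLUSTERED INDEX IX_Include5
    ON dbo.IncludeChoiceDemo (SCHOOL_YEAR)
    INCLUDE (COURSE_KEY, STUDENT_KEY, SECTION_CODE, DISTRICT_KEY, LOCATION_KEY);

SET STATISTICS IO ON;

-- Unhinted: check the actual execution plan to see which index was chosen.
SELECT DISTRICT_KEY FROM dbo.IncludeChoiceDemo WHERE SCHOOL_YEAR = 2005;

-- Forced: compare the "logical reads" figures in the messages output.
SELECT DISTRICT_KEY FROM dbo.IncludeChoiceDemo WITH (INDEX (IX_Include1))
WHERE SCHOOL_YEAR = 2005;
SELECT DISTRICT_KEY FROM dbo.IncludeChoiceDemo WITH (INDEX (IX_Include5))
WHERE SCHOOL_YEAR = 2005;

SET STATISTICS IO OFF;
```

Logical reads do not depend on what is already cached, so this comparison does not need the cache-clearing steps that cold-cache timings would.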

Paul White (SSC Guru) · Answer 10

Grant Fritchey (6/28/2009) wrote:
> Every time I ran the query, it picked the widest set of includes, regardless of the order in which the indexes were created. I didn't check reads or timings.

I ran some tests on 2005 and 2008 with AdventureWorks and about the only constant was that if an exactly-covering index existed, it was chosen. (These were additional indexes I created - not the shipped ones).

Other than that, even with twenty indexes to choose from, with varying numbers of INCLUDEd columns, in different orders, no clear pattern emerged. I thought for a moment that indexes with the required column *last* in the creation order were preferred; but no.

Fascinating question, but I ran out of time with it.

Paul

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

Gail Shaw (SSC Guru) · Answer 11

I'll do some tests during the week and write up a blog post if I can come to any conclusions.

Actually, I'll blog even if I can't come to a conclusion.

Grant Fritchey (SSC Guru) · Answer 12

Paul White (6/28/2009) wrote:
> I ran some tests on 2005 and 2008 with AdventureWorks and about the only constant was that if an exactly-covering index existed, it was chosen. (These were additional indexes I created - not the shipped ones).
> Other than that, even with twenty indexes to choose from, with varying numbers of INCLUDEd columns, in different orders, no clear pattern emerged. I thought for a moment that indexes with the required column *last* in the creation order were preferred; but no.

Huh. Odd. I did the same thing: I had about five different indexes, all with the same key but with varying, overlapping include columns, like the OP's. Every single time, it chose the index with the most columns. I couldn't tell you why; I just haven't dug much past that yet.

SwePeso (SSC-Dedicated) · Answer 13

One would think that using the index with the most columns would minimize the risk of needing a bookmark (key) lookup later in the execution plan.

But on the other hand, SQL Server should be smart enough to see whether or not the index with the most columns covers the query.

If the index covers the query, use the index with the fewest columns that still covers it.

If the index does not cover the query, then things get more complicated. Should the query engine then use the index with fewer columns (for faster processing) and do a key lookup into the clustered index?

What if there is no clustered index?

Well, you get the idea that there are many things to consider. And to get a best practice out of this, DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE must be used to create a steady baseline for testing.

Not something you would want to do in a production environment...
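A minimal sketch of that baseline, for a test server only. CHECKPOINT first writes the current database's dirty pages to disk so that DBCC DROPCLEANBUFFERS can then remove them from the buffer pool; DBCC FREEPROCCACHE discards every cached plan on the instance, so the next run compiles fresh. Both DBCC commands are instance-wide and need elevated permissions. The query under test is a placeholder (a count over sys.objects) to be replaced with your own. On SQL Server 2016 and later, ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE clears plans for a single database instead.

```sql
-- TEST SERVERS ONLY: the two DBCC commands below affect the whole instance.
CHECKPOINT;              -- write dirty pages of the current database to disk
DBCC DROPCLEANBUFFERS;   -- remove clean pages from the buffer pool (cold cache)
DBCC FREEPROCCACHE;      -- remove all cached plans so the next run recompiles

SET STATISTICS IO ON;    -- logical and physical reads per table
SET STATISTICS TIME ON;  -- compile and execution CPU and elapsed times

-- Placeholder for the query under test.
SELECT COUNT(*) AS object_count FROM sys.objects;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```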

N 56°04'39.16"
E 12°55'05.25"

Grant Fritchey (SSC Guru) · Answer 14

Peso (6/29/2009) wrote:
> But on the other hand, SQL Server should be smart enough to see if the index with most columns does cover the query or not.
> If the index covers the query, use the index with less columns but still cover.
> Well, you get the idea that there are many thing to consider. And for getting a Best Practice out of this, DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE must be used to create a steady baseline for testing.

I'm sorry if I caused confusion. The index was always covering, yes; it was just over-covered. It used a five-column include when it could have used a one-column include to meet the same needs.

You're right about the DROPCLEANBUFFERS and FREEPROCCACHE.
