Indexing strategy

Question

Indexing strategy

river1

SSC-Insane

Points: 24302
More actions
September 16, 2016 at 8:39 am

#329883

Dear all,
I have below query:
SELECT apr.POC_ID AS poc_id,
DATEADD( DAY, 0, DATEDIFF( DAY, 0, DATEADD(DAY, -DAY(DATEADD(MONTH, 1, mch.mach_counter_hist_date)),DATEADD(MONTH, 1, mch.mach_counter_hist_date))))
AS end_of_month
FROM dbo.machine_counter_history AS mch
INNER JOIN dbo.audit_procedure_result AS apr
ON apr.audit_procedure_result_id = mch.audit_procedure_result_id
WHERE mch.market_code = @intMarketCode
AND mch.modf_on BETWEEN @datSSISStartDate AND @datSSISEndDate
In terms of index strategy, should I create one index for mch.market_code and another for mch.audit_procedure_result_id or it is better to have a index that is composed by the two columns?
thanks.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

andrew gothard SSChampion Points: 12301 More actions · Answer 1

Not technically an answer, but this article from one of the regular posters here covers the thinking behind the question you're asking well

http://sqlinthewild.co.za/index.php/2016/09/13/what-is-a-sargable-predicate/

there are also other related articles to help you understand the concepts behind index design on there

Edit to add the blindingly obvious additional resource. And URL tags. Having a bad day here.

http://www.sqlservercentral.com/stairway/72399/

I'm a DBA.
I'm not paid to solve problems. I'm paid to prevent them.

Grant Fritchey SSC Guru Points: 398694 More actions · Answer 2

You'd need to experiment with both. Usually, but certainly not always, a compound index is better. This is because the higher degree of selectivity caused by the compound index makes the optimizer more likely to choose the compound index. Further, index joins, which is what you'd need to get using an index on each column, are fairly rare things in the optimizer.

Depending on the selectivity and data distribution of the columns, you'd then have to experiment with which column goes first in the index, assuming a compound key. This is because one of the primary drivers of index use is the histogram in the statistics and this is only created for the first column. Then, as if that wasn't enough, you'll need to consider whether or not to make the index covering by using INCLUDE to add any columns from the SELECT list that are not included with the index key in order to, if it hurts, avoid a key lookup operation.

Ain't query tuning fun?

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

ScottPletcher SSC Guru Points: 100949 More actions · Answer 3

The single most important performance factor for most tables is how they are clustered. You may add non-clustered indexes, typically covering indexes, but the best clustering is the big key to performance.

Based on the general approach of the query above, I strongly suspect that the dbo.machine_counter_history should be clustered first on modf_on. To satisfy that query specifically, you probably want to cluster on ( modf_on, market_code ). That may not be ideal overall but I suspect it will be workable and will probably help other queries against the table as well.

Finally, think beyond just a single query when tuning, particularly for initial tuning. Use all data available about table/index usage -- including at least index missing stats, index usage stats and index operational stats -- to get an overview of all activity and tune for all activity at once when possible.

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Alexander Suprun SSCertifiable Points: 6233 More actions · Answer 4

I think you overstate the importance of the clustered index, especially in this case. Clustered index doesn't really matter too much here. The one you suggested will be mediocre for the query provided. If market_code is quite selective then potentially we will get a huge range scan on modf_on column.

I also would be very careful with modf_on column, we don't know if it's value can change after the record is inserted or not.

The best index for the query provided most likely would be with key on (market_code, modf_on) and INCLUDE (audit_procedure_result_id, mach_counter_hist_date).

The index on mch.audit_procedure_result_id alone is 100% useless. Even in combination with other columns I doubt that server will use it. There are no filter on audit_procedure_result table so it's highly unlikely that server will start the query execution from this table and then will perform nested loops join to mch. And this is basically the only scenario when index on mch.audit_procedure_result_id might be useful.

Alex Suprun

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 5

river1 (9/16/2016)
In terms of index strategy, should I create one index for mch.market_code and another for mch.audit_procedure_result_id or it is better to have a index that is composed by the two columns?

http://sqlinthewild.co.za/index.php/2010/09/14/one-wide-index-or-multiple-narrow-indexes/

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

ScottPletcher SSC Guru Points: 100949 More actions · Answer 6

Alexander Suprun (9/16/2016)
I think you overstate the importance of the clustered index, especially in this case. Clustered index doesn't really matter too much here. The one you suggested will be mediocre for the query provided. If market_code is quite selective then potentially we will get a huge range scan on modf_on column.
I also would be very careful with modf_on column, we don't know if it's value can change after the record is inserted or not.
The best index for the query provided most likely would be with key on (market_code, modf_on) and INCLUDE (audit_procedure_result_id, mach_counter_hist_date).
The index on mch.audit_procedure_result_id alone is 100% useless. Even in combination with other columns I doubt that server will use it. There are no filter on audit_procedure_result table so it's highly unlikely that server will start the query execution from this table and then will perform nested loops join to mch. And this is basically the only scenario when index on mch.audit_procedure_result_id might be useful.

I don't think I overstate it. And I think it's 99% certain that this table should not be clustered on identity, if the query given is anything close to the typical query against this table.

Now, certainly in this case it could be better, perhaps much better, to cluster on ( market_code, modf_on ) instead. I likely wouldn't object to that. But, with no actual stats to go on, I based it on my general experience of such queries and the general clustering "rule" (goal) of having a naturally increasing key when possible.

In my experience, the most recent time period -- say the last month or so -- is queried by far the most often. And all market codes are analyzed roughly equally. And I admit also that I assumed that a history table would have no, or only extremely rare, changes to modf_on date (I took that to be the "capture" date, as it were). If any of that is incorrect, the clustering indeed may need adjusted.

Finally, keep I mind I'm trying to get the best overall performance, of all activity against the table, not just this one query. Because of that, I typically don't like to effectively create separate tables (covering indexes) for every query, because I don't think that gives the best overall performance, although, yes, each individual query runs well on its own (at least as long as there is sufficient buffer space for all the covering indexes).

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.