Indexing strategy

  • Dear all,

    I have below query:

    SELECT apr.POC_ID AS poc_id,

    DATEADD( DAY, 0, DATEDIFF( DAY, 0, DATEADD(DAY, -DAY(DATEADD(MONTH, 1, mch.mach_counter_hist_date)),DATEADD(MONTH, 1, mch.mach_counter_hist_date))))

    AS end_of_month

    FROM dbo.machine_counter_history AS mch

    INNER JOIN dbo.audit_procedure_result AS apr

    ON apr.audit_procedure_result_id = mch.audit_procedure_result_id

    WHERE mch.market_code = @intMarketCode

    AND mch.modf_on BETWEEN @datSSISStartDate AND @datSSISEndDate

    In terms of index strategy, should I create one index for mch.market_code and another for mch.audit_procedure_result_id or it is better to have a index that is composed by the two columns?

    thanks.

  • Not technically an answer, but this article from one of the regular posters here covers the thinking behind the question you're asking well

    http://sqlinthewild.co.za/index.php/2016/09/13/what-is-a-sargable-predicate/

    there are also other related articles to help you understand the concepts behind index design on there

    Edit to add the blindingly obvious additional resource. And URL tags. Having a bad day here.

    http://www.sqlservercentral.com/stairway/72399/

    I'm a DBA.
    I'm not paid to solve problems. I'm paid to prevent them.

  • You'd need to experiment with both. Usually, but certainly not always, a compound index is better. This is because the higher degree of selectivity caused by the compound index makes the optimizer more likely to choose the compound index. Further, index joins, which is what you'd need to get using an index on each column, are fairly rare things in the optimizer.

    Depending on the selectivity and data distribution of the columns, you'd then have to experiment with which column goes first in the index, assuming a compound key. This is because one of the primary drivers of index use is the histogram in the statistics and this is only created for the first column. Then, as if that wasn't enough, you'll need to consider whether or not to make the index covering by using INCLUDE to add any columns from the SELECT list that are not included with the index key in order to, if it hurts, avoid a key lookup operation.

    Ain't query tuning fun?

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • The single most important performance factor for most tables is how they are clustered. You may add non-clustered indexes, typically covering indexes, but the best clustering is the big key to performance.

    Based on the general approach of the query above, I strongly suspect that the dbo.machine_counter_history should be clustered first on modf_on. To satisfy that query specifically, you probably want to cluster on ( modf_on, market_code ). That may not be ideal overall but I suspect it will be workable and will probably help other queries against the table as well.

    Finally, think beyond just a single query when tuning, particularly for initial tuning. Use all data available about table/index usage -- including at least index missing stats, index usage stats and index operational stats -- to get an overview of all activity and tune for all activity at once when possible.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • I think you overstate the importance of the clustered index, especially in this case. Clustered index doesn't really matter too much here. The one you suggested will be mediocre for the query provided. If market_code is quite selective then potentially we will get a huge range scan on modf_on column.

    I also would be very careful with modf_on column, we don't know if it's value can change after the record is inserted or not.

    The best index for the query provided most likely would be with key on (market_code, modf_on) and INCLUDE (audit_procedure_result_id, mach_counter_hist_date).

    The index on mch.audit_procedure_result_id alone is 100% useless. Even in combination with other columns I doubt that server will use it. There are no filter on audit_procedure_result table so it's highly unlikely that server will start the query execution from this table and then will perform nested loops join to mch. And this is basically the only scenario when index on mch.audit_procedure_result_id might be useful.


    Alex Suprun

  • river1 (9/16/2016)


    In terms of index strategy, should I create one index for mch.market_code and another for mch.audit_procedure_result_id or it is better to have a index that is composed by the two columns?

    http://sqlinthewild.co.za/index.php/2010/09/14/one-wide-index-or-multiple-narrow-indexes/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Alexander Suprun (9/16/2016)


    I think you overstate the importance of the clustered index, especially in this case. Clustered index doesn't really matter too much here. The one you suggested will be mediocre for the query provided. If market_code is quite selective then potentially we will get a huge range scan on modf_on column.

    I also would be very careful with modf_on column, we don't know if it's value can change after the record is inserted or not.

    The best index for the query provided most likely would be with key on (market_code, modf_on) and INCLUDE (audit_procedure_result_id, mach_counter_hist_date).

    The index on mch.audit_procedure_result_id alone is 100% useless. Even in combination with other columns I doubt that server will use it. There are no filter on audit_procedure_result table so it's highly unlikely that server will start the query execution from this table and then will perform nested loops join to mch. And this is basically the only scenario when index on mch.audit_procedure_result_id might be useful.

    I don't think I overstate it. And I think it's 99% certain that this table should not be clustered on identity, if the query given is anything close to the typical query against this table.

    Now, certainly in this case it could be better, perhaps much better, to cluster on ( market_code, modf_on ) instead. I likely wouldn't object to that. But, with no actual stats to go on, I based it on my general experience of such queries and the general clustering "rule" (goal) of having a naturally increasing key when possible.

    In my experience, the most recent time period -- say the last month or so -- is queried by far the most often. And all market codes are analyzed roughly equally. And I admit also that I assumed that a history table would have no, or only extremely rare, changes to modf_on date (I took that to be the "capture" date, as it were). If any of that is incorrect, the clustering indeed may need adjusted.

    Finally, keep I mind I'm trying to get the best overall performance, of all activity against the table, not just this one query. Because of that, I typically don't like to effectively create separate tables (covering indexes) for every query, because I don't think that gives the best overall performance, although, yes, each individual query runs well on its own (at least as long as there is sufficient buffer space for all the covering indexes).

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply