Will The SQL Server Optimizer Recognize and Use More Than One Index on a Single Table?

  • (This post is a simplification of a more complicated indexing problem. To get directly to the crux of the matter, I've extrapolated from the complex case to the simple case.)

    Will the SQL Server Optimizer utilize the indexes in both of these situations effectively, assuming selectivity (cardinality) is acceptably granular on both columns:

    Situation 1:

    > Index1 = "Column1, Column2"

    Situation 2:

    > Index1 = "Column1"

    > Index2 = "Column2"

    If the Situation 2 indexes would both be recognized and effectively used by the Optimizer, it would help solve some indexing efficiency issues I have on several of my employer’s tables.

    LC

  • Gail (GilaMonster) requested that I post specifics so I will.

    The indexes in the referenced table are designed to provide the highest degree of speed for reporting purposes in an OLTP environment. (Many of you may suggest that the reporting environment be moved to a replicated server. I do not presently have that option.) As you will see, there are many almost-but-not-quite-redundant ways for a user to look at the underlying data via a report. This has created a (possible) problem of over-indexing.

    I am unable to copy and paste a screenshot of the indexes list from SQL Server Management Studio so I have simply listed the index names, below. The indexes are named descriptively as "NCX_TableName_ColumnName1_ColumnName2_ColumnName3" respectively. The search columns have been concatenated in the indexes from left to right, from highest cardinality to lowest cardinality:

    PK_ClickConversionTransactionJan

    NCX_ClickConversionTransactionJan_AffiliateReferenceId_CampaignId_PartnerId_PartnerLocalDate

    NCX_ClickConversionTransactionJan_CampaignId_AffiliateId_CreativeId_PartnerLocalDate

    NCX_ClickConversionTransactionJan_CampaignId_AffiliateId_PartnerLocalDate

    NCX_ClickConversionTransactionJan_CampaignId_AffiliateId_SubId_PartnerLocalDate

    NCX_ClickConversionTransactionJan_CampaignId_CreativeId_PartnerLocalDate

    NCX_ClickConversionTransactionJan_CampaignId_PartnerId

    NCX_ClickConversionTransactionJan_CampaignId_PartnerLocalDate

    NCX_ClickConversionTransactionJan_ClickId_CampaignId_PartnerId

    NCX_ClickConversionTransactionJan_ClickId_PartnerId

    NCX_ClickConversionTransactionJan_MerchantReferenceId_CampaignId_PartnerId_PartnerLocalDate

    NCX_ClickConversionTransactionJan_PartnerId_CampaignId_MerchantReferenceId_PartnerLocalDate

    LC

  • crainlee2 (2/4/2010)


    Will the SQL Server Optimizer utilize the indexes in both of these situations effectively, assuming selectivity (cardinality) is acceptably granular on both columns:

    Depends what kind of queries you're running.

    Do you have queries that filter on the combination of the two columns?

    Do you have indexes that filter on only one of the columns?

    While SQL is capable of using more than one index on a single table to evaluate a where clause, it will seldom do so (unless the where clause has predicates OR'd together)

    If your filter is something like Column1 = @Val1 AND Column2 = @Val2, SQL will chose the (single) most useful index and use that. It's capable of doing something called Index Intersection (seek on two indexes and intersect the results), but it's an uncommon plan choice

    I've give you some overnight reading:

    http://sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/ (also the follow-on that's linked at the end)

    http://www.sqlservercentral.com/articles/Indexing/68439/ (3 part series)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • crainlee2 (2/4/2010)


    The search columns have been concatenated in the indexes from left to right, from highest cardinality to lowest cardinality

    Ouch.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • There are queries that filter on 1, 2, 3, or up to 4 columns. The indexes represent the search criteria for an entire set of queries, some using "OR", some using "AND", as logical operators.

    The DATETIME column is always the rightmost column in the index(es) because searches are frequently performed by range rather than equality.

    All other search columns in the indexes are searched for by equality.

    Regarding your "ouch" comment: That was not particularly useful. Could you provide more information? It is a best practice to have the highest cardinality search column in a concatenated index as the leftmost column. So, I'm not sure what your remark is referring to.

    LC

  • crainlee2 (2/4/2010)


    Regarding your "ouch" comment: That was not particularly useful.

    I take it you haven't read the links I gave you, seeing as the first one was a discussion of index column order and cardinality/selectivity

    It is a best practice to have the highest cardinality search column in a concatenated index as the leftmost column. So, I'm not sure what your remark is referring to.

    I disagree entirely. There's far more to consider than just cardinality/selectivity when choosing the order of columns in an index. Yes, it should be considered, but it's not the thing that defines what column goes first in an index.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Actually, I read your link some time ago. Quoting from it:

    "So, what’s my rule for the order of columns in an index key? Put the most selective columns first, when all other considerations are equal."

    "If it turns out that there’s more than one column that can be first, based on the queries that run against the table, or if all queries do equality matches on two or more columns, then put the most selective one first so that SQL has more chance to know that the index is useful."

    That is exactly what has been implemented.

    LC

  • So there are no queries that filter on AffiliateId or PartnerLocalDate without also filtering on CampaignID? Nothing that filters on just SubId?

    I'm actually not sure what it is that you're asking? What info you want?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (2/4/2010)


    So there are no queries that filter on AffiliateId or PartnerLocalDate without also filtering on CampaignID? Nothing that filters on just SubId?

    I'm actually not sure what it is that you're asking? What info you want?

    No, there are no searches on AffiliateId or PartnerLocalDate without first filtering on CampaignId. Nothing that just filters on SubId. This implied rule is true for all of the indexes.

    However, there are queries that just filter on CampaignId. There are queries that just filter on AffliliateReferenceId. There are queries that just filter on ClickId. There are queries that just filter on MerchantReferencedId. There are queries that just filter on PartnerId. All of these columns are the leftmost column in their respective indexes. Additionally, there are queries that filter on the first 2 or 3 columns in one of the indexes. Note that in all cases, they are always the leftmost columns.[/b]

    The indexes I posted were created specifically in response to actual reporting queries and carefully constructed to match the WHERE and JOIN clause columns to columns in indexes with the search columns placed in order, in the indexes, from left to right, from highest cardinality to lowest cardinality.

    The problem I see is that, for instance, there are 6 indexes whose leftmost column is CampaignId. This seems inefficient because, for just a search on CampaignId, 5 of the 6 indexes are redundant. So, I'm looking for a more efficient way to implement the indexing for this table and the queries that reference it.

    In response to your second question, I will refer you to my first post. The question was pretty simple. You asked me in a private communication to elaborate with more details, so I posted what you asked for: all the details of a real world requirement.

    ___________

    I'll make it simpler by defining a theoretical table in a theoretical database named "Test":

    USE Test

    CREATE TABLE SomeTable (

    id INT IDENTITY(1,1) PRIMARY KEY,

    column1 INT,

    column2 INT,

    column3 INT

    )

    Then, I'll define a set of queries that can search on all of the table's non-identity columns in any combination:

    USE Test

    SELECT [someColumns] FROM SomeTable WHERE column1 = 1;

    SELECT [someColumns] FROM SomeTable WHERE column2 = 2;

    SELECT [someColumns] FROM SomeTable WHERE column3 = 3;

    SELECT [someColumns] FROM SomeTable WHERE column1 = 1 AND column2 = 2;

    SELECT [someColumns] FROM SomeTable WHERE column1 = 1 AND column3 = 3;

    SELECT [someColumns] FROM SomeTable WHERE column2 = 2 AND column3 = 3;

    SELECT [someColumns] FROM SomeTable WHERE column1 = 1 AND column2 = 2 AND column3 = 3;

    Then, I'll ask something similar to my original question:

    Would the most efficient and effective indexing solution be the following 3 indexes or would some other index scheme be more efficient, assuming all 3 columns have acceptably high and equal levels of cardinality:

    /****** Object: Index [NCX_column1] Script Date: 02/04/2010 16:22:41 ******/

    CREATE NONCLUSTERED INDEX [NCX_column1] ON [dbo].[SomeTable]

    (

    [column1] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    /****** Object: Index [NCX_column1] Script Date: 02/04/2010 16:22:41 ******/

    CREATE NONCLUSTERED INDEX [NCX_column2] ON [dbo].[SomeTable]

    (

    [column2] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    /****** Object: Index [NCX_column1] Script Date: 02/04/2010 16:22:41 ******/

    CREATE NONCLUSTERED INDEX [NCX_column3] ON [dbo].[SomeTable]

    (

    [column3] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    _______________

    LC

  • With the 3 indexes that you have defined there, the only queries that will perform optimally there will be the ones that filter on a single column. When using predicates with AND, SQL will very seldom (very, very, very seldom) use more than one index.

    SELECT [someColumns] FROM SomeTable WHERE column2 = 2 AND column3 = 3;

    This will use one of the indexes, either the one on Column2 or the one on column3, do a key lookup to the cluster to fetch the other columns then do a secondary filter on the column that wasn't in the index

    SELECT [someColumns] FROM SomeTable WHERE column1 = 1 AND column2 = 2 AND column3 = 3;

    This will use one of the indexes, either the one on Column1 or the one on Column2 or the one on column3, do a key lookup to the cluster to fetch the other columns then do a secondary filter on the two columns that weren't in the index.

    If I had to design indexes for that, I'd do something like this

    Index 1: (Column1, Column2, Column3)

    This satisfies the queries that filter on Col1 or on Col1, Col2 or on Col1, Col2, Col3.

    Index 2: (Column2, Column3)

    This satisfies the queries that filter Col2 or on Col2, Col3.

    Index 3: (Column3, Column1)

    This satisfies the queries that filter Col3 or on Col1, Col3.

    I have no more indexes than you (thought they are wider) and now all of those example queries can run with a single index seek, no need for secondary filters.

    This is why I dislike the 'most selective column first' rule. If I had put the columns in those indexes in a strict order of selectivity, then I would have needed 4 indexes, not 3. That's why I say 'put the selective column first when all other considerations are equal' In this case, all other considerations are not equal, I can satisfy more queries fully by changing the order of the columns in the index.

    As for your real reporting indexes, you need to decide where the tradeoff is. Do you add more indexes (slower inserts, longer maintenance, larger backups) and have all your reporting queries fast or do you remove some of those indexes, have some of the reporting queries running slightly slower but less overhead for the inserts? That only you can decide. It's why indexing is referred to as an art form.

    I can see one case in your listed indexes where, by changing the order of the columns, I could get rid of one index. As for the rest, experiment on a dev system, see how the queries respond by changing the column order. It's near-impossible to give you a useful answer as to the usefulness of indexes without seeing the queries that run against the tables.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (2/5/2010)

    As for your real reporting indexes, you need to decide where the tradeoff is. Do you add more indexes (slower inserts, longer maintenance, larger backups) and have all your reporting queries fast or do you remove some of those indexes, have some of the reporting queries running slightly slower but less overhead for the inserts? That only you can decide. It's why indexing is referred to as an art form.

    I can see one case in your listed indexes where, by changing the order of the columns, I could get rid of one index. As for the rest, experiment on a dev system, see how the queries respond by changing the column order. It's near-impossible to give you a useful answer as to the usefulness of indexes without seeing the queries that run against the tables.

    The real reporting environment that I provided as an example has much more read than write access so I'm willing to live with the overhead of so many indexes. If I can't achieve significantly better indexing, then so be it. Yes, there is a performance penalty when the records are inserted, but that is relatively infrequently, and the records are never updated. The reporting (read) demands are relentless and frequent, so that is why high read speed access is necessary.

    On a different but similar subject, there is another table in my employer's database that has ~60,000 rows in it. The table has 6 data columns and users accessing that table via a reporting GUI can dynamically create their search criteria using any combination of from 1 to 6 columns using any combination of AND and/or OR logical operators. Based on your post, I'm going to revisit the current index implementation for that table.

    Thanks for your detailed response, Gail.

    LC

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply