Column Order in an Index

  • taylor_benjamin (11/23/2010)


    One way to get SQL Server to consider an index it would otherwise exclude in preference to a table scan is to provide criteria for all columns of the index.

    In the case of your article, the optimizer was intelligent enough to know that the existing index was more optimized than a table scan. When this doesn't work, simply specify not restricting criteria in place of missing attributes.

    With your example, included LastName > 'A' in your where clause will meet the qualifications for the index.

    It's an old school trick, but it still works.

    Aiy yai yai...no one should ever do this, for a couple of different reasons. If the QO isn't choosing your index, it usually has a very good reason why not. In the example you gave, for instance, you're still going to wind up scanning all the rows, index or no. If the QO is choosing a table scan over an index scan, the table scan is probably going to be faster.

    Furthermore, in the rare case where the QO isn't selecting the optimal index, you don't want to use a "trick" like this. You just use a WITH index hint to instruct the QO which index it should be using.

  • cengland0 (11/23/2010)


    Doh, I just discovered the maximum key length is 900 bytes so combining two VARCHAR(500) columns into one index results in a warning:

    The maximum key length is 900 bytes. The ***indexname*** has maximum length of 1000 bytes. For some combinations of large values, the insert/update operation will fail.

    If you can't control the SSIS package from the vendor but you control the data afterwards, you could create two computed columns then create your indexes on the computed columns.

    http://msdn.microsoft.com/en-us/library/ms189292.aspx

    /* Anything is possible but is it worth it? */

  • taylor_benjamin (11/23/2010)


    In the case of your article, the optimizer was intelligent enough to know that the existing index was more optimized than a table scan. When this doesn't work, simply specify not restricting criteria in place of missing attributes.

    With your example, included LastName > 'A' in your where clause will meet the qualifications for the index.

    Maybe, but go and look at what's actually happened in that case.

    With the leading column searched on an inequality, SQL can't seek on both columns, only on the first. So it seeks on the first, and finds all the rows. Great, you've got yourself an index seek that is in essence an index scan (read of all the pages in an index). Looks good in the execution plan if you don't know what you're looking for, but it's no more efficient than the query would have been had you omitted that predicate.

    http://sqlinthewild.co.za/index.php/2009/03/05/when-is-a-seek-actually-a-scan/

    If SQL went for a table/clustered index scan over the index scan, it means the index is not covering and it thinks that the key lookups will be more expensive than a clustered index scan. A non-filtering predicate is unlikely to change its cost estimates. From the statistics it will be able to tell that LastName > 'A' has a selectivity of 1 (all rows qualify)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I have to admit that I am not impressed with the title. It is wrong. According to your article, the column order in a composite index is not important.

    From the article you state, "Let's now reverse the order of the WHERE clause in the select statement. If you now run the select statement shown below you will still see the same execution plan as shown in Fig 1."

    This article states that composite indexes work better when all columns in the composite index have an explicit value in the WHERE-clause. The composite index is less efficient when one or more of the columns in the index are omitted. Column order is irrelevant. Column presence is.

  • gosh (11/24/2010)


    I have to admit that I am not impressed with the title. It is wrong...Column order is irrelevant. Column presence is.

    Lol, and here I was criticizing the article because I didn't believe anyone actually believed this silliness. If you ever paid for a computer science course, I suggest you track down the professor and demand a full refund.

  • Gail,

    Excellent point. I should have mentioned that a missing propert for the first column of a composite index results in an index scan. Glad you did.

    The technique doesn't work as well when the missing column is the first item in the key. It will indeed do a complete index scan.

    The technique works better the deeper into the composite index the missing column value is found.

    Even with a complete index scan, given that no better index exist, this may be more desireable than a table scan.

    As a note, if the scenario discussed was a reality, I would probably create two indexes. One composite index without the last name, and an index on Last Name only.

    Then SQL Server can return the intersection of the two indexes before going to the base data if Last name is required. If not, it will simply use the appropriate index.

    Ben

  • gosh (11/24/2010)


    I have to admit that I am not impressed with the title. It is wrong. According to your article, the column order in a composite index is not important.

    Go back over the article and comments. The order in the index is important and takes some knowledge on which columns to include in each index.

    With an index on (Lastname, Firstname), as long as you are searching on Lastname and Firstname in your WHERE clause, the optimizer is smart enough to figure out the right order to search with to use the (Lastname, Firstname) index. However, only searching on Firstname will cause the optimizer to choose either a table/index scan or use a different index.

    Visualize the index as this:

    Lastname, Firstname

    Ang, Joe

    Angelou, Mitch

    Henry, Ben

    Selby, Josh

    Zenith, Sharp

    If you are searching for only Firstname, how would this index be helpful? You would have to scan the whole index. And if you have to scan the whole index, you might as well scan the table. But if you search on only LastName, the index is still useful, even though you're not looking for Firstname. And it's even better if you search for Lastname but want to return Firstname too because it's part of the index so there isn't a bookmark lookup on the table.

    One of the keys in designing an index is to know how the data will be searched. If you always search on Lastname and sometimes Lastname + Firstname, then the index (Lastname, Firstname) will serve you well. If you sometimes need to search on Firstname but still plan on the previous statement being true, then create an index on (Firstname) INCLUDE (Lastname).

    /* Anything is possible but is it worth it? */

  • The title of the article is, 'Column Order in an Index'. I expected proof that there was a performance difference between the order of columns in a composite index or a performance difference in the order of columns in the where-clause.

    Instead, the author tells us that the same execution plan is used when the order in the where-clause is reversed and doesn't touch upon the order within the composite index.

    However, he does tell us, that the number of parameters in the where-clause does make a difference when we have a composite clause. If the first column is only included, then the same execution plan is used, but not with the second.

    I still think that the title is misleading. Maybe something along the lines: 'Primary Column in a Composite Index is the most Important'.

  • gosh (11/24/2010)


    I expected proof that there was a performance difference between the order of columns in a composite index or a performance difference in the order of columns in the where-clause.

    You won't find that because the order of columns in a where clause is utterly irrelevant. It makes no difference what order you specify the where clause predicates.

    The order of columns in an index is very important, not so much for a single query, but for when you're trying to get as few indexes as possible to support as many queries as possible.

    Several links to my blog posts on this have already been posted here, so I won't post again. Go and read over them.

    I still think that the title is misleading. Maybe something along the lines: 'Primary Column in a Composite Index is the most Important'.

    It's not. The order of columns (all of them) is important. Maybe the leading column is the most important, but the order of the rest of the columns in the index is important too.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I too would like to thank you for the article. I recommend you follow the advice by Grant that I quoted.

    Grant Fritchey (11/23/2010)


    Thanks for the article Sarvesh. You may want to spend a little more time on the next one, just nail down why these things occur. As you showed, it's not simply the order in which columns are stored, rather, it's which column is on the front, the leading edge, of the index. Answer the question, why does the leading edge matter, and you'll have a much improved article.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • I agree with the "spend a little more time with the article" focus.When i first read the topic, i got interested, and it was an interesting read, but i do think it also was a little bit shallow and with some lack of research, even more if you consider you're talking about, i think, one of the more complex topics on database development.

    It's just constructive criticism.

    Regards

    DG

  • Nice attempt. Here in this article you should explain why SQL server does like this. I'm referring to the Index Statistitics. You can also mention in the article that if all of the columns are used as predicate, order doesn't matter more precisly it is about the leading column of the index matters not the order.

  • sandippani (5/11/2013)


    Nice attempt. Here in this article you should explain why SQL server does like this. I'm referring to the Index Statistitics.

    It's not due to statistics. It's due to the fact that you can only seek on a left-based subset of the index key.

    You can also mention in the article that if all of the columns are used as predicate, order doesn't matter more precisly it is about the leading column of the index matters not the order.

    The order still matters, especially if some of the predicates are inequalities.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I'm getting a little tired of all the jerkoffs who immediately jump in with comments like "You shouldn't be writing SQL code." Do you suppose anyone believes you were born with the ability to write perfect code? That's how people learn - try, make mistakes, try again, read, ask questions, post comments, listen to advice, tune your skills, swap ideas. If nobody did anything until they were perfect at it, nothing would ever get done, anywhere, by anybody.

    Get over yourselves - you've made mistakes and written garbage code too, we all have. The trick is to learn from it, and if you're prepared for possible admission to the human race, help others learn as well.

    Good manners are a nice plus, but if you can't post something that's at least helpful, go troll under someone else's bridge.

  • One thing I didn't see anyone mention is size. An index is likely (not necessarily, of course) smaller than the table, so even in the worst case of a full scan of the index, the amount of data passed around from the disk subsystem should be smaller. Seems like that should be worth something.

Viewing 15 posts - 31 through 45 (of 48 total)

You must be logged in to reply to this topic. Login to reply