Indexes

  • Though for such small amounts of data at first I thought they might all perform a table scan as it decides there is no point using an index.

    Its this decision not to bother with the index due to the small data set that causes it not to use the index for query #4 after all.

    For query #1, the index scan is just a table scan in disguise.

  • Good question.

    I also answered "1" as I mistakenly thought that it wouldn't scan the index, however it is obvious that the "flag" column isn't in the index so it's going to table scan for that.

  • Not thinking about the extremely small stat set provided, I answered Query 1. If you look at the where clauses for all four queries you find that the proposed non-clustered index actually satisfies queries 2,3 and 4; so on a much larger result set those queries will use the index whereas query 1 will use a table scan.

    Questions like this one don't take into account how SQL Server works with statistics and indexes in chosing how to access data.

  • For query #1, the index scan is just a table scan in disguise.

    If the index were a clustered index, I would agree. But rememeber that you often see index scans instead of table scans where the index in question is a covering index for that particular query. The index is smaller than the table (in this case only by one column), and so is more efficient to scan than the table itself.

    Questions like this one don't take into account how SQL Server works with statistics and indexes in chosing how to access data.

    I'm glad we agree, Lynn.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Bob Hovious (9/7/2009)


    For query #1, the index scan is just a table scan in disguise.

    If the index were a clustered index, I would agree. But rememeber that you often see index scans instead of table scans where the index in question is a covering index for that particular query. The index is smaller than the table (in this case only by one column), and so is more efficient to scan than the table itself.

    Odd, because I tried it with 1,000,000 rows and the table scan was consistently, slightly lower for I/O than using the index. Though the difference was fairly negligible compared to the overall cost, 17.3 vs 19.8.

  • Hey Chris,

    Thinking about your results, the table itself is smaller than the index in this case. Why? Because the underlying table has no clustered index. It's a heap. So the index must have in addition to the data columns, a row identifier (RID) pointer for each row. The RID pointer is built from the file identifier, page number, and number of the row on the page, and is larger than the one-character [Flag] column.

    If you set up a base table that had a char(5) column instead of char(1), I would expect to see a different result. But I will test that and post back. But for most cases the general rule remains that a covering index, if available, will be scanned instead of the table.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Bob Hovious (9/7/2009)


    Thinking about your results, the table itself is smaller than the index in this case. Why?

    Any index (clustered or non-clustered) may also require extra pages for the non-leaf levels.

    In your example with 700 rows in total, the nonclustered index contains one page at index level 1. This is page 190 on file 1 in my case, and contains the following (just to show that it is in fact a non-leaf page):

    (the exact values reported will differ for you)

    Also, as written, the non-clustered index is created with a default fill factor - which may or may not be 0/100...

    (edit: to fix the image)

  • Thanks Paul, I was missing that. Thanks for fixing the image. Also, do you disagree with what I said about the size of the RID? I agree that should not be as significant as the extra pages in the index.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • I used the data from the original question, and got an index scan for all 4 queries (running sql 2000),

    Mal

  • Bob Hovious (9/7/2009)


    Also, do you disagree with what I said about the size of the RID? I agree that should not be as significant as the extra pages in the index.

    No I don't disagree: it's spot on. There is an extra 8 bytes for the RID in the index.

    Just to complicate matters though: the heap data might need an extra 14 bytes for row versioning - the index would not contain this overhead (if it was built offline - it may be added for indexes built or rebuilt online* - Enterprise only).

    There are other considerations, including how many data and index rows will fit within on a page, but let's no go there.

    Lastly, even a clustered index on an INT column might need 8 bytes of storage in some rows - if the index is not created as UNIQUE, and if duplicate key entries exist (four bytes for the INT and four more for the INT 'uniqueifier').

    Paul

    * - If the database is enabled for row-versioning.

    (edit : to add the *)

  • Andrew Watson (9/7/2009)


    I chose 1 as well.

    When faced with a problem like this, my brain automatically went into "performance problem" mode, and since these generally occur with big datasets, I selected the answer that would be most likely to happen in these circumstances.

    I would have to admit that for these data, 4 is the only right answer - I must learn to be a bit slower with my answers.

    I don't agree that 4 is the only right answer. 1 would be answer as well as Bob points out if more data were involved. Determining precise index behavior with such a small set of data is at best a guessing game. With a large data set it is possible to get a range seek from query 4 where as, 1 will always result in a scan, if not a table scan, an index scan, which is frequently as bad.

    I think this question & answer were a bit on shaky ground.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Many thanks, Paul.

    Grant, I liked this question, even though I got it wrong. 😉

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Bob Hovious (9/7/2009)


    Many thanks, Paul.

    Grant, I liked this question, even though I got it wrong. 😉

    You're most welcome.

    I haven't answered the question yet - but once again the discussion is better than the original question! A theme on SSC 🙂

  • I agree with dave.farmer.

    Depending in the number of rows and the statistics of the table, the optimizer can choose an index seek and rowid lookup for selecting only one row, for option 4).

    And, in initial scenario with only 6 rows, the best choice is allways a table scan (there is only one page in the table).

  • I don't belive it...

    The server can use index when the first part of the index is in the where clause.

    I try it and MS SQL server realy use index on query 2,3,4 and in case number 1 don't (because no index with ch in the first part). And I only use this seven records from the sample.

    Regards,

    Moha

Viewing 15 posts - 16 through 30 (of 46 total)

You must be logged in to reply to this topic. Login to reply