Index Seek not possible if you want every row of table, correct?

  • I have a table, it has a clustered index on customer_ID.

    I need to return every row of this table via a LEFT join as I want to lookup ALL the customers we have, then join to SALES.(In theory every customer should have a sale, but not in reality)

    Anyway, I just wanted to check my understanding, seems like an obvious answer but the more I think on it the more I wonder:

    If I want EVERY row from the table (customers) then... I will always have to do a table SCAN, yes?

    Since the clustered index is on customer_ID then doing SELECT customer_ID FROM Customers will result in an Index Scan.

    The ONLY WAY it would be a seek would be if I didn't want EVERY customer row, correct?

    Like if I said only customer_ID between 500 and 50000 or similar, right?

  • Here's a good explanation of scan vs seek:
    https://blog.sqlauthority.com/2009/08/24/sql-server-index-seek-vs-index-scan-diffefence-and-usage-a-simple-note/

    Alan H
    MCSE - Data Management and Analytics
    Senior SQL Server DBA

    Best way to ask a question: http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • Not entirely.

    The query optimizer is completely dependent on the statistics on the table. So, let's say you turn off the automatic statistics maintenance when there is only one row in the table. Then you add 5000 (or 5 umpty-gazillion) rows to the table. The statistics still say there is only one row. You can get an index seek occurring against that data.

    If the statistics are up to date, then it's very likely that retrieving all the rows from a table will result in a scan. However, without seeing the query, the tables, the statistics, I can't say that it's the ONLY WAY. It's just the most likely way that kind of data retrieval will take place.

    Same thing goes if you put a WHERE clause on that includes a BETWEEN operation. What will the optimizer see in the statistics as between those two values? One row, you may get a seek even if there are 5000.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Ah good points on stats, I didn't consider how it could deviate based off that.

    However, the good news is that the stats appear to be up to date as the expected and actual rows match 1:1 which is good.

    That said, to summarize for my own understanding:
    Unless the table has bad stats and wrongly estimates the number of actual rows in the table- returning every row from a table would result in a scan (Index scan if returning the PK fields of a clustered index).

    Did I understand that right?

  • Maxer - Tuesday, September 5, 2017 12:07 PM

    Ah good points on stats, I didn't consider how it could deviate based off that.

    However, the good news is that the stats appear to be up to date as the expected and actual rows match 1:1 which is good.

    That said, to summarize for my own understanding:
    Unless the table has bad stats and wrongly estimates the number of actual rows in the table- returning every row from a table would result in a scan (Index scan if returning the PK fields of a clustered index).

    Did I understand that right?

    Yeah, pretty much. I'd generalize it a bit further and say that anything can cause the row estimates to be off could lead to seeks where scans should be & vice versa. However, sounds like you're dead on in your situation.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Maxer - Tuesday, September 5, 2017 8:28 AM

    I have a table, it has a clustered index on customer_ID.

    I need to return every row of this table via a LEFT join as I want to lookup ALL the customers we have, then join to SALES.(In theory every customer should have a sale, but not in reality)

    Anyway, I just wanted to check my understanding, seems like an obvious answer but the more I think on it the more I wonder:

    If I want EVERY row from the table (customers) then... I will always have to do a table SCAN, yes?

    Since the clustered index is on customer_ID then doing SELECT customer_ID FROM Customers will result in an Index Scan.

    The ONLY WAY it would be a seek would be if I didn't want EVERY customer row, correct?

    Like if I said only customer_ID between 500 and 50000 or similar, right?

    I have seen it where it does a SEEK to the first row and then does a range scan for the rest of the table.  That's the tricky part about SEEKs.  If you don't look at the properties, they can be quite misleading.  If things go really haywire, a SEEK can actually be much worse than a SCAN when every row has its own individual SEEK.  100,000 SEEKs require 100,000 traversals through the B-TREE with typically 1 logical page read per level in the tree where a scan simply starts at the leaf level and "whips" through it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Please if this helps too.

    https://www.youtube.com/watch?v=GSZPvF2u6WY

  • Arsh - Wednesday, September 6, 2017 7:28 AM

    Please if this helps too.

    https://www.youtube.com/watch?v=GSZPvF2u6WY

    Good link, Arsh.  Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If I want EVERY row from the table (customers) then... I will always have to do a table SCAN, yes?

    Not necessarily.  It depends at least on:
    1) what columns you had in the SELECT, i.e., if all those columns were available from a nonclus index (a covering index)
    2) what, if any, ORDER BY is on the query.

    If a nonclus index contained all the columns, and you didn't ORDER BY customer_ID, SQL would be extremely likely to use that nonclus index.  With an ORDER BY Customer_ID, it's less likely, but not impossible, likely depending on the row estimates (as noted above).

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply