Index scan hint

  • Hi.

    In my script, the execution plan chose index seek in one case, but I want to test it with scan without changing the index itself (because I have a feeling that in this case scan might be better).

    Is it possible with hint ? And how can I put it ?

    Thanks

  • Why? Seeks are faster than scans.

  • Lynn Pettis (7/22/2009)


    Why? Seeks are faster than scans.

    Heh... because "A developer must not guess... a developer must KNOW." 😉 It'll be a nice test.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • SQL Guy (7/22/2009)


    Hi.

    In my script, the execution plan chose index seek in one case, but I want to test it with scan without changing the index itself (because I have a feeling that in this case scan might be better).

    Is it possible with hint ? And how can I put it ?

    Thanks

    I believe that the only index you can force to scan is the clustered index. You'd do that with a [font="Arial Black"]WITH(INDEX(0)) [/font]hint.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/22/2009)


    Lynn Pettis (7/22/2009)


    Why? Seeks are faster than scans.

    Heh... because "A developer must not guess... a developer must KNOW." 😉 It'll be a nice test.

    If it isn't true, there are several people on SSC that I've heard this from that I trust that may need to revisit this as well.

    Hmm, how would one go about setting up a test like this...

  • The only time a scan can be faster than a seek is when there are multiple seeks against the same table (in clause, multiple ORs, inner table of a nested loop join). In other cases the scan can only ever be as fast as a seek, and that's in the case where the seek predicate matches all the rows in the table.

    If you want to force a scan, put a function on the column that SQL's using for the seek. Add 0 if it's a number, add '' if it's a string. That's sufficient to disallow an index seek. Hint the index as well, because SQL may decide that it would rather scan (or seek) a different index from the one that you want to test.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Lynn Pettis (7/23/2009)


    Jeff Moden (7/22/2009)


    Lynn Pettis (7/22/2009)


    Why? Seeks are faster than scans.

    Heh... because "A developer must not guess... a developer must KNOW." 😉 It'll be a nice test.

    If it isn't true, there are several people on SSC that I've heard this from that I trust that may need to revisit this as well.

    Hmm, how would one go about setting up a test like this...

    Everyone must know and not guess 🙂

    Y'all should know better than to make such absolute statements! It depends!!

    There are at least four cases where a scan may be as fast or faster than a seek (over the same, or very similar, number of records):

    1. Where the table fits on a single page

    2. The 'seek index' has a fill factor so low that it contains more pages than the heap/clustered index

    3. The 'seek index' is very highly fragmented and the heap/clustered index is contiguous.

    4. The query would benefit from parallelism, but the index has to be scanned backward.

    Other points to consider:

    Heaps are scanned in allocation order using the IAMs. This can be faster than scanning an index in logical page order if there is enough fragmentation. Clustered indexes can also be scanned using IAMs if a table lock is used or the read uncommitted isolation level is in effect.

    Read-ahead will be more effective on physically contiguous data (especially on Enterprise/Developer/Trial Editions).

    A scan over uniform extents can be faster than a seek over pages allocated from mixed extents.

    Script below which demonstrates a scan outperforming a seek in terms of total execution time, physical reads, and logical reads.

    Results on my system:

    Seek:

    physical reads: 283

    logical reads: 11,197

    worker time (cpu): 300ms

    elapsed time: 3.5s

    Scan:

    physical reads: 133

    logical reads: 7,989

    worker time (cpu): 825ms

    elapsed time: 1.8s

    --

    -- Run in a 'real' database, not tempdb, for best effect

    -- ONLY run on a TEST system which you have EXCLUSIVE access to

    --

    ifOBJECT_ID(N'dbo.Invoice', N'U') IS NOT NULL

    drop table dbo.Invoice

    go

    --

    -- Test table

    --

    create table

    dbo.Invoice

    (

    invoice_id int identity primary key nonclustered,

    invoice_number char(10) not null,

    issue_date datetime not null,

    due_date datetime not null,

    supplier_code char(4) not null

    );

    go

    --

    -- Index to seek with

    --

    create nonclustered index nc1 on dbo.Invoice (invoice_number) with (fillfactor = 100);

    go

    --

    -- Add 1.5M rows of test data

    --

    insert

    dbo.Invoice with (tablockx)

    (

    invoice_number,

    issue_date,

    due_date,

    supplier_code

    )

    select

    top (1500000)

    invoice_number = LEFT(NEWID(), 10),

    issue_date = DATEADD(DAY, RAND() * -14 - 14, CURRENT_TIMESTAMP),

    due_date = DATEADD(DAY, RAND() * 14 + 14, CURRENT_TIMESTAMP),

    supplier_code = RIGHT(NEWID(), 4)

    from

    master.sys.all_columns A1,

    master.sys.all_columns A2,

    master.sys.all_columns A3;

    go

    --

    -- Fragment the non-clustered index, and split index pages

    --

    updatedbo.Invoice

    setinvoice_number = REVERSE(invoice_number)

    whereinvoice_id = any (select top (100) I2.invoice_id from dbo.Invoice I2)

    go

    --

    -- Show the fragmentation

    --

    select index_type_desc, avg_fragmentation_in_percent, avg_page_space_used_in_percent, page_count, record_count

    from sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.Invoice', N'U'), NULL, NULL, 'DETAILED')

    where index_level = 0;

    dbcc freesystemcache('SQL Plans')

    go

    checkpoint

    dbcc dropcleanbuffers

    go

    set statistics io on

    go

    -- Seek

    selectCOUNT_BIG(*)

    fromdbo.Invoice

    whereinvoice_number > ''

    go

    checkpoint

    dbcc dropcleanbuffers

    go

    -- Scan

    selectCOUNT_BIG(*)

    fromdbo.Invoice with (index(0))

    whereinvoice_number between SPACE(10) AND 'ZZZZZZZZZZ'

    go

    set statistics io off

    select text, execution_count, total_worker_time, total_physical_reads, total_logical_reads, total_elapsed_time

    from sys.dm_exec_query_stats qs

    cross apply sys.dm_exec_sql_text (qs.sql_handle) t

    where t.text like '%dbo.Invoice%'

    and t.text not like '%sys.dm_exec_query_stats%'

    order by last_execution_time asc;

  • GilaMonster (7/23/2009)


    If you want to force a scan, put a function on the column that SQL's using for the seek. Add 0 if it's a number, add '' if it's a string. That's sufficient to disallow an index seek. Hint the index as well, because SQL may decide that it would rather scan (or seek) a different index from the one that you want to test.

    Putting a function on the column will almost always prevent an index seek.

    Simply adding a zero or empty string still results in a seek if the resulting expression can be constant folded*.

    * edit: Oh, I see - you meant add the zero or empty string to the column name not the value to compare to! Yeah, that works 🙂

  • Paul White (7/24/2009)


    There are at least four cases where a scan may be as fast or faster than a seek (over the same, or very similar, number of records):

    1. Where the table fits on a single page

    2. The 'seek index' has a fill factor so low that it contains more pages than the heap/clustered index

    3. The 'seek index' is very highly fragmented and the heap/clustered index is contiguous.

    4. The query would benefit from parallelism, but the index has to be scanned backward.

    Well, I was specifically talking about a seek on a particular index vs a scan on the same index, not a seek on a NC index vs a scan on a cluster. It's easily possible for a scan of the cluster/heap to be faster than a seek on a nonclustered index.

    If, however, you're talking (as I interpreted the OP to be doing) about a seek on IndexX vs a scan on IndexX, then none of those cases apply (maybe the parallelism one, haven't checked)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks, Jeff, I used your hint WITH(INDEX(0)) and found that scan was slower in my particular query.

    When I initially put my post, I did not mean to insist that scan will outpeform seek, in 95% of all cases you are all right, seek is far better. I just wanted to test it. And my intention was to find out how to hint it.

  • Thanks for the feedback. I learned some stuff myself. Good thread! 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Paul White (7/24/2009)


    Lynn Pettis (7/23/2009)


    Jeff Moden (7/22/2009)


    Lynn Pettis (7/22/2009)


    Why? Seeks are faster than scans.

    Heh... because "A developer must not guess... a developer must KNOW." 😉 It'll be a nice test.

    If it isn't true, there are several people on SSC that I've heard this from that I trust that may need to revisit this as well.

    Hmm, how would one go about setting up a test like this...

    Everyone must know and not guess 🙂

    When I ran your good code, Paul, I got just the opposite results overall...

    total_worker_time total_physical_reads total_logical_reads total_elapsed_time

    392055 106 3942 1256268 --Seek

    959229 1027 8001 3325338 --Scan

    Perhaps a good reason for that is my system did a table scan instead of an index scan.:-P

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/24/2009)


    When I ran your good code, Paul, I got just the opposite results overall...

    total_worker_time total_physical_reads total_logical_reads total_elapsed_time

    392055 106 3942 1256268 --Seek

    959229 1027 8001 3325338 --Scan

    Perhaps a good reason for that is my system did a table scan instead of an index scan.:-P

    It is a table scan I was after. It was a table scan that out-performed the seek in my original test results.

    It can be quite tricky to reproduce - though I found it easier in a 'real' database than in tempdb for some reason. You might need to fragment the nc index even more, maybe?

    Thanks for taking the time to run it though. I realize my point was slightly off-topic, but I couldn't resist the opportunity to discuss some of the subtleties. A fun thread.

    Paul

  • Paul White (7/25/2009)


    I realize my point was slightly off-topic, but I couldn't resist the opportunity to discuss some of the subtleties.

    Speaking of "off topic"... that's part of the reason why I don't ever want them to allow the OP to close a thread... the discussions that sometimes follow are worth as much as the original problem. Good tangent, Gail and Paul.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 14 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply