Unexpected partial index scan

  • All,

    I have a Java application which makes repeated SQL queries to a DISTRIBUTION table to read a subset of its 1M rows. The following prepared statement is run repeatedly until all required rows are read, with 100 parameters (?'s) at a time:

    SELECT FROM DISTRIBUTION WHERE MANUFACTURER_ID IN (?,?,?,?,?,...)

    I have attached the SQL Plan, the table DDL, and the non-clustered non-unique index DDL on the MANUFACTURER_ID column.

    The first query takes about 2/3 of a second to return and subsequent queries slowly speed up. Then, as much time goes on and apparently as the index on the MANUFACTURER_ID is cached into memory, each statement returns very quickly, up to 2000 queries per second if I do every row in the table.

    The (attached) SQL Plan from performance tracing shows an apparent partial index scan on the MANUFACTURER_ID column's index, so it scans from the low value in the parameter list to the high value 100 times for each query. And thus it is slow. (the values in the parameter list are randomly distributed)

    Is there a way to get SQL Server to do a seek for each parameter in the list? Based on the performance I am seeing I assume that will return the rows much more quickly when the number of rows queried is far less than the total in the table.

    I have updated statistics on the MANUFACTURER_ID column and the DISTRIBUTION_ID column, to no avail.

    Thanks in advance for any help. This is my first post: I have tried to be thorough and yet narrow down the problem to its core. But I may have missed something. 🙂

    Dave

  • It's hard to make suggestions without seeing the query and an actual execution plan. Can you post those? Also, in case people need to experiment it's useful to post the structure and some sample data as well.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Strange - I thought I had attached the Execution Plan and the table's DDL. The query is in the text of my original post, and I have now attached the Execution Plan, the DDL, and about 30 rows of sample data.

    Sorry about that.

  • That's a slightly odd looking plan. It doesn't have a SELECT statement or a query or any of the meta data about the plan in the XML. How did you generate it?

    I'd say you're getting that partial scan because of the > AND < operation. It has to look through to determine what fits that bill.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • This may be my inexperience at work. I got the Plan from the SQL Server Profiler output. I just right-clicked one of the "Showplan XML Statistics Profile" rows and selected "Extract Event Data..." to save it to disk.

    When I generate an Estimated Plan through the Management Studio, it creates a different plan. I have attached it in case you want to see it.

    I definitely agree that the in the Index Seek are evidence that a partial scan is occurring. The Management Studio Plan seems to do individual seeks, so I assume that is a good plan for extracting few rows.

    Thanks for your quick replies.

  • Yeah, estimated plans are seldom terribly helpful. It's basically the same plan though. It really looks like, based on what you provided, SQL Server is doing the right thing. An IN clause could have just generated a straight scan. Instead you're getting a good partial scan based on the optimizer being smart enough to just make you're IN clause into a between operation. Not perfect, I'll grant you, but better than it could have been.

    I'm going to have to play with Profiler some more. I figured surely the SELECT part of the plan would be the same.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • The fact that I used a prepared statement may have been the cause of the missing SELECT in the Profiler's Plan. And yes, I am coming to the realization that maybe the partial scan isn't the real root of my problem.

    The fact that the keys I am sending across in bunches of 100 are randomly distributed is the real culprit. In a test, if I sort those keys before splitting them into 100's, the lookups are fast because the scan range is narrower.

    Thanks so much for your quick responses.

  • Not a problem.

    Another option, and it might not work for you, is to pass these values in a comma delimited format and then join against that single variable using a table of numbers to pivot the data, joining. It may be more performant. I'm pretty sure there's a sample somewhere around here, doing exactly that, written by Jeff Moden.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply