Query question - clustered index internals

  • Hi all,

    I have a question I hope the community can help me with. I have a query that is running against a legacy system. I think I know what is happening internally, but am looking to confirm that. The query is the following:

    DECLARE @Stuff varchar(50)

    SELECT max(Date) AS CloseDate

    FROM TableName with (nolock)

    WHERE Product LIKE @Stuff + '%'

    The query is running against a fairly large table (250 million rows). When a user passes in a value for @Stuff that returns a date, the query runs subsecond. When a user passes in a value for @Stuff and there is no record to return for this particular value, it runs extremely slow. The clustered index (also the PK) is on the Date and Product fields (in that order). The execution plan indicates a clustered index scan for both runs.

    My therory as to why it runs fast when there is data is because since the clustered index is created on the date field first, it is able to return the first date it finds for the first Product returned by WHERE Product LIKE @Stuff + '%'. When there is no valid return for the value passed in by @Stuff, it has the scan the entire table only to return no data (and with 250 million records, this is why it runs very slow). Is this correct?

  • Sounds to me like parameter sniffing.

    http://sqlinthewild.co.za/index.php/2007/11/27/parameter-sniffing/[/url]

    http://sqlinthewild.co.za/index.php/2008/02/25/parameter-sniffing-pt-2/[/url]

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • CheeseheadDBA (5/29/2013)


    Hi all,

    I have a question I hope the community can help me with. I have a query that is running against a legacy system. I think I know what is happening internally, but am looking to confirm that. The query is the following:

    DECLARE @Stuff varchar(50)

    SELECT max(Date) AS CloseDate

    FROM TableName with (nolock)

    WHERE Product LIKE @Stuff + '%'

    The query is running against a fairly large table (250 million rows). When a user passes in a value for @Stuff that returns a date, the query runs subsecond. When a user passes in a value for @Stuff and there is no record to return for this particular value, it runs extremely slow. The clustered index (also the PK) is on the Date and Product fields (in that order). The execution plan indicates a clustered index scan for both runs.

    My therory as to why it runs fast when there is data is because since the clustered index is created on the date field first, it is able to return the first date it finds for the first Product returned by WHERE Product LIKE @Stuff + '%'. When there is no valid return for the value passed in by @Stuff, it has the scan the entire table only to return no data (and with 250 million records, this is why it runs very slow). Is this correct?

    It is scanning the entire table because it has to check every row to determine if it matches the criteria for the WHERE clause. If the clustered index were flipped, Product then Date, you would actually see a Clustered Index Seek based on the WHERE clause you provided.

    For a test, create a NONCLUSTERED INDEX using Product then Date and run that query for a product that exists then again for one that doesn't.

  • Thanks for the response, but I'm fairly certain parameter sniffing isn't the case here. I also explored that option as that was my first thought however if I free both the plan and data cache and run the slow version (no data returned) of the query it still runs extremely slow (after this ad hoc plan is cached) and if I pass in a parameter that returns data it returns quickly.

  • Thanks Lynn. I like that idea - however one question I have is wouldn't the use of 'like' in the query prevent the new nonclustered index from being utilized in this case?

  • CheeseheadDBA (5/29/2013)


    Thanks Lynn. I like that idea - however one question I have is wouldn't the use of 'like' in the query prevent the new nonclustered index from being utilized in this case?

    Your WHERE clause looks like this:

    WHERE Product LIKE @Stuff + '%'

    This will use an index.

    If your WHERE clause looked like this, it wouldn't:

    WHERE Product LIKE '%' + @Stuff + '%'

  • That makes sense - I will give that a shot. Thanks!

  • Lynn Pettis (5/29/2013)


    It is scanning the entire table because it has to check every row to determine if it matches the criteria for the WHERE clause. If the clustered index were flipped, Product then Date, you would actually see a Clustered Index Seek based on the WHERE clause you provided.

    For a test, create a NONCLUSTERED INDEX using Product then Date and run that query for a product that exists then again for one that doesn't.

    Thanks for the catch Lynn...my brain called and wants to come home again soon. Guess that is what I get for only reading most of the question before jumping to conclusions. 😛

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Lynn Pettis (5/29/2013)


    CheeseheadDBA (5/29/2013)


    Hi all,

    I have a question I hope the community can help me with. I have a query that is running against a legacy system. I think I know what is happening internally, but am looking to confirm that. The query is the following:

    DECLARE @Stuff varchar(50)

    SELECT max(Date) AS CloseDate

    FROM TableName with (nolock)

    WHERE Product LIKE @Stuff + '%'

    The query is running against a fairly large table (250 million rows). When a user passes in a value for @Stuff that returns a date, the query runs subsecond. When a user passes in a value for @Stuff and there is no record to return for this particular value, it runs extremely slow. The clustered index (also the PK) is on the Date and Product fields (in that order). The execution plan indicates a clustered index scan for both runs.

    My therory as to why it runs fast when there is data is because since the clustered index is created on the date field first, it is able to return the first date it finds for the first Product returned by WHERE Product LIKE @Stuff + '%'. When there is no valid return for the value passed in by @Stuff, it has the scan the entire table only to return no data (and with 250 million records, this is why it runs very slow). Is this correct?

    It is scanning the entire table because it has to check every row to determine if it matches the criteria for the WHERE clause. If the clustered index were flipped, Product then Date, you would actually see a Clustered Index Seek based on the WHERE clause you provided.

    For a test, create a NONCLUSTERED INDEX using Product then Date and run that query for a product that exists then again for one that doesn't.

    I was going to recommend this index as a test as well. If this is commonly run query or there are many queries that use product without date in the criteria it may be an index you'd want to keep. Assuming that there are other columns in the table this index would also be smaller than the clustered index and need fewer reads to satisfy the query as well.

  • I created the index, however unfortunately it won't help me in the long run. The reality is, is that most of the time the query does returns data - it is actually when a user incorrectly puts in a value that doesn't return data is when it is slow (one could argue that this is a training or data validation issue and not really a database issue, but for the sake of this thread and my own curiosity that's either here nor there). The query is in a stored procedure, and when I call this proc with a value for @Stuff that doesn't return data, it uses an index seek as Lynn correctly stated. When I run the proc with a value that does return data, it effectively uses this seek, although does take about a half second longer than when it does a clustered index scan. If I recompile the proc and run it with a value that does return data, it uses the clustered index scan as it originally did. However, unless I force a recompile each time, or add a hint to optimize it for a parameter that doesn't return data, almost always the plan that will be cached will be for a value that does return data, in otherwords a clustered index scan (which is good for almost all cases, just bad for when a user enters an invalid value). I think I have enough 'ammo' to argue my original point now with the developer, and I learned something along the way which is always a plus.

  • CheeseheadDBA (5/29/2013)


    I created the index, however unfortunately it won't help me in the long run. The reality is, is that most of the time the query does returns data - it is actually when a user incorrectly puts in a value that doesn't return data is when it is slow (one could argue that this is a training or data validation issue and not really a database issue, but for the sake of this thread and my own curiosity that's either here nor there). The query is in a stored procedure, and when I call this proc with a value for @Stuff that doesn't return data, it uses an index seek as Lynn correctly stated. When I run the proc with a value that does return data, it effectively uses this seek, although does take about a half second longer than when it does a clustered index scan. If I recompile the proc and run it with a value that does return data, it uses the clustered index scan as it originally did. However, unless I force a recompile each time, or add a hint to optimize it for a parameter that doesn't return data, almost always the plan that will be cached will be for a value that does return data, in otherwords a clustered index scan (which is good for almost all cases, just bad for when a user enters an invalid value). I think I have enough 'ammo' to argue my original point now with the developer, and I learned something along the way which is always a plus.

    Please understand that my suggestion was actually based on a shot in the dark. The best way to evaluate performance issues with queries is to actually evaluate the actual execution plans used by the queries to determine what is happening when it works quickly and when it works slowly.

    Also, I wonder if the query you posted is the actually the query being run in the stored procedure.

  • I completely understand that it was a shot in the dark - was just posting what my findings were. I appreciate the responses! In terms of the stored proc, the query is identical what is in the stored procedure.

  • Something seems amiss to me here. Is there other things going on in the stored procedure besides this one query?

  • No - there is nothing else going on inside the stored procedure. The stored procedure simply contains the query I have in my original post.

  • Your original suggestion works great when the parameter passed in returns no results - which was my original question that you've successfully answered. I was just providing more information for the sake of discssuion. If I recompile the stored procedure and run it with a parameter that returns data, it still uses the clustered index scan regardless of whether or not the nonclustered index is created and the clustered index scan actually performs better in this case. Perhaps this is because of the MAX function in the select?

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply