May 30, 2012 at 3:43 pm
Hi all. New to the forum but not to SQL. Hope I can get some help here.
I don't like for people to waste my time and I don't like wasting others', so I detailed my small problem in the attached doc file. You'll find it's easy to follow, so please don't hesitate to download. Basically I'm comparing the page reads and execution plans for the two query examples below. What you'll find for both examples is that the top queries (in Qry #1 and #2) both use an index seek and the bottom queries both use a table scan (the execution plans in the doc file confirm this). However, the parameterized query in Query#2 says 36% and the bottom query in #2 says 64%, while in Query#1 the top query says 94% and the bottom query says 6%. How is that possible when both top queries produced identical execution plans at runtime? The bottom queries also produced identical execution plans at runtime.
Please help? Thanks for taking a peek
Query#1
set statistics io on;
DBCC FREEPROCCACHE;
select * from Client with(index(ncLastName)) where LastName = 'Diaz' option(maxdop 1)
select * from Client where LastName = 'Diaz' option(maxdop 1)
Query#2
set statistics io on;
DBCC FREEPROCCACHE;
declare @LastName varchar(100)
set @LastName = 'Diaz'
select * from Client where LastName = @LastName option(maxdop 1)
select * from Client where LastName = 'Diaz' option(maxdop 1)
May 30, 2012 at 5:10 pm
The execution plans look superficially the same, but they are quite different in the details.
The first query plan (using a literal string value) is based on the specific LastName value 'Diaz', so the seek operation is something like Seek Keys[1]: Prefix: LastName = Scalar Operator('Diaz') and the estimated row counts and costs are based on statistical information for that particular value.
The second query plan uses a variable reference. This plan will be cached and may be reused for any future value of the variable (not just 'Diaz'), so the estimated row counts and costs are based on the statistical distribution of all LastName values. The seek operation is something like Seek Keys[1]: Prefix: LastName = Scalar Operator([@LastName]).
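If you want to see where these two different estimates come from, you can look at the statistics object behind the index. A minimal sketch, assuming the nonclustered index is named ncLastName as in your hint:

DBCC SHOW_STATISTICS ('Client', ncLastName);

-- The histogram step containing 'Diaz' drives the estimate for the
-- literal query. For the variable query, the value is unknown at
-- compile time, so the optimizer typically falls back to the density
-- vector: estimated rows ~= 'All density' for LastName * total rows.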
SQL Server 2008 SP1 CU5 and later includes the 'parameter-embedding optimization'; if you add OPTION (RECOMPILE) to the query, it will be recompiled each time it is executed (and not cached) with the specific value in the variable used each time.
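For example, applied to your Query#2 (a sketch only; the trade-off is that the statement is compiled on every execution):

declare @LastName varchar(100)
set @LastName = 'Diaz'

-- Recompiled on each execution; with the parameter-embedding
-- optimization, the optimizer can treat @LastName as the literal
-- 'Diaz' and use the histogram for that specific value:
select * from Client where LastName = @LastName option(maxdop 1, recompile)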
May 30, 2012 at 7:37 pm
Thanks for the quick response...I really appreciate it. I think I understand your reply.
So for future queries where I am using declared variables, I should include RECOMPILE, as below, right?
select * from Client where LastName = @LastName option(maxdop 1, recompile)
********************************************************
Also, if I follow correctly, I should realize that the percentage numbers displayed in my parameterized Query#2 example (where LastName = @LastName) are not true cost percentages, right?
********************************************************
This brings up another question, if you don't mind. In my parameterized Query#2 example, why did it decide to use the ncLastName index anyway (when a table scan would have been more efficient)? Do all parameterized queries default to using an index, or is it because the density of the majority of LastName values benefits most from using an index?
********************************************************
Thanks again!
May 31, 2012 at 2:12 am
david.castillo4 (5/30/2012)
So for future queries where I am using declared variables, I should include RECOMPILE, as below, right?
If a good enough query plan depends sensitively on the parameter value, then yes, you might well do that. For queries that have a fairly even distribution, you might find you don't need it. There are other options as well, including OPTION (OPTIMIZE FOR (@LastName = 'Blah')) where 'Blah' is a value that works well for all parameters. This second option results in a cached plan, avoiding the overhead of recompiling on every execution.
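As a sketch of that second option (the value 'Smith' is purely illustrative - substitute a value that is representative of your real workload):

declare @LastName varchar(100)
set @LastName = 'Diaz'

-- Compiled once as if @LastName were 'Smith', then cached and reused
-- for every future value of the variable:
select * from Client
where LastName = @LastName
option(maxdop 1, optimize for (@LastName = 'Smith'))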
Also, if I follow correctly, I should realize that the percentage numbers displayed in my parameterized Query#2 example (where LastName = @LastName) are not true cost percentages, right?
The cost of each statement is estimated by the optimizer separately. It expects a different number of rows for 'Diaz' versus the average case, so the costs are different. I try to encourage people not to compare batch percentage costs this way as it can be very misleading, and isn't very intuitive (i.e. the numbers don't mean what people generally assume them to mean).
This brings up another question, if you don't mind. In my parameterized Query#2 example, why did it decide to use the ncLastName index anyway (when a table scan would have been more efficient)? Do all parameterized queries default to using an index, or is it because the density of the majority of LastName values benefits most from using an index?
The optimizer considered both plan options, producing an estimated cost for each, primarily based on statistics describing the *average distribution* of LastName values. It just so happens that, on this occasion, the seek + lookup plan came out cheaper than the clustered index scan. A table scan would likely be more efficient when searching for the specific value 'Diaz', it's true, but remember this plan is optimized for the average case - it will be cached and can be reused in future with a different parameter value.
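If you want to see the comparison for yourself, you can force each alternative and check the Estimated Subtree Cost of each plan. A sketch (INDEX(0) forces a scan of the base table):

declare @LastName varchar(100)
set @LastName = 'Diaz'

-- The seek + lookup plan the optimizer chose for the average case:
select * from Client with(index(ncLastName)) where LastName = @LastName option(maxdop 1)

-- A forced scan of the base table, for cost comparison:
select * from Client with(index(0)) where LastName = @LastName option(maxdop 1)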
May 31, 2012 at 8:13 am
This was really bugging me, so thanks so much for the explanations. I now understand parameterized query plans better.
Thanks!!!
June 6, 2012 at 4:01 pm
I came across something that got me thinking about your earlier comment on workload percentage cost (relative to batch). The reason is, I often compare execution plans (particularly page reads and workload % cost) when I'm trying to decide which of two equivalent queries will be faster. I may be wrong, but I think that as long as I hard-code my WHERE clause filters (instead of using declared variables) I can trust the "relative to batch" percentage cost, right? Or am I wrong in saying that? I read somewhere that the "relative to batch" percentage cost is determined by statistics and other things, which may make the values inaccurate.
I'm also asking because, as I compare two queries (both returning identical result sets but written differently, in an attempt to select the faster one), I sometimes see the query with far fewer page reads having the worse percentage cost, and vice versa. I would expect the query with far fewer page reads to have the better "relative to batch" percentage cost. If I can't trust this "relative to batch" cost, then what can I use to truly determine which query is better when comparing two that return identical result sets?
Thanks
June 6, 2012 at 4:48 pm
david.castillo4 (6/6/2012)
I came across something that got me thinking about your earlier comment on workload percentage cost (relative to batch). The reason is, I often compare execution plans (particularly page reads and workload % cost) when I'm trying to decide which of two equivalent queries will be faster. I may be wrong, but I think that as long as I hard-code my WHERE clause filters (instead of using declared variables) I can trust the "relative to batch" percentage cost, right? Or am I wrong in saying that? I read somewhere that the "relative to batch" percentage cost is determined by statistics and other things, which may make the values inaccurate.
The 'relative to batch' percentage is a simple calculation based on the estimated cost of each statement in the batch. Each statement has an estimated cost, which you can see by examining the left-most node in the graphical plan for that statement. For a SELECT query, that node will be the green SELECT node. Hover over it with your mouse, or click on it and check the properties window. In either case, you will see an Estimated Subtree Cost. Since this node is the top of the tree, the subtree cost is the total estimated cost for that statement.
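For example (illustrative numbers only, not taken from your plans): if the first statement's Estimated Subtree Cost is 0.056 and the second's is 0.0036, SSMS displays 0.056 / (0.056 + 0.0036) ≈ 94% and 0.0036 / (0.056 + 0.0036) ≈ 6%. The percentages are nothing more than each statement's estimated cost divided by the estimated cost of the whole batch.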
The important thing to realize is that the cost is *estimated* and based on the optimizer's *model* of reality. In general, the optimizer's model will not match the performance characteristics of your system well. This is by design: the model used happens to produce generally good results for a wide range of queries on a wide range of hardware. The estimated cost is used internally by the optimizer to choose between different physical plans for the *same* statement. The relative cost between two *different* statements is not designed to be compared directly. It's not just a question of statistics; the model just isn't intended to be used this way, which is why I wish SSMS did not show these batch percentages at all. That's not to say that these percentage costs will *never* happen to say anything useful, just that if they do, it's more down to luck than anything else.
I'm also asking because, as I compare two queries (both returning identical result sets but written differently, in an attempt to select the faster one), I sometimes see the query with far fewer page reads having the worse percentage cost, and vice versa. I would expect the query with far fewer page reads to have the better "relative to batch" percentage cost. If I can't trust this "relative to batch" cost, then what can I use to truly determine which query is better when comparing two that return identical result sets?
There are many performance metrics available; which ones you choose to focus on depends on your priorities. The main ones are:
Elapsed time
Processor usage (worker time)
Physical reads
Logical reads and writes
Memory use
The first four are available via system views like sys.dm_exec_query_stats. Memory grant is available by inspecting an actual execution plan or sys.dm_exec_query_memory_grants.
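As a sketch, something like the following pulls those first four metrics for recently executed queries (the TOP (20) and the ordering are just one reasonable choice; the time columns are reported in microseconds):

select top (20)
    qs.execution_count,
    qs.total_elapsed_time / qs.execution_count as avg_elapsed_time_us,
    qs.total_worker_time / qs.execution_count as avg_worker_time_us,
    qs.total_physical_reads / qs.execution_count as avg_physical_reads,
    qs.total_logical_reads / qs.execution_count as avg_logical_reads,
    st.text as batch_text
from sys.dm_exec_query_stats as qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) as st
order by qs.total_elapsed_time desc;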
It generally makes sense to take account of more than one performance metric when evaluating queries, and to evaluate performance in different conditions (for example, testing when the data required is all in memory, and when it all has to be fetched in from disk).
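For example, to compare cold-cache and warm-cache behaviour (test systems only - emptying the buffer pool on a production server will hurt everyone using it):

-- Cold cache: flush dirty pages, then empty the buffer pool
checkpoint;
dbcc dropcleanbuffers;

set statistics io on;
set statistics time on;

select * from Client where LastName = 'Diaz' option(maxdop 1);

-- Run the same query again (without the DBCC) for the warm-cache numbers.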
June 7, 2012 at 7:33 am
Again, thanks so much for the tips on how to properly compare the performance of two queries. I really, really appreciate it!!