T-SQL Variable vs String Equivalent
Posted Monday, April 8, 2013 2:13 PM


Very true. However, life is shades of gray, never black & white.

Many of the applications that run on my boxes are vendor applications.

Some are vendors that do not even exist any more.

I have a limited staff (2) and resources to manage > 50 production servers.

Given that fact, I only worry about the ones that take a long time for a single run, or a medium time but run many times.

In short, I kill the large elephants but a lion or two might still be out there.

But if they do not bother anyone, I do not worry about them.


Good luck in your endeavors!

John



John Miner
Crafty DBA
www.craftydba.com
Post #1440018
Posted Monday, April 8, 2013 3:54 PM


Heh... there are no shades of gray. Either the code is fast or slow. Whether you choose to work on it or not is a decision.

Anyway, I've seen it on my home machine where the literal is always faster than the variable. That's using 2K5 Developer's Edition (virtually identical to the Enterprise Edition) on a 32-bit box. I'm not there right now or I'd show you the runs from that.

On the 2K5 Enterprise Edition 64-bit box at work, the two pieces of code take turns winning within milliseconds of each other on a million-row table. Note that I used "dump" variables to take the display out of the picture, because I don't want to measure the time to display. The execution plans for the two are identical, including % of batch.

As a bit of a sidebar, using % of batch to figure out which piece of code is more efficient is a very bad thing to do. Even after an actual run, it's still an estimate and I've seen it be 100% incorrect. I even gave Grant Fritchey the example and he included it in his APress book on execution plans.
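This isn't the example from the book, but here's a minimal sketch of the effect (my own illustration, not code from this thread): a recursive CTE gets a tiny estimated cost because the optimizer guesses low on the recursion depth, so "% of batch" makes it look nearly free next to a set-based equivalent, even though the clock says otherwise.

SET STATISTICS TIME ON;

-- Query 1: recursive CTE counting to a million. Its estimated cost (and so
-- its "% of batch") is tiny because the optimizer guesses low on recursion depth.
WITH cte AS
(
 SELECT 1 AS N
 UNION ALL
 SELECT N + 1 FROM cte WHERE N < 1000000
)
SELECT MAX(N) FROM cte
OPTION (MAXRECURSION 0);

-- Query 2: set-based row source. Its estimated cost dwarfs the CTE's in the plan,
-- yet STATISTICS TIME usually shows it finishing far sooner.
SELECT MAX(N)
  FROM (SELECT TOP 1000000
               N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
          FROM master.sys.all_columns ac1
         CROSS JOIN master.sys.all_columns ac2) t;

SET STATISTICS TIME OFF;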

Trying to test such problems with only 2 rows won't get anyone anywhere. It's easy to make a million row test table for such things in seconds. Here's some code that does just such a thing for this problem.

-- DROP TABLE tempdb.dbo.Table7
-- Build a million-row test table with a roughly 50/50 split of 'Yes'/'No'.
SELECT TOP 1000000
        PK_Value    = IDENTITY(INT,1,1)
      , Field_Value = CASE WHEN ABS(CHECKSUM(NEWID()))%2 = 0 THEN 'No' ELSE 'Yes' END
   INTO tempdb.dbo.Table7
   FROM master.sys.all_columns ac1
  CROSS JOIN master.sys.all_columns ac2
;
-- Clustered PK on the surrogate key...
ALTER TABLE tempdb.dbo.Table7
  ADD PRIMARY KEY CLUSTERED (PK_Value)
;
-- ...and a nonclustered index on the column being filtered.
CREATE INDEX IX_Table7_Field_Value
    ON tempdb.dbo.Table7 (Field_Value)
;

Here's the code I used to test with. Like I said, I used variables to take the display time out of the picture. It's also worth noting that you should never use SET STATISTICS when testing User Defined Functions, because it adds a huge amount of time to Scalar Functions in particular.

DECLARE @PK_Value INT
      , @Field_Value CHAR(3) -- "dump" variables to absorb the output
;
SET STATISTICS TIME ON
;
DECLARE @Yes VARCHAR(3)
SELECT @Yes = 'Yes'
;
-- Test 1: filter on the variable.
SELECT @PK_Value    = PK_Value
     , @Field_Value = Field_Value
  FROM tempdb.dbo.Table7
 WHERE Field_Value = @Yes
;
-- Test 2: filter on the literal.
SELECT @PK_Value    = PK_Value
     , @Field_Value = Field_Value
  FROM tempdb.dbo.Table7
 WHERE Field_Value = 'Yes'
;
GO
SET STATISTICS TIME OFF
;
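If you do need to time a scalar UDF, a simple workaround (just a sketch; dbo.SomeScalarFunction is a hypothetical stand-in, not a function from this thread) is to bracket the run with GETDATE() instead of using SET STATISTICS:

DECLARE @StartTime DATETIME, @Bitbucket INT;
SELECT @StartTime = GETDATE();
SELECT @Bitbucket = dbo.SomeScalarFunction(PK_Value) -- hypothetical scalar UDF
  FROM tempdb.dbo.Table7;
SELECT DATEDIFF(ms, @StartTime, GETDATE()) AS ElapsedMS;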



--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1440040
Posted Tuesday, April 9, 2013 8:18 PM
On my desktop (2k8) both queries perform equally:

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 141 ms, elapsed time = 148 ms.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 156 ms, elapsed time = 148 ms.

Times may vary from run to run by a couple of ms, but they are always in line with each other.

That's probably because the "generic" execution plan chosen for the @Yes version is identical to the "sniffing" plan.
Both are cached and reused.
So SQL Server effectively runs the same code against the same set of data - no difference is to be expected.
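If anyone wants to confirm the reuse, here's a quick sketch against the plan cache DMVs (run the test batch a couple of times first):

SELECT cp.objtype, cp.usecounts, st.text
  FROM sys.dm_exec_cached_plans cp
 CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
 WHERE st.text LIKE '%Table7%'
   AND st.text NOT LIKE '%dm_exec_cached_plans%'; -- exclude this query itself
-- usecounts > 1 on a plan confirms it's being reused between runs.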

But that's specific to this particular table structure and data distribution.

If I change the cardinality by using this:
,       Field_Value = CASE WHEN ABS(CHECKSUM(NEWID()))%20 = 1 THEN 'Yes' ELSE 'No' END

then the literal constant outperforms the variable.
It's still a < 5 ms difference on my machine, but with a total execution time of around 20 ms it's noticeable.

In the reverse case (90% 'Yes' and 10% 'No') the times are basically the same:
 SQL Server Execution Times:
CPU time = 281 ms, elapsed time = 289 ms.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 297 ms, elapsed time = 287 ms.

....

SQL Server Execution Times:
CPU time = 281 ms, elapsed time = 297 ms.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 312 ms, elapsed time = 302 ms.

Post #1440630
Posted Tuesday, April 9, 2013 8:27 PM
sqlpadawan_1 (4/8/2013)
Imagine a SELECT in a proc that returns active widgets; the WHERE is always going to be the same. So do you code your WHERE clause as WHERE Active = 'YES' or as WHERE Active = @Yes? If I can consistently prove that the variable code uses more CPU and is slower, then in my mind it's an open and shut case. What does the variable declaration buy you? I'm either lazy or efficient (probably lazy), but in my mind this just clouds the code.


If you are lazy (good on you!), create a view ActiveWidgets with the hardcoded literal and use it in all relevant procedures.
Be careful not to put any calculations or conversions into that view; keep it to just the row filter (and possibly a limited column list).
Otherwise you're going to create horrible performance issues.
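Something like this, as a sketch (dbo.Widgets and its columns are made up for illustration):

CREATE VIEW dbo.ActiveWidgets
AS
SELECT WidgetID, WidgetName   -- limited column list; no calculations or conversions
  FROM dbo.Widgets            -- hypothetical base table
 WHERE Active = 'YES';        -- the literal is hardcoded once and reused everywhere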
Post #1440631
Posted Wednesday, April 10, 2013 3:15 AM


j.miner (4/8/2013)
Maybe I'm misunderstanding param sniffing, but isn't the optimizer going to generate a plan and keep it until something changes (index rebuild/reorg, stats update, table records added/deleted, etc.), and then generate a new plan based on the updated objects? Since the variable value never changes, param sniffing won't come into play here?

It's got nothing to do with having the wrong plan in cache, but everything to do with the estimated row counts.

When you use a literal, SQL Server estimates the row count from the statistics histogram. When using a local variable, SQL Server can't sniff a value, so it has no choice but to do a thumb suck from the density vector: row count divided by the number of distinct values in the column. With two distinct values that's row count / 2; with the three distinct values in the example below it's row count / 3.

Consider this example:

I have a table with 100,000 rows. Let's say 90% of the rows contain the value 'Yes'. When using a literal the estimated number of rows will be 90,000. When using a variable the estimated number of rows is 33,333.3

The following code illustrates this:

CREATE TABLE #Test (ID INT, Value VARCHAR(3));

-- Generate 100,000 sequential IDs.
INSERT INTO #Test (ID)
SELECT TOP 100000
       ROW_NUMBER() OVER (ORDER BY a.object_id)
  FROM sys.all_columns a
 CROSS JOIN sys.all_columns b;

-- Skew the distribution: 5% 'No', 5% 'Mbe', 90% 'Yes' (3 distinct values).
UPDATE #Test SET Value = 'No'  WHERE ID <= 5000;
UPDATE #Test SET Value = 'Mbe' WHERE ID <= 10000 AND ID > 5000;
UPDATE #Test SET Value = 'Yes' WHERE ID > 10000;

DECLARE @Yes VARCHAR(3) = 'Yes';

-- Capture the actual plans so the estimated row counts can be compared.
SET STATISTICS XML ON;

SELECT * FROM #Test WHERE Value = 'Yes'; -- histogram estimate: ~90,000 rows
SELECT * FROM #Test WHERE Value = @Yes;  -- density estimate: 100,000 / 3 = 33,333.3 rows

SET STATISTICS XML OFF;

DROP TABLE #Test;
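If you comment out the DROP TABLE, you can also see where the 33,333.3 guess comes from by checking the density vector of the auto-created statistics (a sketch; assumes auto-create statistics is on, and the statistics name is system-generated):

-- Find the system-generated statistics name on the Value column...
SELECT s.name
  FROM tempdb.sys.stats s
 WHERE s.object_id = OBJECT_ID('tempdb..#Test');

-- ...then plug it into this. "All density" should show 0.3333333 (3 distinct values),
-- and 100,000 * 0.3333333 = 33,333.3.
-- DBCC SHOW_STATISTICS ('tempdb..#Test', <statistics name here>) WITH DENSITY_VECTOR;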





The SQL Guy @ blogspot

@SeanPearceSQL

About Me
Post #1440709
Posted Wednesday, April 10, 2013 3:36 AM


If you examine the two plans in the first post of this thread, the first one (for the query with a constant predicate) has been considered for simple parameterization (WHERE [field_value]=@1). Simple parameterization promotes plan re-use: you probably wouldn't want two separate plans cached for WHERE [field_value] = 'Yes' and WHERE [field_value] = 'No' unless the distribution statistics were wildly different. There's a good chance that SQL Server will use the same plan for both queries.
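As an aside (a sketch, not something from the posts above): if you do want value-specific estimates while keeping the variable, OPTION (RECOMPILE) lets the optimizer embed the variable's runtime value at compile time, so the estimate comes from the histogram just as it does for a literal:

DECLARE @Yes VARCHAR(3) = 'Yes';
SELECT PK_Value, Field_Value
  FROM tempdb.dbo.Table7
 WHERE Field_Value = @Yes
OPTION (RECOMPILE); -- recompiled per execution with @Yes treated as a known value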

“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Post #1440715