T-SQL Variable vs String Equivalent

  • Got a question about coding standards. Some developers declare a variable, assign it a constant value, and use that variable in multiple places in a stored procedure rather than using the string literal.

    For example:

    DECLARE @yes VARCHAR(3) = 'Yes'

    SELECT [pk_value]

    ,[field_value]

    FROM [tempdb].[dbo].[table7]

    WHERE [field_value] = @yes;

    rather than:

    SELECT [pk_value]

    ,[field_value]

    FROM [tempdb].[dbo].[table7]

    WHERE [field_value] = 'Yes';

    I set up a quick and dirty example (below). The same execution plan was generated, but the query using the variable was slower and used more CPU.

    Is there any scenario out there where a variable would be faster than the string equivalent?

    Thanks,

    Kevin

  • Hi SQL Padawan,

    I coded up your example to explain the difference between estimated execution plans and actual I/O statistics.

    Both of your statements use the idx_field_value index in the query plan.

    In my example below, I could not get the optimizer to use this index due to the small amount of data in the table (2 records).

    It performs a full table scan (FTS).

    Here are some things to note.

    1 - The plan with a constant will probably be re-used. This is due to the fact that we know what the value is.

    On the other hand, this may lead to parameter sniffing.

    2 - The plan with the variable will be recompiled every time. This is due to the fact that it might change.

    This is even more likely if the input parameter is assigned to a local variable before executing, and the local variable is used in the WHERE clause.

    Please see Grant Fritchey's execution plans e-book from Simple Talk.

    I usually look at time and I/O when running a query. The T-SQL below will turn these counters on in the query window.

    -- Show time & i/o

    SET STATISTICS TIME ON

    SET STATISTICS IO ON

    GO

    I usually clear the buffers and plan cache in test so that stored plans do not skew the results.

    -- Remove clean buffers & clear plan cache

    CHECKPOINT

    DBCC DROPCLEANBUFFERS

    DBCC FREEPROCCACHE

    GO

    My simple execution using two records shows that the non-variable solution is 1 ms quicker than the one with a variable.

    Both have the same page scans (logical and physical).

    Please redo the test, clearing the buffers and cache, and post the I/O results.

    If they are almost the same, there is really no difference between the two constructs other than possibly avoiding parameter sniffing.

    Sincerely

    John

    PS: I hope this post helps you out!

    :w00t:

    --

    -- Sample code

    --

    -- Create temp table

    create table tempdb.dbo.table7

    (

    [pk_value] int,

    [field_value] varchar(10)

    );

    -- Create NC index

    create nonclustered index idx_field_value on tempdb.dbo.table7 ([field_value]);

    -- Insert two rows

    insert into tempdb.dbo.table7 values

    (1, 'Yes'),

    (2, 'No');

    -- Update the stats

    update statistics tempdb.dbo.table7;

    go

    -- Show the data

    select * from tempdb.dbo.table7

    -- Show time & i/o

    SET STATISTICS TIME ON

    SET STATISTICS IO ON

    GO

    -- Remove clean buffers & clear plan cache

    CHECKPOINT

    DBCC DROPCLEANBUFFERS

    DBCC FREEPROCCACHE

    GO

    -- Declare local variable

    declare @yes varchar(3) = 'Yes'

    -- First way

    select

    [pk_value]

    ,[field_value]

    from tempdb.dbo.table7

    where [field_value] = @yes;

    /*

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 18 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    (1 row(s) affected)

    Table 'table7'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 14 ms.

    */

    -- Remove clean buffers & clear plan cache

    CHECKPOINT

    DBCC DROPCLEANBUFFERS

    DBCC FREEPROCCACHE

    GO

    -- Second way

    select

    [pk_value]

    ,[field_value]

    FROM tempdb.dbo.table7

    WHERE [field_value] = 'Yes';

    /*

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 22 ms.

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 0 ms.

    (1 row(s) affected)

    Table 'table7'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 13 ms.

    */

    John Miner
    Crafty DBA
    www.craftydba.com

  • j.miner (4/8/2013)


    2 - The plan with the variable will be recompiled every time. This is due to the fact that it might change.

    This is even more likely if the input parameter is assigned to a local variable before executing, and the local variable is used in the WHERE clause.

    Not true.

    Variables do not cause recompilation (nor do constants or parameters). The plan will be cached and reused just as any other query, in fact, the caching will be exactly the same as the version of the query with the constant in it.

    What causes recompilation:

    The OPTION (RECOMPILE) hint (actually causes compile, not recompile)

    The WITH RECOMPILE procedure option (actually causes compile, not recompile)

    A change in statistics

    A modification to an object that the query is dependent on.

    Rebuilding an index on a table the query uses.

    Clearing the plan cache (actually causes compile, not recompile)

    See http://sqlinthewild.co.za/index.php/2011/08/16/compiles-and-recompiles/ for what causes compiles and recompiles.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Basically the difference between the two comes down to parameter sniffing. At compile time the optimiser can see the value of the constant and can compile a query plan optimal for that value. This is called parameter sniffing. It is a Good Thing (most of the time).

    When using a variable, the optimiser cannot sniff the value at compile time (the variable has no value at compile time) and hence the optimiser cannot use the value of the variable to generate a plan optimal for that value. It generates a more generic plan. In some cases this generic plan will be less optimal than the plan compiled with parameter sniffing.

    The other side of the story however is that a generic plan may be better for reuse than an optimal plan, if the number of rows that the query affects can change radically between multiple executions. This is because in both cases the plans will be cached and reused.
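    As a hedged aside (not part of the original posts, and using the OP's example table): when a value-specific plan is wanted despite using a variable, SQL Server's documented OPTION (RECOMPILE) hint compiles the statement at execution time, at which point the optimiser can see the variable's current value. A minimal sketch:

    ```sql
    -- Sketch: OPTION (RECOMPILE) forces a compile at execution time, so the
    -- optimiser can see the current value of @yes and build a value-specific
    -- plan instead of a generic one. The trade-off is a compile on every
    -- execution, so this suits infrequent queries, not ones run many times a second.
    DECLARE @yes VARCHAR(3) = 'Yes';

    SELECT [pk_value]
          ,[field_value]
    FROM [tempdb].[dbo].[table7]
    WHERE [field_value] = @yes
    OPTION (RECOMPILE);
    ```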

    Have a read through these; they're not as detailed as they could be, being very old posts. If you want more information, google "Grant Fritchey" "parameter sniffing" and you'll find some of his articles on the subject.

    http://sqlinthewild.co.za/index.php/2007/11/27/parameter-sniffing/

    http://sqlinthewild.co.za/index.php/2008/02/25/parameter-sniffing-pt-2/

    http://sqlinthewild.co.za/index.php/2008/05/22/parameter-sniffing-pt-3/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • A great article that could be added to this discussion is from Erland Sommarskog. Also, as John mentioned, the query above may have been written by developers to avoid parameter sniffing.

    http://www.sommarskog.se/query-plan-mysteries.html

  • Thanks all for the replies.

    To clarify a couple of points: the variable is being used as a constant in the code. @yes will always equal 'Yes'. Rather than risk a typo, the developer declares @yes at the top of the code and uses it instead of typing 'Yes' multiple times. Apply this logic to a more complex example, and hopefully you get the idea.

    Maybe I'm misunderstanding param sniffing, but isn't the optimizer going to generate a plan and keep it until something changes (index rebuild\reorg, stats update, table records added\deleted, etc.), then generate a new plan based on the updated objects? Since the variable value never changes, param sniffing won't come into play here?

    The query in question can run multiple times a second. I was simply looking at the performance of using @yes vs 'Yes'; a quick and dirty ad hoc test looking at the cost per batch suggests the variable option is more expensive.

    Here's the entire script to recreate the results:

    /*

    USE tempdb

    GO

    IF EXISTS (

    SELECT *

    FROM sys.objects

    WHERE object_id = OBJECT_ID(N'[dbo].[table7]')

    AND type IN (N'U')

    )

    DROP TABLE [dbo].[table7]

    GO

    CREATE TABLE dbo.table7 (

    pk_value INT IDENTITY(1, 1) PRIMARY KEY

    ,field_value VARCHAR(255)

    );

    GO

    INSERT INTO table7 (field_value)

    VALUES ('Yes');

    GO 5000

    INSERT INTO table7 (field_value)

    VALUES ('No');

    GO 15000

    IF EXISTS (

    SELECT *

    FROM sys.indexes

    WHERE object_id = OBJECT_ID(N'[dbo].[table7]')

    AND NAME = N'IX_field_value'

    )

    DROP INDEX [IX_field_value] ON [dbo].[table7]

    WITH (ONLINE = OFF)

    GO

    CREATE NONCLUSTERED INDEX [IX_field_value] ON [dbo].[table7] ([field_value] ASC)

    WITH (

    PAD_INDEX = OFF

    ,STATISTICS_NORECOMPUTE = OFF

    ,SORT_IN_TEMPDB = OFF

    ,IGNORE_DUP_KEY = OFF

    ,DROP_EXISTING = OFF

    ,ONLINE = OFF

    ,ALLOW_ROW_LOCKS = ON

    ,ALLOW_PAGE_LOCKS = ON

    ) ON [PRIMARY]

    GO

    */

    /*

    USE tempdb

    GO

    --Include Actual Execution Plan

    DECLARE @yes VARCHAR(3) = 'Yes'

    SELECT [pk_value]

    ,[field_value]

    FROM [tempdb].[dbo].[table7]

    WHERE [field_value] = @yes;

    GO

    SELECT [pk_value]

    ,[field_value]

    FROM [tempdb].[dbo].[table7]

    WHERE [field_value] = 'Yes';

    GO

    */
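    As a rough way to check the caching and reuse behaviour discussed above (a sketch I've added using the standard plan-cache DMVs; requires VIEW SERVER STATE permission), you can look at the usecounts column for the two test queries:

    ```sql
    -- Sketch: inspect cached plans for the test queries. usecounts shows how
    -- often each cached plan has been reused; the literal version and the
    -- variable version should each appear with their own entry.
    SELECT cp.usecounts
          ,cp.objtype
          ,st.[text]
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    WHERE st.[text] LIKE '%table7%'
      AND st.[text] NOT LIKE '%dm_exec_cached_plans%'  -- exclude this query itself
    ORDER BY cp.usecounts DESC;
    ```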

  • sqlpadawan_1 (4/8/2013)


    To clarify a couple of points. The variable is being used as a constant in the code. @yes will always equal 'Yes'. Rather than have a type-o, the developer will declare @yes at the top of the code and use it rather than typing 'Yes' multiple times in the code. Apply this logic to a more complex example, and hopefully you get the idea.

    That is going to harm performance. I suggest you head that one off as soon as possible; get them using string literals, not variables.

    Maybe I'm misunderstanding param sniffing, but isn't the optimizer going to generate a plan and keep it until something changes (index rebuilt\reorg, stats update, table records added\deleted, etc...), and then generate a new plan based on the updated objects? Where the variable value never changes, param sniffing won't come into play here?

    Correct on the caching and reuse. Parameter sniffing won't come into effect because the optimiser can't sniff the value of variables, this will result in generic plans which are likely to perform worse than an optimal plan would. See the second of the links I posted.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Maybe I'm misunderstanding param sniffing, but isn't the optimizer going to generate a plan and keep it until something changes (index rebuilt\reorg, stats update, table records added\deleted, etc...), and then generate a new plan based on the updated objects? Where the variable value never changes, param sniffing won't come into play here?

    If the value never changes in your where clause, you have some really boring code?

    😛

    Again, it all depends upon performance. That is why I suggested looking at run time and i/o. If you can live with a generic plan that uses variables, why not?

    John Miner
    Crafty DBA
    www.craftydba.com

  • Thank you all for taking time to reply, I appreciate all of the input.

  • j.miner (4/8/2013)


    Maybe I'm misunderstanding param sniffing, but isn't the optimizer going to generate a plan and keep it until something changes (index rebuilt\reorg, stats update, table records added\deleted, etc...), and then generate a new plan based on the updated objects? Where the variable value never changes, param sniffing won't come into play here?

    If the value never changes in your where clause, you have some really boring code?

    😛

    Again, it all depends upon performance. That is why I suggested looking at run time and i/o. If you can live with a generic plan that uses variables, why not?

    Isn't all code boring after it's been written and running in production?

    Imagine a SELECT in a proc that returns active widgets; the WHERE is always going to be the same. So do you code your where clause WHERE Active = 'Yes' or WHERE Active = @yes? If I can consistently prove that the variable code uses more CPU and is slower, then in my mind it's an open and shut case. What does the variable declaration buy you? I'm either lazy or efficient (probably lazy), but in my mind this just clouds the code.

    I typically do look at IO and runtime (I <3 my profiler more than the query io \ runtime measures), but I also find it helpful to run both and look at the cost relative to the batch.

  • Very true. However, life is shades of gray, never black & white.

    Many of the applications that run on my boxes are vendor applications.

    Some are vendors that do not even exist any more.

    I have a limited staff (2) and resources to manage > 50 production servers.

    Given that fact, I only worry about the queries that take a long time to run once, or a medium time but run many times.

    In short, I kill the large elephants but a lion or two might still be out there.

    But if they do not bother anyone, I do not worry about them.

    Good luck in your endeavors!

    John

    John Miner
    Crafty DBA
    www.craftydba.com

  • Heh... there are no shades of gray. Either the code is fast or slow. Whether you choose to work on it or not is a decision. 🙂

    Anyway, I've seen it on my home machine where the literal is always faster than the variable. That's using 2K5 Developer's Edition (virtually identical to the Enterprise Edition) on a 32-bit box. I'm not there right now or I'd show you the runs from that.

    On the 2K5 Enterprise Edition 64-bit box at work, the two pieces of code take turns winning within milliseconds of each other on a million row table. Note that I used dump variables to take the display out of the picture because I don't want to measure the time to display. The execution plan on both is identical to the other, including % of batch.

    As a bit of a sidebar, using % of batch to figure out which piece of code is more efficient is a very bad thing to do. Even after an actual run, it's still an estimate and I've seen it be 100% incorrect. I even gave Grant Fritchey the example and he included it in his Apress book on execution plans.

    Trying to test such problems with only 2 rows won't get anyone anywhere. It's easy to make a million row test table for such things in seconds. Here's some code that does just such a thing for this problem.

    -- DROP TABLE tempdb.dbo.Table7

    SELECT TOP 1000000

    PK_Value = IDENTITY(INT,1,1)

    , Field_Value = CASE WHEN ABS(CHECKSUM(NEWID()))%2 = 0 THEN 'No' ELSE 'Yes' END

    INTO tempdb.dbo.Table7

    FROM master.sys.all_columns ac1

    CROSS JOIN master.sys.all_columns ac2

    ;

    ALTER TABLE tempdb.dbo.Table7

    ADD PRIMARY KEY CLUSTERED (PK_Value)

    ;

    CREATE INDEX IX_Table7_Field_Value

    ON tempdb.dbo.Table7 (Field_Value)

    ;

    Here's the code I used to test with. Like I said, I used variables to take the display time out of the picture. It's also worth noting that you should never use SET STATISTICS when testing user-defined functions because it adds a huge amount of time to scalar functions in particular.

    DECLARE @PK_Value INT

    , @Field_Value CHAR(3)

    ;

    SET STATISTICS TIME ON

    ;

    DECLARE @yes VARCHAR(3)

    SELECT @yes = 'Yes'

    ;

    SELECT @PK_Value = PK_Value

    , @Field_Value = Field_Value

    FROM tempdb.dbo.Table7

    WHERE Field_Value = @yes

    ;

    SELECT @PK_Value = PK_Value

    , @Field_Value = Field_Value

    FROM tempdb.dbo.Table7

    WHERE Field_Value = 'Yes'

    ;

    GO

    SET STATISTICS TIME OFF

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • On my desktop (2k8) both queries perform equally:

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 141 ms, elapsed time = 148 ms.

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 156 ms, elapsed time = 148 ms.

    Times may vary from run to run by couple of ms, but they are always in line with each other.

    That's probably because the "generic" execution plan chosen for the @yes version is identical to the "sniffing" plan.

    Both are cached and reused.

    So SQL Server runs effectively the same code against the same set of data - no difference to be expected.

    But that's particular to this table structure and data distribution.

    If I change the cardinality by using this:

    , Field_Value = CASE WHEN ABS(CHECKSUM(NEWID()))%20 = 1 THEN 'Yes' ELSE 'No' END

    then the literal constant outperforms the variable.

    It's still a < 5 ms difference on my machine, but with a total execution time around 20 ms it's noticeable.

    In the reverse case (90% 'Yes' and 10% 'No') the times are basically the same:

    SQL Server Execution Times:

    CPU time = 281 ms, elapsed time = 289 ms.

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 297 ms, elapsed time = 287 ms.

    ....

    SQL Server Execution Times:

    CPU time = 281 ms, elapsed time = 297 ms.

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 312 ms, elapsed time = 302 ms.

    _____________
    Code for TallyGenerator

  • sqlpadawan_1 (4/8/2013)


    Imagine a SELECT in a proc that returns active widgets, the WHERE is always going to be the same. So do you code your where clause 'WHERE Active = 'YES'' or WHERE Active = @yes? If I can consistently prove that the variable code uses more CPU and is slower, than in my mind, it's an open and shut case. What does the variable declaration buy you? I'm either lazy or efficient (probably lazy), but in my mind this just clouds the code.

    If you are lazy (good on you! 🙂 ) create a view ActiveWidgets with the hardcoded literal and use it in all relevant procedures.

    Be careful not to put any calculations/conversions into that view, just the row filter (and possibly a limited set of columns).

    Otherwise you're gonna create horrible performance issues.
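    A minimal sketch of that idea (ActiveWidgets, the Widgets table, and its columns are hypothetical names for this example):

    ```sql
    -- Hypothetical example: a thin view that hard-codes the literal filter,
    -- so every procedure referencing it gets the sniffed constant, and the
    -- 'Yes' literal lives in exactly one place.
    CREATE VIEW dbo.ActiveWidgets
    AS
    SELECT WidgetId
          ,WidgetName
    FROM dbo.Widgets
    WHERE Active = 'Yes';
    ```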

    _____________
    Code for TallyGenerator

  • j.miner (4/8/2013)


    Maybe I'm misunderstanding param sniffing, but isn't the optimizer going to generate a plan and keep it until something changes (index rebuilt\reorg, stats update, table records added\deleted, etc...), and then generate a new plan based on the updated objects? Where the variable value never changes, param sniffing won't come into play here?

    It's got nothing to do with having the wrong plan in cache, but everything to do with the estimated row counts.

    When you use a literal, SQL Server will estimate the row count using statistics. When using a local variable, SQL Server has no choice but to do a thumb suck. The thumb suck is row count / 3 when there are more than 2 distinct values in the column and row count / 2 when there are 2 distinct values.

    Consider this example:

    I have a table with 100,000 rows. Let's say 90% of the rows contain the value 'Yes'. When using a literal the estimated number of rows will be 90,000. When using a variable the estimated number of rows is 33,333.3

    The following code illustrates this:

    CREATE TABLE #Test (ID INT, Value VARCHAR(3));

    INSERT INTO #Test (ID)

    SELECT TOP 100000

    ROW_NUMBER() OVER (ORDER BY a.object_id)

    FROM

    sys.all_columns a

    CROSS JOIN

    sys.all_columns b;

    UPDATE #Test SET Value = 'No' WHERE ID <= 5000;

    UPDATE #Test SET Value = 'Mbe' WHERE ID <= 10000 AND ID > 5000;

    UPDATE #Test SET Value = 'Yes' WHERE ID > 10000;

    DECLARE @yes VARCHAR(3) = 'Yes';

    SET STATISTICS XML ON;

    SELECT * FROM #Test WHERE Value = 'Yes';

    SELECT * FROM #Test WHERE Value = @yes;

    SET STATISTICS XML OFF;

    DROP TABLE #Test;

    The SQL Guy @ blogspot

    @SeanPearceSQL

    About Me
