Use Dynamic SQL to Improve Query Performance

  • James Dingle-651585 (5/24/2010)


    It is difficult for me to demonstrate right now because I am not at the office and the computer I am using does not have SQL Server installed... sorry if it takes me some time to get there 🙂

    James, OK, I'll be interested to see it working. Could you use the test data in my reply here, with @Name being the only value searched using '=', LIKE for colour, and inequalities for the price and dates, i.e. logically equivalent to:

    SELECT *

    FROM [Product]

    WHERE [Name] = Coalesce(@Name, [Name])

    AND [Color] LIKE Coalesce(@Color, [Color] )

    AND [ListPrice] >= Coalesce(@ListPriceMin, [ListPrice])

    AND [ListPrice] <= Coalesce(@ListPriceMax, [ListPrice])

    AND ModifiedDate >= Coalesce(@ModifiedDateMin, ModifiedDate)

    AND ModifiedDate <= Coalesce(@ModifiedDateMax, ModifiedDate)

  • Jeff Moden (5/23/2010)


    Heh... I actually forgot to mark the article before. 5 stars it is.;-)

    Thanks Jeff, a bit of balance has been restored to The Force 🙂

  • Gianluca Sartori (5/24/2010)


    josh shilling (5/21/2010)


    See this post by Mark Cohen...

    That page contains some quite questionable statements. It's a shame it still comes up on the first page of Google results for "sql short-circuit".

    I'll try to finish my article on short-circuiting soon and (hopefully) get it published here.

    More than questionable, as I tried to show with the code snippet I posted.

    The fact is that we might infer that SQL Server does do some short-circuiting in some circumstances (start-up filters perhaps) but it is not documented or guaranteed. Given that much better alternatives exist, it strikes me as being extremely poor practice to rely on this sort of "by-inspection" behaviour. If you need a reviewer or second contributor for your article, send me a PM.

    Paul

  • The approach advocated by James is much more suited to simpler requirements based on OR conditions, as his second example shows.

    The first submission attempted to apply the technique to a query using ANDed predicates, and it turns out to be problematic even when the bugs are removed. Essentially, the idea relies on start-up filters to re-use a generic query plan while avoiding execution of branches that are not needed at run time.

    The concept is appealing, and I did play around with this several years ago when 2005 was new. It is perfectly possible to produce an efficient plan when all the predicates are SARGable and linked with OR. This is because the keys contributing to each input of the concatenation will always be required, and the effect is similar to an optimised index intersection.
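
    Purely as an illustration, this is a minimal sketch of the OR-only shape being described (the ProductID key column and the supporting indexes are assumptions; whether the optimiser actually produces the concatenation-with-start-up-filters plan depends on indexing and statistics):

    SELECT p.*
    FROM [Product] AS p
    WHERE p.[ProductID] IN (SELECT [ProductID] FROM [Product] WHERE [Name] = @Name AND @Name IS NOT NULL)
    OR p.[ProductID] IN (SELECT [ProductID] FROM [Product] WHERE [Color] = @Color AND @Color IS NOT NULL);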

    Things become much more difficult when predicates are non-SARGable (necessitating index scans) or when ANDs are involved. The problem with an AND is that the net result of all the predicates is almost always smaller than any of the individual inputs. If one or more of the (non-NULL) search conditions are not very selective on the target index (or if the optimiser does not have good statistics) the plan produced can be very slow (even with RECOMPILE specified).

    In the case presented by Jonathan (the million-row table) it is just about impossible to find a general solution based on start-up filters that performs anywhere near as well as dynamic SQL or the optimal code written for 2008 SP1 CU5 onward with RECOMPILE specified. Some inputs can be made to work reasonably well so long as all the non-NULL conditions are very selective, or if extensive hinting is applied (e.g. FORCE ORDER, LOOP JOIN, RECOMPILE, MERGE UNION...and so on).

    The start-up filter idea also struggles with some range conditions, since COALESCE just expands to a non-SARGable CASE statement before optimisation. An index scan might be fine on a small table, but it hurts performance severely on the million-row example. It is possible to make the search conditions on Modified Date and List Price SARGable by using a hard-coded constant with ISNULL to fix the lower and upper bounds of the range (e.g. ListPrice <= ISNULL(@ListPriceMax, 922337203685477.5807)) but this is not very satisfactory.
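
    As a sketch only (the hard-coded bounds are sentinel values; the exact constants are an assumption, not a recommendation), the range conditions could be rewritten this way to stay SARGable:

    AND [ListPrice] >= ISNULL(@ListPriceMin, $0)                              -- assumes prices are never negative
    AND [ListPrice] <= ISNULL(@ListPriceMax, $922337203685477.5807)          -- maximum MONEY value
    AND ModifiedDate >= ISNULL(@ModifiedDateMin, '17530101')                 -- minimum DATETIME value
    AND ModifiedDate <= ISNULL(@ModifiedDateMax, '9999-12-31T23:59:59.997')  -- maximum DATETIME value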

    In summary, I would avoid this method unless your requirements are for simple query conditions joined exclusively by ORs, with no range conditions. Anyone using 2008 SP1 CU5 or later should ignore the method completely and use dynamic SQL or code based on the following model:

    -- SQL 2008 SP1 CU5 onward ONLY

    SELECT *

    FROM [Product]

    WHERE (@Name IS NULL OR [Name] LIKE @Name)

    AND (@Color IS NULL OR [Color] LIKE @Color)

    AND (@ListPriceMin IS NULL OR [ListPrice] >= @ListPriceMin)

    AND (@ListPriceMax IS NULL OR [ListPrice] <= @ListPriceMax)

    AND (@ModifiedDateMin IS NULL OR ModifiedDate >= @ModifiedDateMin)

    AND (@ModifiedDateMax IS NULL OR ModifiedDate <= @ModifiedDateMax)

    OPTION (RECOMPILE);

    Paul

  • Things become much more difficult when predicates are non-SARGable (necessitating index scans) or when ANDs are involved. The problem with an AND is that the net result of all the predicates is almost always smaller than any of the individual inputs. If one or more of the (non-NULL) search conditions are not very selective on the target index (or if the optimiser does not have good statistics) the plan produced can be very slow (even with RECOMPILE specified).

    Indeed, but only in appearance. I did not want to go into the details because I did not want to write an article within the article. But let's step back from the code and think about how the AND problem should be solved conceptually.

    Let's imagine you want to allow a search on the last name AND the first name. You can have an index on the last name only, another one on the first name only, and join the intermediate results. This works only if each column is selective on its own. But if you search for "John Smith", we can expect the intermediate results to be large on both first and last names, turning this into a heavy I/O query for a very small final result set.

    In that case, the answer is very easy for everybody: you just use a covering index on both the last name and first name columns, i.e. you add the "first name" column to the "last name" index and that's it.

    You would probably blame the developer who does not use a two-column index in this case. I would equally blame the developer who does not use a two-column index with the way I proposed to write the dynamic query. After all, these are only different ways of typing it; at the end of the story the SQL compiler should end up with a single index seek, with "seek" (and not "predicate") on the two columns, right? So whatever method you use, this is the goal to achieve.
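
    A minimal sketch of that two-column index (the index name is illustrative):

    CREATE INDEX IX_Users_LastName_FirstName ON dbo.Users (LastName, FirstName);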

    So this writing actually reflects this logic.

    WHERE

    -- Last name only

    UserId IN (SELECT UserId FROM Users WHERE LastName = @LastName AND @LastName IS NOT NULL AND @FirstName IS NULL)

    -- Last name and first name

    OR UserID IN (SELECT UserId FROM Users WHERE LastName = @LastName AND FirstName = @FirstName AND @LastName IS NOT NULL AND @FirstName IS NOT NULL)

    -- First name only

    OR UserID IN (SELECT UserId FROM Users WHERE FirstName = @FirstName AND @FirstName IS NOT NULL)

    The bottom line of the "AND" problem is that you have two types of "AND" conditions: the ones you will solve with a covering or multi-column index (or an indexed view if the columns are not in the same table), and the ones you will not optimise because the selectivity of each criterion on its own is sufficient and the extra index is not worth the price in disk space. For instance, you may decide that it's not worth having a covering index on the last name and the email, because the email is selective enough. Again, you would blame the developer who creates such an index.

    To say it differently, all the "ANDs" (intersections) have to be applied before the "ORs" (unions). Intersections applied after unions can result in very bad performance if the sets are large. It seems logical that you have to start with the smallest possible sets from the beginning of the execution plan, i.e. straight out of the indexes.

    By the way, in case you were thinking about it, there is a reason why you cannot write a statement like this:

    OR UserID IN (SELECT UserID FROM Users WHERE @LastName = LastName AND (@FirstName IS NULL OR FirstName = @FirstName))

    In that case, the LastName condition will be handled with a seek, but the FirstName condition will only be applied as a residual predicate; again, because SQL Server has to choose a plan without knowing the values of the parameters.

    It may seem a redundant way of writing, I admit (you can optimise by using bitwise flags or other techniques if the parameters become too numerous). But the way to read the total universe of possibilities is to check, on one hand, that each case you claim to optimise is actually written down and has a part of the execution plan supporting it, and, on the other hand, which queries are simply an extension of the optimised ones plus basic filtering. My personal experience is that it is far more predictable and easier to keep a global view of one stored procedure, while I could never trust dynamic SQL 100%. When a developer was using dynamic SQL and I asked what was optimised and what was not, most of the time the answer I got covered only the cases he had actually tested, not the total universe of combinations admitted by such a query.

    About the RECOMPILE option, this was not something we could afford in our application. We had to handle something like 4,000 batches per second, and recompiling at each call was unthinkable.

    The fact is that we might infer that SQL Server does do some short-circuiting in some circumstances (start-up filters perhaps) but it is not documented or guaranteed. Given that much better alternatives exist, it strikes me as being extremely poor practice to rely on this sort of "by-inspection" behaviour.

    I don't like the word "short-circuit" because it sounds like a hack, like abusing some behaviour that is not intentional. I prefer to consider that the SQL engine processes scalar logic before set logic, and to treat this as a strong design feature of the execution plan compiler that you can rely on, exactly as you would write "if (c != null && c.Property == 1)" in a C-style language to avoid a null reference exception (whereas this did not work in VB6). The point about documentation is relevant though, because it raises the question of how long this will remain true.

    I do not consider this practice to be poor, or I would have difficulty understanding how a poor practice managed to put a database project and system in a more stable, predictable and manageable state than a good one would. As I said in my first post in this thread, I do not recommend the dynamic SQL statement approach, for the reasons I have mentioned, based on the experience I had and the objectives our team had to meet, which were essentially very high performance and reliability targets. If some people still want to use it because it is more convenient or agile in their case, well, I think the weak point of that approach is its lack of predictability, which has to be compensated for by very strong testing; and even then, I do not sleep entirely peacefully. 🙂

  • James,

    Your increasingly complex nested ORs and ANDs (now in multiple combinations) are looking less and less like the original 'simple' CTE every time you post. I do understand the point you are trying to make, but you still have not provided a testable solution to the challenges laid down so far to illustrate your point, and so risk losing credibility here. I would invite you to solve the problem presented by Jonathan so we can truly compare like with like here.

    The wider point you are missing about recompilation (on the SQL Server 2008 version I keep mentioning) is that the optimiser builds a 'throw-away' plan taking advantage of the specific parameter values at execution time. This results in a highly optimised plan, not based on start-up filters, but with the unused plan segments removed entirely.

    For example, on the million-row test, the optimiser produces a plan based on a true 3-way index intersection (with seeks on all indexes) and a key lookup on the cluster to fetch the remaining off-index columns, which is very nearly as efficient as the dynamic SQL.

    I do disagree with most of your comments, but do not propose to restate facts already presented. The time has come for you to back up your statements with a robust code example - no bugs, no excuses - let's see how it runs for real!

    Paul

  • This is the actual code that ran to produce the execution plan in my last post:

    DECLARE @Name NVARCHAR(100),

    @Color NVARCHAR(100),

    @ListPriceMax MONEY,

    @ListPriceMin MONEY,

    @ModifiedDateMin DATETIME,

    @ModifiedDateMax DATETIME;

    SELECT @Name = N'A%',

    @Color = N'G%',

    @ListPriceMax = $1500,

    @ListPriceMin = $500,

    @ModifiedDateMin = '20050101',

    @ModifiedDateMax = '20060101';

    SELECT *

    FROM [Product]

    WHERE (@Name IS NULL OR [Name] LIKE @Name)

    AND (@Color IS NULL OR [Color] LIKE @Color)

    AND (@ListPriceMin IS NULL OR [ListPrice] >= @ListPriceMin)

    AND (@ListPriceMax IS NULL OR [ListPrice] <= @ListPriceMax)

    AND (@ModifiedDateMin IS NULL OR ModifiedDate >= @ModifiedDateMin)

    AND (@ModifiedDateMax IS NULL OR ModifiedDate <= @ModifiedDateMax)

    OPTION (RECOMPILE);

    Something that often goes overlooked when using sp_executesql is explicitly using unicode strings. It is good to keep in mind that the SQL and ParmDefinition parameters are of nvarchar datatype. In the article, nvarchar was used to declare @sSql, @sWhereClause, @ParmDefinition and @NewLine, but throughout, non-unicode string literals were being assigned to them.

    So instead of doing this: SET @sSql = 'SELECT [Name],' + @NewLine

    + ' [ProductNumber],' + @NewLine

    + ' [ListPrice],' + @NewLine

    + ' [ModifiedDate]' + @NewLine

    + ' FROM [Production].[Product]' + @NewLine

    + ' WHERE 1 = 1' -- Always true

    Using a unicode string as follows would be preferred:

    SET @sSql = N'SELECT [Name],' + @NewLine

    + N' [ProductNumber],' + @NewLine

    + N' [ListPrice],' + @NewLine

    + N' [ModifiedDate]' + @NewLine

    + N' FROM [Production].[Product]' + @NewLine

    + N' WHERE 1 = 1' -- Always true

    Prepending the "N" will ensure it is a unicode string and avoid the implicit conversion that goes on. As a good practice this should be done where ever the hardcoded strings are assigned to the nvarchar datatypes.

  • Adam Gojdas (5/25/2010)


    Something that often goes overlooked when using sp_executesql is explicitly using unicode strings. It is good to keep in mind that the SQL and ParmDefinition parameters are of nvarchar datatype. In the article, nvarchar was used to declare @sSql, @sWhereClause, @ParmDefinition and @NewLine, but throughout, non-unicode string literals were being assigned to them.

    So instead of doing this: SET @sSql = 'SELECT [Name],' + @NewLine

    + ' [ProductNumber],' + @NewLine

    + ' [ListPrice],' + @NewLine

    + ' [ModifiedDate]' + @NewLine

    + ' FROM [Production].[Product]' + @NewLine

    + ' WHERE 1 = 1' -- Always true

    Using a unicode string as follows would be preferred:

    SET @sSql = N'SELECT [Name],' + @NewLine

    + N' [ProductNumber],' + @NewLine

    + N' [ListPrice],' + @NewLine

    + N' [ModifiedDate]' + @NewLine

    + N' FROM [Production].[Product]' + @NewLine

    + N' WHERE 1 = 1' -- Always true

    Prepending the "N" will ensure it is a unicode string and avoid the implicit conversion that goes on. As a good practice this should be done where ever the hardcoded strings are assigned to the nvarchar datatypes.

    Thanks, there are more details of why here: http://support.microsoft.com/kb/239530

    Further to Paul's invitation, I have prepared a more elaborate comparison of the methods we have talked about in this thread.

    I hope that, at the end of my comparison, and assuming you do not think I have missed something important, you will understand my concerns and warnings about dynamic SQL that generates code.

    =========================================================================

    OBJECT TESTED:

    - A user table with 1 million rows and 4 columns: ID, First name, Last name, Email.

    - System tested: SQL2008 x64 10.0.2531, Windows 7 x64, Athlon 64 dual core 3800+

    GOAL TO ACHIEVE:

    - Implement a search within the users by: last name, last name and first name, and email.

    METHODS COMPARED:

    1) As a point of reference, ad-hoc stored procedures

    2) Dynamic generation of the code

    3) Dynamic SQL statement using scalar expressions

    4) ad-hoc stored procedure with OPTION (RECOMPILE)

    SPECIAL:

    - A bug is included, on purpose of course. The bug is that methods 2, 3 and 4 allow a search by first name ONLY, even though there is no index on this column and this case is not supported by the writer of the stored procedure. In this example the bug is very obvious just from looking at the code. I ask you to accept that in real conditions, in code that is two or three pages long, the bug would be less obvious and could reach production without being detected.

    WHAT I WOULD LIKE TO EVALUATE:

    - I would like to show how method 2 can lead to nasty situations when the bug is hit, and how a query and an execution plan that have never been reviewed can end up executing in production.

    - I would like to show how method 3 prevents an unexpected execution plan from running when the same bug is hit, and thus how this method protects the whole application from unexpected use of its stored procedures.

    - I would like to show how method 3 allows the writer to review ALL possible code paths, while method 2 does not.

    - I would like to show that method 4 is actually slower, so it is not a solution if there is a lot of activity on the machine and/or if response time is important.

    - The post is not about performance; it is about predictability and how robust your application is once shipped. It is still important to compare performance because, even if a method is better from a predictability and stability point of view, it is not really usable if it does not perform well.

    - I will NOT discuss execution plans or how to get to a satisfying one with each method used. I assume we have reached that point, so we can compare them. I think I wrote the code impartially.

    - General comments that have come up are added as Q&A at the end.

    ==========================================================================

    TEST SETUP

    Run DynamicQueries_Databases.txt to initialize the database

    Set NoCount On;

    -- Create the database

    Use master;

    Go

    If Exists (Select * From sys.databases Where name = 'DynamicQueries')

    Drop Database DynamicQueries;

    Create Database DynamicQueries;

    Go

    Use DynamicQueries;

    Go

    -- Create the table

    Create Table [User]

    (

    UserId Int Not Null Identity (1, 1) Not For Replication,

    LastName VarChar(50) Not Null,

    FirstName VarChar(50) Not Null,

    Email VarChar(255) Not Null,

    Constraint PK_User Primary Key Clustered (UserId)

    );

    Create Index IX_Name On [User] (LastName, FirstName);

    Go

    Create Index IX_Email On [User] (Email);

    Go

    -- Fill with data

    Declare @I Int;

    Set @I = 1000000/50000;

    While @I > 0

    Begin

    RaisError('%d filling iterations remaining...', 10, 1, @I) With NoWait;

    Insert Into [User] (LastName, FirstName, Email)

    Select Top 50000

    NEWID(), NEWID(), NEWID()

    From sys.messages;

    Set @I = @I - 1;

    End

    -- Rebuild so the indexes are most efficient

    Alter Table [User] Rebuild;

    Alter Index IX_Email On [User] Rebuild;

    Alter Index IX_Name On [User] Rebuild;

    Run DynamicQueries_Databases to create the stored procedures

    Use DynamicQueries;

    Go

    -- Case 1 : static ad-hoc queries, to serve as reference

    If OBJECT_ID('Case1GetUserByLastName', 'P') Is Not Null Drop Procedure Case1GetUserByLastName;

    Go

    Create Procedure Case1GetUserByLastName (

    @LastName VarChar(50)

    )

    As

    RaisError('Case1GetUserByLastName', 10, 1) With NoWait;

    Select *

    From [User]

    Where LastName = @LastName

    Go

    If OBJECT_ID('Case1GetUserByLastAndFirstNames', 'P') Is Not Null Drop Procedure Case1GetUserByLastAndFirstNames;

    Go

    Create Procedure Case1GetUserByLastAndFirstNames (

    @LastName VarChar(50),

    @FirstName VarChar(50)

    )

    As

    RaisError('Case1GetUserByLastAndFirstNames', 10, 1) With NoWait;

    Select *

    From [User]

    Where LastName = @LastName

    And FirstName = @FirstName

    Go

    If OBJECT_ID('Case1GetUserByEmail', 'P') Is Not Null Drop Procedure Case1GetUserByEmail;

    Go

    Create Procedure Case1GetUserByEmail (

    @Email VarChar(255)

    )

    As

    RaisError('Case1GetUserByEmail', 10, 1) With NoWait;

    Select *

    From [User]

    Where Email = @Email;

    Go

    -- Case 2 : dynamic query using generated code

    If OBJECT_ID('Case2GetUser', 'P') Is Not Null Drop Procedure Case2GetUser;

    Go

    Create Procedure Case2GetUser (

    @LastName VarChar(50) = Null,

    @FirstName VarChar(50) = Null,

    @Email VarChar(255) = Null,

    @Debug Bit = 0

    )

    As

    RaisError('Case2GetUser', 10, 1) With NoWait;

    Declare @Sql NVarChar(4000);

    Set @Sql = 'SELECT * FROM [User] WHERE 1 = 1 '

    If @LastName Is Not Null

    Set @Sql = @Sql + ' AND LastName = @LastName '

    If @FirstName Is Not Null

    Set @Sql = @Sql + ' AND FirstName = @FirstName '

    If @Email Is Not Null

    Set @Sql = @Sql + ' AND Email = @Email '

    If @Debug = 1

    Print @Sql;

    Exec sp_executesql @Sql, N'@LastName VarChar(50), @FirstName VarChar(50), @Email VarChar(255)',

    @LastName = @LastName,

    @FirstName = @FirstName,

    @Email = @Email;

    Go

    -- Case 3 : dynamic query using scalar expressions

    If OBJECT_ID('Case3GetUser', 'P') Is Not Null Drop Procedure Case3GetUser;

    Go

    Create Procedure Case3GetUser (

    @LastName VarChar(50) = Null,

    @FirstName VarChar(50) = Null,

    @Email VarChar(255) = Null

    )

    As

    RaisError('Case3GetUser', 10, 1) With NoWait;

    Select *

    From [User]

    Where

    (

    -- Last name only (omitted but implicit: "And @LastName Is Not Null")

    UserId In (Select UserId From [User] Where @LastName = LastName And @FirstName Is Null)

    -- Last name and first name together (omitted but implicit "And @LastName Is Not Null And @FirstName Is Not Null")

    Or UserId In (Select UserId From [User] Where @LastName = LastName And @FirstName = FirstName)

    -- by email (omitted but implicit "And @Email Is Not Null")

    Or UserId In (Select UserId From [User] Where @Email = Email)

    )

    Go

    -- Case 4 : just-in-time compilation

    If OBJECT_ID('Case4GetUser', 'P') Is Not Null Drop Procedure Case4GetUser;

    Go

    Create Procedure Case4GetUser (

    @LastName VarChar(50) = Null,

    @FirstName VarChar(50) = Null,

    @Email VarChar(255) = Null

    )

    As

    RaisError('Case4GetUser', 10, 1) With NoWait;

    Select *

    From [User]

    Where

    (

    (

    @LastName = LastName

    And

    (

    @FirstName = FirstName

    Or @FirstName Is Null

    )

    )

    Or

    (

    @Email = Email

    )

    )

    Option (Recompile)

    Go

    ==========================================================================

    TEST RUN

    A. Showing how generation of code on the fly can be dangerous

    1. Open DynamicQueries_BVTs.txt

    -- Run this query to initialize the variables below

    -- Select Top 1 LastName, FirstName, Email From [User]

    -- Activate SQLCMD mode!

    :setvar LASTNAME "4A7801E6-B295-46BD-953F-9FAC90E9B5E4"

    :setvar FIRSTNAME "2C4C3500-1ADC-46E0-8A7C-80008B099A28"

    :setvar EMAIL "30A7195A-2E88-49B5-95DC-818E08A893F0"

    Set NoCount On;

    Set Language us_english;

    Set Statistics IO On;

    Declare @LastName VarChar(50) = NewId();

    Go

    Print '

    BVTs

    '

    -- BVT case 1

    Exec Case1GetUserByLastName @LastName = '$(LASTNAME)';

    Go

    Exec Case1GetUserByLastAndFirstNames @LastName = '$(LASTNAME)', @FirstName = '$(FIRSTNAME)';

    Go

    Exec Case1GetUserByEmail @Email = '$(EMAIL)';

    Go

    -- BVT case 2

    Exec Case2GetUser @LastName = '$(LASTNAME)';

    Go

    Exec Case2GetUser @LastName = '$(LASTNAME)', @FirstName = '$(FIRSTNAME)';

    Go

    Exec Case2GetUser @Email = '$(EMAIL)';

    Go

    -- BVT case 3

    Exec Case3GetUser @LastName = '$(LASTNAME)';

    Go

    Exec Case3GetUser @LastName = '$(LASTNAME)', @FirstName = '$(FIRSTNAME)';

    Go

    Exec Case3GetUser @Email = '$(EMAIL)';

    Go

    -- BVT case 4

    Exec Case4GetUser @LastName = '$(LASTNAME)';

    Go

    Exec Case4GetUser @LastName = '$(LASTNAME)', @FirstName = '$(FIRSTNAME)';

    Go

    Exec Case4GetUser @Email = '$(EMAIL)';

    Go

    -- Test unforeseen situation

    Print '

    Trick me

    ';

    Go

    Exec Case2GetUser @FirstName = '$(FIRSTNAME)';

    Go

    Exec Case3GetUser @FirstName = '$(FIRSTNAME)';

    Go

    Exec Case4GetUser @FirstName = '$(FIRSTNAME)';

    Go

    Set Statistics IO Off;

    2. Activate SQLCMD mode if not already

    3. Execute the query in the comment on the second line to get values from the table, and replace mine in the ":setvar" statements.

    4. Activate execution plans, depending on whether you want to see them or not.

    5. Run the query

    Looking at the execution plans, we observe that:

    1. Method 2 has a different execution plan for each combination of parameters, while methods 3 and 4 always have the same one.

    2. Because methods 3 and 4 each have only ONE execution plan, they are easy to review. For method 2, you have to test every combination of parameters and look at each execution plan generated.

    3. The only terrible SQL statement and execution plan come from method 2. It is an illustration of an unexpected code path leading to the execution of something that has never been reviewed or predicted.

    Looking at the I/O statistics you will observe that:

    1. All methods give the same performance when running in the use case they were designed for.

    2. In the bogus situation, method 2 is the only one that turns into a full scan, while the others do not put the server in danger.

    B. Showing that OPTION RECOMPILE has a cost

    1. Open the Windows performance monitor (perfmon.msc). The script below is the content of TestDynamicQueries.vbs, referenced in step 5.

    Option Explicit

    Dim con, com, constr

    Dim I

    constr = "Provider=SQLNCLI;Server=localhost\sql2008;Initial Catalog=DynamicQueries;Integrated Security=SSPI;"

    Set con = CreateObject("ADODB.Connection")

    Set com = CreateObject("ADODB.Command")

    con.Open constr

    Set com.ActiveConnection = con

    '=================================================

    'WITH OPTION RECOMPILE (CASE 4), NOT PREPARED

    con.Execute "DBCC FREEPROCCACHE"

    WScript.Sleep 5000

    com.CommandText = "Declare @Id VarChar(50); Set @Id = NewId(); Exec Case4GetUser @LastName = @Id;"

    For I = 1 To 3000

    com.Execute

    Next

    '=================================================

    'WITH OPTION RECOMPILE (CASE 4), PREPARED

    con.Execute "DBCC FREEPROCCACHE"

    WScript.Sleep 5000

    com.CommandText = "Declare @Id VarChar(50); Set @Id = NewId(); Exec Case4GetUser @LastName = @Id;"

    com.Prepared = True

    For I = 1 To 3000

    com.Execute

    Next

    '=================================================

    'WITHOUT OPTION RECOMPILE (CASE 3), NOT PREPARED

    con.Execute "DBCC FREEPROCCACHE"

    WScript.Sleep 5000

    com.CommandText = "Declare @Id VarChar(50); Set @Id = NewId(); Exec Case3GetUser @LastName = @Id;"

    For I = 1 To 3000

    com.Execute

    Next

    con.Execute "DBCC FREEPROCCACHE"

    '=================================================

    'REFERENCE (CASE 1), NOT PREPARED

    con.Execute "DBCC FREEPROCCACHE"

    WScript.Sleep 5000

    com.CommandText = "Declare @Id VarChar(50); Set @Id = NewId(); Exec Case1GetUserByLastName @LastName = @Id;"

    For I = 1 To 3000

    com.Execute

    Next

    ' Clean so Cache Hit Ratio graph falls to zero

    con.Execute "DBCC FREEPROCCACHE"

    con.Close

    2. Remove all counters.

    3. Add the counters "SQL Statistics::SQL Compilations/sec", "SQL Statistics::SQL Re-compilations/sec", "Plan Cache::Cache hit ratio" and "Batch Requests/sec". Make "Batch Requests/sec" bolder and put it in the background, in a different colour from "Compilations/sec".

    4. Set all the compilation scales to 1, the cache hit ratio scale to 10, and the maximum value of the vertical axis to 1000. You may have to adjust these depending on the performance of your machine.

    5. Open a command line window, and run "cscript TestDynamicQueries.vbs"

    6. The script runs 3000 executions of the following use cases:

    - Method 4 without prepare

    - Method 4 with prepare

    - Method 3 without prepare

    - Method 1 without prepare

    Reading the graph, it is easy to see that:

    1. OPTION (RECOMPILE) really HURTS!

    2. Using "Prepared = True" solves the compilation problem of the SQL command in the VBS script, and thus lets the plan cache work. We go from two compilations per execution down to one.

    3. But there is still one compilation per call, and this makes the 3000 executions last far longer.

    ==========================================================================

    Q&A

    1. I use method 2, and there is no way I could ever have a problem with an unexpected combination of parameters, because I totally control how the stored procedure is executed. So I can't see the point of changing to a would-be "more predictable" method.

    Well, happy for you. My experience with method 2 was different: the stored procedures were exposed through APIs to other teams and, in some cases, eventually to external customers or users. It was not rare to see a stored procedure with 20 optional parameters that could all be set independently, or not. The database team had no control over how they were executed, and we frequently faced outages because of such unexpected conditions. It was very difficult to explain that SQL Server has its own logic which we cannot predict before testing, and this did not help the reputation of the database team.

    2. I tried to use method 3, but my execution plans were poor. Full scans, or big arrows, I can't achieve something clean.

    Most probably, you have not tried hard enough. 🙂 I have usually seen two paths of improvement when facing such a problem:

    a) Jumping straight into the code without thinking ahead about how the query should be solved and which indexes will give the biggest reduction of intermediate results when the filter conditions are applied (this usually shows up as big arrows in the plan).

    b) The SQL compiler sees a potential combination you have not thought about and makes the plan ready to execute it. Search for it and decide whether you want to support it or disable it with a scalar expression. This usually happens when you see big scans in the execution plan that you cannot get rid of.

    3. The SQL Server team has never said their engine is designed to understand the intent of method-3-like code. As a result, one day they may issue, without warning, a patch or release that stops this method from working.

    This is indeed the weak point of this approach. I hope to get a response from a member of the SQL Server team on this subject one day. However, I have strong reasons to believe this is not going to change overnight. First, they document how execution plans are cached and reused. Second, they have to ensure backward compatibility. But with what? It is well known, for instance, that if we put NEWID() in a SELECT statement it will be executed once per row, while other functions and expressions such as GetUtcDate() will not. This shows how functions and expressions are evaluated, and how big the mess would be if they broke this behaviour. So the day method 3 stops working, a ton of other code will stop behaving the same as well. The SQL team cannot make expressions that are evaluated once per query suddenly be evaluated once per row.
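
    A quick illustration of the behaviour described here (the per-row evaluation of NEWID() is observed behaviour, not something documented or guaranteed by the engine):

    Select Top (5)
    NewId() As PerRowValue,        -- a different value on every row
    GetUtcDate() As PerQueryValue  -- the same value on every row
    From [User];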

    4. I don't get what method 3 is. In one post a CTE is used, in another "OR" keywords... in what way is it the same thing?

    Method 3 is not about using this or that keyword; you can use any instruction in the SQL dictionary. The point of method 3 is to use scalar expressions to dynamically enable or inhibit portions of the same SQL statement, referencing the table you want to access as many times as there are ways to access it. So if you plan, for instance, to have three ways to access the table MyTable via indexes IndexA, IndexB and ClustIndex, you will find "MyTable" at least three times in the code and in the compiled plan, while at execution time only one of those accesses is actually performed. When SQL Server runs the query (so, after compilation), pure scalar expressions such as "GetDate(), A=B, C>1+2" are evaluated before any object (table or index) is opened. Other expressions referring to columns (LastName Like 'A%') are evaluated for every row if they have not been transformed into a SEEK operation, i.e. if SQL Server is not using an index to fetch the rows that match them.

    An expression evaluated as FALSE inhibits all the work bound to it. A very simple example is to look at the execution plan of "SELECT * FROM [User] WHERE 1=0".
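
    As a slightly fuller sketch (against the [User] table defined above; the intention, as discussed, is that a branch guarded by a scalar expression evaluating to FALSE does no work at run time):

    Declare @LastName VarChar(50) = 'Smith',
    @Email VarChar(255) = Null;

    Select *
    From [User]
    Where UserId In (Select UserId From [User] Where LastName = @LastName And @LastName Is Not Null)
    Or UserId In (Select UserId From [User] Where Email = @Email And @Email Is Not Null);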

    ==========================================================================

    CONCLUSION

    I wanted to show that concatenating SQL strings is not the only way to get dynamic execution; there are other ways that are also viable in terms of stability, predictability and performance.

    I also wanted to show that you must be very careful when using concatenation of SQL strings, because you and your testers may not think of all the code paths your code allows, and each one of them produces a different execution plan. Depending on how your application is used and by whom, those execution plans can be triggered in production by a new combination of parameters without ever having been reviewed or predicted by the database team.

    From my own team's experience, we made terrific progress in stability and predictable performance when we adopted the principle that *all* execution plans and combinations of variables have to be reviewed, even when the stored procedures allow many input parameters. The application was using dynamically generated SQL code, and this was a very difficult goal to achieve because it would have meant testing a huge number of different combinations.

    When we switched to a single execution plan per stored procedure, we could adopt a methodology in which we classify the two or three use cases we really want to optimise best and the other use cases for which moderate performance is acceptable, and, most importantly, ensure there is no "leak", no possibility of triggering a very bad execution plan leading to an immediate outage.

  • Hey James,

    So:

    "It is well know for instance that if we put NEWID() in a SELECT statement, it will be executed once per row. Other functions and expressions such as GetUtcDate() won't. This shows how the functions and expressions are evaluated, and how the mess will be big if they break this behavior."

    Really? Read the comments from the SQL Server development team on this Won't Fix bug report:

    https://connect.microsoft.com/SQLServer/feedback/details/350485/bug-with-newid-and-table-expressions

    You're going to get a full response to your long post in due course, but I thought you should see that first.

    Paul

  • Good article.

    This is one of the most common culprits I have found behind long-running procedures and locking issues at a number of clients for whom I have done work (the search query takes forever and blocks write operations from other processes). Typically, I recommend checking the parameters and executing the appropriate query with only the arguments that are not null (literally duplicating the query for every combination). In cases where you have 4+ optional parameters, I generally recommend the Dynamic SQL with Variables approach, as the proc would otherwise become unwieldy. Though someone mentioned issues with execution scope where data could not be accessed directly, the duplicated query with only the specified fields, while wordy and ugly, would provide the performance benefits without this limitation.

    As a consultant, sometimes I like seeing the OR IS NULL approach, as I can quickly rewrite it with thousandfold performance improvements and the client loves me for it :-D. Other times, it annoys me because obviously people didn't bother investigating what their queries were DOING...

    Thanks Paul W for posting info about the RECOMPILE option changes in SP1 CU5. I'll look into that for some cases, though often when I hit this pattern the SQL itself is a complex search pattern that would suffer from forced recompilation.

    I think performance will be good using sp_executesql for dynamic SQL queries. It can reuse the execution plan in the cache, which reduces compile time. :-)
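
    One way to check that the parameterised statement really is being reused from the plan cache (a sketch; it assumes VIEW SERVER STATE permission and the Production.Product example from the article):

    SELECT cp.usecounts, cp.objtype, st.text
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    WHERE cp.objtype = N'Prepared'                -- sp_executesql statements are cached as 'Prepared'
    AND st.text LIKE N'%Production%Product%';     -- filter to the query text of interest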

  • Very nice article. Keep up the good work.

    Very well written and informative.

    My only complaint would be the use of hacks like "always true" or "always false".

    The "WHERE 1 = 1" was used nicely though. πŸ™‚

    Best regards,

    Andre Guerreiro Neto

    Database Analyst
    http://www.softplan.com.br
    MCITPx1/MCTSx2/MCSE/MCSA

  • Very nice article.

    Some months ago I tried to do something like that, and I did my best to improve the query, but I found that when you have dynamic queries and need the classic string concatenation to build them, the best solution I found was a CLR stored procedure. I found a big difference in performance when querying tables with more than 20 million rows.

    SQL Server also caches a dynamic query issued from a CLR stored procedure, while for a normal dynamic query built inside a stored procedure it doesn't.

    This is my opinion, of course; I am not a SQL Server guru, just a .Net developer who had big performance problems to solve, and this is the way I found to solve them.

    In my case I work with very big tables from a telecom system, so performance is very important.

    Cheers

    Marcos
