SQLServerCentral is supported by Red Gate Software Ltd.
Use Dynamic SQL to Improve Query Performance
Posted Monday, May 24, 2010 7:37 AM


Old Hand

James Dingle-651585 (5/24/2010)
It is difficult for me to demonstrate right now because I am not at the office, and the computer I am using does not have SQL Server... sorry if it takes me some time to get there :)

James, OK, I'll be interested to see it working. Could you use the test data in my reply here, with only @Name set to a value, searching with '=' for the name, 'LIKE' for the colour, and inequalities for the price and dates, i.e. logically equivalent to:
SELECT *
FROM [Product]
WHERE [Name] = Coalesce(@Name, [Name])
AND [Color] LIKE Coalesce(@Color, [Color] )
AND [ListPrice] >= Coalesce(@ListPriceMin, [ListPrice])
AND [ListPrice] <= Coalesce(@ListPriceMax, [ListPrice])
AND ModifiedDate >= Coalesce(@ModifiedDateMin, ModifiedDate)
AND ModifiedDate <= Coalesce(@ModifiedDateMax, ModifiedDate)

Post #926830
Posted Monday, May 24, 2010 8:22 AM


SSCrazy Eights

Jeff Moden (5/23/2010)
Heh... I actually forgot to mark the article before. 5 stars it is.

Thanks Jeff, a bit of balance has been restored to The Force




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #926867
Posted Monday, May 24, 2010 8:26 AM


SSCrazy Eights

Gianluca Sartori (5/24/2010)
josh shilling (5/21/2010)
See this post by Mark Cohen...

That page contains some quite questionable statements. It's a shame it still appears on the first Google results page for "sql short-circuit".
I'll try to finish my article on short-circuiting soon and (hopefully) get it published here.

More than questionable, as I tried to show with the code snippet I posted.
The fact is that we might infer that SQL Server does do some short-circuiting in some circumstances (start-up filters perhaps) but it is not documented or guaranteed. Given that much better alternatives exist, it strikes me as being extremely poor practice to rely on this sort of "by-inspection" behaviour. If you need a reviewer or second contributor for your article, send me a PM.

Paul




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #926869
Posted Monday, May 24, 2010 1:33 PM


SSCrazy Eights

The approach advocated by James is much more suited to simpler requirements based on OR conditions, as his second example shows.

The first submission attempted to apply the technique to a query using ANDed predicates, which turns out to be problematic even once the bugs are removed. Essentially, the idea relies on start-up filters to re-use a generic query plan while avoiding the execution of branches that are not needed at run time.

The concept is appealing, and I did play around with this several years ago when 2005 was new. It is perfectly possible to produce an efficient plan when all the predicates are SARGable and linked with OR. This is because the keys contributing to each input of the concatenation will always be required, and the effect is similar to an optimised index intersection.

Things become much more difficult when predicates are non-SARGable (necessitating index scans) or when ANDs are involved. The problem with an AND is that the net result of all the predicates is almost always smaller than any of the individual inputs. If one or more of the (non-NULL) search conditions are not very selective on the target index (or if the optimiser does not have good statistics) the plan produced can be very slow (even with RECOMPILE specified).

In the case presented by Jonathan (the million-row table) it is just about impossible to find a general solution based on start-up filters that performs anywhere near as well as dynamic SQL or the optimal code written for 2008 SP1 CU5 onward with RECOMPILE specified. Some inputs can be made to work reasonably well so long as all the non-NULL conditions are very selective, or if extensive hinting is applied (e.g. FORCE ORDER, LOOP JOIN, RECOMPILE, MERGE UNION...and so on).

The start-up filter idea also struggles with some range conditions, since COALESCE just expands to a non-SARGable CASE expression before optimisation. An index scan might be fine on a small table, but it hurts performance severely on the million-row example. It is possible to make the search conditions on ModifiedDate and ListPrice SARGable by using a hard-coded constant with ISNULL to fix the lower and upper bounds of the range (e.g. ListPrice <= ISNULL(@ListPriceMax, 922337203685477.5807)), but this is not very satisfactory.
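As a sketch of the ISNULL workaround just described (the hard-coded extremes below are assumptions based on the money and datetime type limits, not values taken from the original test script):

```sql
-- Sketch only: fixing the range bounds with ISNULL keeps the
-- predicates SARGable when a parameter is NULL. The boundary
-- constants are assumed type extremes, not part of the original test.
SELECT *
FROM [Product]
WHERE [ListPrice] >= ISNULL(@ListPriceMin, -922337203685477.5808)
  AND [ListPrice] <= ISNULL(@ListPriceMax, 922337203685477.5807)
  AND ModifiedDate >= ISNULL(@ModifiedDateMin, '17530101')
  AND ModifiedDate <= ISNULL(@ModifiedDateMax, '99991231 23:59:59.997');
```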

In summary, I would avoid this method unless your requirements are for simple query conditions joined exclusively by ORs, with no range conditions. Anyone using 2008 SP1 CU5 or later should ignore the method completely and use dynamic SQL or code based on the following model:

-- SQL 2008 SP1 CU5 onward ONLY
SELECT *
FROM [Product]
WHERE (@Name IS NULL OR [Name] LIKE @Name)
AND (@Color IS NULL OR [Color] LIKE @Color)
AND (@ListPriceMin IS NULL OR [ListPrice] >= @ListPriceMin)
AND (@ListPriceMax IS NULL OR [ListPrice] <= @ListPriceMax)
AND (@ModifiedDateMin IS NULL OR ModifiedDate >= @ModifiedDateMin)
AND (@ModifiedDateMax IS NULL OR ModifiedDate <= @ModifiedDateMax)
OPTION (RECOMPILE);

Paul




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #927048
Posted Monday, May 24, 2010 2:58 PM
Grasshopper

Things become much more difficult when predicates are non-SARGable (necessitating index scans) or when ANDs are involved. The problem with an AND is that the net result of all the predicates is almost always smaller than any of the individual inputs. If one or more of the (non-NULL) search conditions are not very selective on the target index (or if the optimiser does not have good statistics) the plan produced can be very slow (even with RECOMPILE specified).

Indeed, but only in appearance. I did not want to go into the details because I did not want to write an article within the article. But let's step back from the code and think about how the AND problem should be solved conceptually.
Let's imagine you want to allow a search on the last name AND the first name. You could have an index on the last name only, another on the first name only, and join the intermediate results. This works only if each column is selective enough on its own. If you search for "John Smith", unfortunately we can imagine that the intermediate results will be large for both the first and the last name, turning into a heavy-I/O query for a very small final set of results.
In that case, the answer is very easy for everybody: you use a covering index on both columns, i.e. you add the "first name" column to the "last name" index, and that's it.
You would probably blame a developer who does not use a two-column index in this case. I would equally blame a developer who does not use a two-column index in the way I proposed to write the dynamic query. After all, these are only different ways of typing the same thing; at the end of the story, the SQL compiler should end up with a single index seek, with a "seek" (and not a "predicate") on both columns, right? So whatever method you use, that is the goal to achieve.

So this way of writing the query actually reflects that logic:
WHERE
-- Last name only
UserId IN (SELECT UserId FROM Users WHERE LastName = @LastName AND @LastName IS NOT NULL AND @FirstName IS NULL)
-- Last name and first name
OR UserID IN (SELECT UserId FROM Users WHERE LastName = @LastName AND FirstName = @FirstName AND @LastName IS NOT NULL AND @FirstName IS NOT NULL)
-- First name only
OR UserID IN (SELECT UserId FROM Users WHERE FirstName = @FirstName AND @FirstName IS NOT NULL)

The bottom line of the "AND" problem is that there are two types of "AND" conditions: those you solve with a covering or multi-column index (or an indexed view if the columns are not in the same table), and those you do not optimise because the selectivity of each criterion on its own is sufficient and does not justify the price in disk space. For instance, you may decide that a covering index on the last name and the email is not worth it, because the email alone is selective enough. Again, you would blame a developer who creates such an index.
To put it differently, all the "ANDs" (intersections) have to be applied before the "ORs" (unions). Intersections after unions can produce very bad performance if the sets are large. It seems logical that you have to start with the smallest possible sets from the very beginning of the execution plan, i.e. straight out of the indexes.

By the way, in case you were thinking about it, there is a reason why you cannot write a statement like this:
OR UserID IN (SELECT UserID FROM Users WHERE @LastName = LastName AND (@FirstName IS NULL OR FirstName = @FirstName))

In that case, the LastName condition would be resolved with a seek, but the FirstName condition only with a residual predicate; again, because SQL Server has to choose a plan without knowing the values of the parameters.

I admit the writing may seem redundant (you can optimise using bitwise flags or other techniques if the parameters become too numerous). But the way to read the total universe of possibilities is, on the one hand, to check that each case you claim to optimise is actually written down and has a part of the execution plan supporting it, and on the other hand, to identify which queries are extensions of optimised ones plus basic filtering. My personal experience was that it is far more predictable, and easier to keep a global view of, a single stored procedure, while I could never trust dynamic SQL 100%. When a developer was using dynamic SQL and I asked him what was optimised and what was not, most of the time the answer I got covered the cases he had actually tested, but not the total universe of combinations admitted by such a query.

As for the RECOMPILE option, it was not something we could afford in our application. We had to handle something like 4,000 batches per second, and recompiling on every call was not thinkable.

The fact is that we might infer that SQL Server does do some short-circuiting in some circumstances (start-up filters perhaps) but it is not documented or guaranteed. Given that much better alternatives exist, it strikes me as being extremely poor practice to rely on this sort of "by-inspection" behaviour.

I don't like the word "short-circuit" because it sounds like a hack, like abusing some behaviour that is not intentional. I prefer to consider that the SQL engine processes scalar logic before set logic, and to treat that as a strong design feature of the execution plan compiler that your code can rely on, exactly as you would write "if (c != null && c.Property == 1)" in a C-style language so you cannot get a null pointer exception (while this did not work in VB6). The point about documentation is relevant though, because it raises the question of how long this will remain true.
I do not consider this practice poor; otherwise I would have difficulty understanding how a poor practice succeeded in putting a database project and system in a more stable, predictable and manageable state than a good one would have. As I said in my first post in this thread, I do not recommend the dynamic SQL statement approach, for the reasons I have mentioned, based on my experience and on the objectives our team had to meet, which were basically very demanding performance and reliability targets. If some people still want to use it because in their case it is more convenient or agile, well, I think the weak point of that approach is its lack of predictability, which has to be compensated for by very strong testing -- and even then, I would not sleep entirely peacefully. :)
Post #927090
Posted Monday, May 24, 2010 3:56 PM


SSCrazy Eights

James,

Your increasingly complex nested ORs and ANDs (now in multiple combinations) look less and less like the original 'simple' CTE with every post. I do understand the point you are trying to make, but you still have not provided a testable solution to the challenges laid down so far to illustrate it, and so you risk losing credibility here. I would invite you to solve the problem presented by Jonathan so we can truly compare like with like.

The wider point you are missing about recompilation (on the SQL Server 2008 version I keep mentioning) is that the optimiser builds a 'throw-away' plan taking advantage of the specific parameter values at execution time. This results in a highly optimised plan, not based on start-up filters, but with the unused plan segments removed entirely.

For example, on the million-row test, the optimiser produces a plan based on a true 3-way index intersection (with seeks on all indexes) and a key lookup on the cluster to fetch the remaining off-index columns: (Very nearly as efficient as the dynamic SQL).



I do disagree with most of your comments, but do not propose to restate facts already presented. The time has come for you to back up your statements with a robust code example - no bugs, no excuses - let's see how it runs for real!

Paul




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi


  Post Attachments 
Plan.gif (220 views, 14.66 KB)
Post #927125
Posted Monday, May 24, 2010 3:58 PM


SSCrazy Eights

This is the actual code that ran to produce the execution plan in my last post:

DECLARE @Name NVARCHAR(100), 
@Color NVARCHAR(100),
@ListPriceMax MONEY,
@ListPriceMin MONEY,
@ModifiedDateMin DATETIME,
@ModifiedDateMax DATETIME;

SELECT @Name = N'A%',
@Color = N'G%',
@ListPriceMax = $1500,
@ListPriceMin = $500,
@ModifiedDateMin = '20050101',
@ModifiedDateMax = '20060101';

SELECT *
FROM [Product]
WHERE (@Name IS NULL OR [Name] LIKE @Name)
AND (@Color IS NULL OR [Color] LIKE @Color)
AND (@ListPriceMin IS NULL OR [ListPrice] >= @ListPriceMin)
AND (@ListPriceMax IS NULL OR [ListPrice] <= @ListPriceMax)
AND (@ModifiedDateMin IS NULL OR ModifiedDate >= @ModifiedDateMin)
AND (@ModifiedDateMax IS NULL OR ModifiedDate <= @ModifiedDateMax)
OPTION (RECOMPILE);





Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #927126
Posted Tuesday, May 25, 2010 12:29 PM
SSC Journeyman

Something that often goes overlooked when using sp_executesql is explicitly using Unicode strings. It is good to keep in mind that the statement and parameter-definition arguments of sp_executesql are of the nvarchar data type. In the article, nvarchar was used to declare @sSql, @sWhereClause, @ParmDefinition and @NewLine, but throughout, non-Unicode string literals were assigned to them.

So instead of doing this:
   SET @sSql = 'SELECT [Name],' + @NewLine
+ ' [ProductNumber],' + @NewLine
+ ' [ListPrice],' + @NewLine
+ ' [ModifiedDate]' + @NewLine
+ ' FROM [Production].[Product]' + @NewLine
+ ' WHERE 1 = 1' -- Always true

Using a Unicode string as follows would be preferred:
   SET @sSql = N'SELECT [Name],' + @NewLine
+ N' [ProductNumber],' + @NewLine
+ N' [ListPrice],' + @NewLine
+ N' [ModifiedDate]' + @NewLine
+ N' FROM [Production].[Product]' + @NewLine
+ N' WHERE 1 = 1' -- Always true

Prepending the "N" ensures the literal is a Unicode string and avoids the implicit conversion that otherwise takes place. As a good practice, this should be done wherever hard-coded string literals are assigned to nvarchar variables.
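As a complete sketch of the pattern, with the N prefix on both the statement and the parameter definition (the variable names follow the article's convention; the filter itself is only illustrative):

```sql
DECLARE @sSql NVARCHAR(MAX),
        @ParmDefinition NVARCHAR(500),
        @Color NVARCHAR(15);

SET @Color = N'Black';

-- Both the statement and the parameter definition are Unicode
-- literals, so no implicit conversion takes place.
SET @sSql = N'SELECT [Name] FROM [Production].[Product] '
          + N'WHERE [Color] = @Color;';
SET @ParmDefinition = N'@Color NVARCHAR(15)';

EXECUTE sp_executesql @sSql, @ParmDefinition, @Color = @Color;
```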
Post #927730
Posted Tuesday, May 25, 2010 3:50 PM


Old Hand

Adam Gojdas (5/25/2010)
Something that often goes overlooked when using sp_executesql is explicitly using Unicode strings. It is good to keep in mind that the statement and parameter-definition arguments of sp_executesql are of the nvarchar data type. In the article, nvarchar was used to declare @sSql, @sWhereClause, @ParmDefinition and @NewLine, but throughout, non-Unicode string literals were assigned to them.

So instead of doing this:
   SET @sSql = 'SELECT [Name],' + @NewLine
+ ' [ProductNumber],' + @NewLine
+ ' [ListPrice],' + @NewLine
+ ' [ModifiedDate]' + @NewLine
+ ' FROM [Production].[Product]' + @NewLine
+ ' WHERE 1 = 1' -- Always true

Using a Unicode string as follows would be preferred:
   SET @sSql = N'SELECT [Name],' + @NewLine
+ N' [ProductNumber],' + @NewLine
+ N' [ListPrice],' + @NewLine
+ N' [ModifiedDate]' + @NewLine
+ N' FROM [Production].[Product]' + @NewLine
+ N' WHERE 1 = 1' -- Always true

Prepending the "N" ensures the literal is a Unicode string and avoids the implicit conversion that otherwise takes place. As a good practice, this should be done wherever hard-coded string literals are assigned to nvarchar variables.


Thanks, there are more details on why here: http://support.microsoft.com/kb/239530
Post #927844
Posted Wednesday, May 26, 2010 11:08 AM
Grasshopper

Further to Paul's invitation, I have prepared a more elaborate comparison of the methods we have talked about in this thread.
I hope that, at the end of my comparison, and assuming you do not think I have missed something important, you will understand my concerns and warnings about dynamic SQL that generates code.

=========================================================================
OBJECT TESTED:
- A user table with 1 million rows and 4 columns: ID, First name, Last name, Email.
- System tested: SQL2008 x64 10.0.2531, Windows 7 x64, Athlon 64 dual core 3800+

GOAL TO ACHIEVE:
- Implement a search within the users by: last name, last name and first name, and email.

METHODS COMPARED:
1) As a point of reference, ad-hoc stored procedures
2) Dynamic generation of the code
3) Dynamic SQL statement using scalar expressions
4) Ad-hoc stored procedure with OPTION (RECOMPILE)

SPECIAL:
- A bug is included, on purpose of course. The bug is that methods 2, 3 and 4 allow a search by first name ONLY, while there is no index on that column, and this usage is not supported by the writer of the stored procedure. In this example the bug is very obvious just from looking at the code. I ask you to accept that in real conditions, in code two or three pages long, the bug would be less obvious and could reach production without having been detected.

WHAT I WOULD LIKE TO EVALUATE:
- I would like to show how method 2 can lead to nasty situations when the bug is hit, and how a query and an execution plan that have never been reviewed can end up running in production.
- I would like to show how method 3 prevents an unexpected execution plan from running when the same bug is hit, and thus how this method protects the whole application from unexpected use of its stored procedures.
- I would like to show how method 3 allows the writer to review ALL possible code paths, while method 2 cannot.
- I would like to show that method 4 is actually slower, so it is not a solution if there is a lot of activity on the machine and/or response time is important.
- The post is not about performance; it is about predictability, i.e. how robust your application is once shipped. It is still important to compare performance because, even if a method is better from a predictability and stability point of view, it is not really usable if it does not perform well.
- I will NOT discuss execution plans or how to reach a satisfying one with each method used. I assume we have reached that point, so we can compare them. I think I wrote the code impartially.
- General comments that have come up are addressed as Q&A at the end.


==========================================================================
TEST SETUP

Run DynamicQueries_Databases.txt to initialize the database
Set NoCount On;

-- Create the database

Use master;
Go

If Exists (Select * From sys.databases Where name = 'DynamicQueries')
Drop Database DynamicQueries;

Create Database DynamicQueries;
Go

Use DynamicQueries;
Go

-- Create the table

Create Table [User]
(
UserId Int Not Null Identity (1, 1) Not For Replication,
LastName VarChar(50) Not Null,
FirstName VarChar(50) Not Null,
Email VarChar(255) Not Null,

Constraint PK_User Primary Key Clustered (UserId)
);

Create Index IX_Name On [User] (LastName, FirstName);
Go

Create Index IX_Email On [User] (Email);
Go

-- Fill with data
Declare @I Int;
Set @I = 1000000/50000;
While @I > 0
Begin
RaisError('%d filling iterations remaining...', 10, 1, @I) With NoWait;
Insert Into [User] (LastName, FirstName, Email)
Select Top 50000
NEWID(), NEWID(), NEWID()
From sys.messages
Set @I = @I - 1;
End

-- Rebuild so the indexes are most efficient
Alter Table [User] Rebuild;
Alter Index IX_Email On [User] Rebuild;
Alter Index IX_Name On [User] Rebuild;

Run DynamicQueries_Databases to create the stored procedures
Use DynamicQueries;
Go

-- Case 1 : static ad-hoc queries, to serve as reference
If OBJECT_ID('Case1GetUserByLastName', 'P') Is Not Null Drop Procedure Case1GetUserByLastName;
Go
Create Procedure Case1GetUserByLastName (
@LastName VarChar(50)
)
As
RaisError('Case1GetUserByLastName', 10, 1) With NoWait;

Select *
From [User]
Where LastName = @LastName
Go

If OBJECT_ID('Case1GetUserByLastAndFirstNames', 'P') Is Not Null Drop Procedure Case1GetUserByLastAndFirstNames;
Go
Create Procedure Case1GetUserByLastAndFirstNames (
@LastName VarChar(50),
@FirstName VarChar(50)
)
As
RaisError('Case1GetUserByLastAndFirstNames', 10, 1) With NoWait;

Select *
From [User]
Where LastName = @LastName
And FirstName = @FirstName
Go

If OBJECT_ID('Case1GetUserByEmail', 'P') Is Not Null Drop Procedure Case1GetUserByEmail;
Go
Create Procedure Case1GetUserByEmail (
@Email VarChar(255)
)
As
RaisError('Case1GetUserByEmail', 10, 1) With NoWait;

Select *
From [User]
Where Email = @Email;
Go

-- Case 2 : dynamic query using generated code

If OBJECT_ID('Case2GetUser', 'P') Is Not Null Drop Procedure Case2GetUser;
Go
Create Procedure Case2GetUser (
@LastName VarChar(50) = Null,
@FirstName VarChar(50) = Null,
@Email VarChar(255) = Null,
@Debug Bit = 0
)
As
RaisError('Case2GetUser', 10, 1) With NoWait;

Declare @Sql NVarChar(4000);

Set @Sql = 'SELECT * FROM [User] WHERE 1 = 1 '

If @LastName Is Not Null
Set @Sql = @Sql + ' AND LastName = @LastName '
If @FirstName Is Not Null
Set @Sql = @Sql + ' AND FirstName = @FirstName '
If @Email Is Not Null
Set @Sql = @Sql + ' AND Email = @Email '

If @Debug = 1
Print @Sql;

Exec sp_executesql @Sql, N'@LastName VarChar(50), @FirstName VarChar(50), @Email VarChar(255)',
@LastName = @LastName,
@FirstName = @FirstName,
@Email = @Email;

Go

-- Case 3 : dynamic query using scalar expressions

If OBJECT_ID('Case3GetUser', 'P') Is Not Null Drop Procedure Case3GetUser;
Go
Create Procedure Case3GetUser (
@LastName VarChar(50) = Null,
@FirstName VarChar(50) = Null,
@Email VarChar(255) = Null
)
As
RaisError('Case3GetUser', 10, 1) With NoWait;

Select *
From [User]
Where
(
-- Last name only (omitted but implicit: "And @LastName Is Not Null")
UserId In (Select UserId From [User] Where @LastName = LastName And @FirstName Is Null)
-- Last name and first name together (omitted but implicit: "And @LastName Is Not Null And @FirstName Is Not Null")
Or UserId In (Select UserId From [User] Where @LastName = LastName And @FirstName = FirstName)
-- by email (omitted but implicit "And @Email Is Not Null")
Or UserId In (Select UserId From [User] Where @Email = Email)
)
Go

-- Case 4 : just-in-time compilation

If OBJECT_ID('Case4GetUser', 'P') Is Not Null Drop Procedure Case4GetUser;
Go
Create Procedure Case4GetUser (
@LastName VarChar(50) = Null,
@FirstName VarChar(50) = Null,
@Email VarChar(255) = Null
)
As
RaisError('Case4GetUser', 10, 1) With NoWait;

Select *
From [User]
Where
(
(
@LastName = LastName
And
(
@FirstName = FirstName
Or @FirstName Is Null
)
)
Or
(
@Email = Email
)
)
Option (Recompile)
Go

==========================================================================
TEST RUN

A. Showing how generation of code on the fly can be dangerous

1. Open DynamicQueries_BVTs.txt
-- Run this query to initialize the variables below
-- Select Top 1 LastName, FirstName, Email From [User]

-- Activate SQLCMD mode!
:setvar LASTNAME "4A7801E6-B295-46BD-953F-9FAC90E9B5E4"
:setvar FIRSTNAME "2C4C3500-1ADC-46E0-8A7C-80008B099A28"
:setvar EMAIL "30A7195A-2E88-49B5-95DC-818E08A893F0"

Set NoCount On;
Set Language us_english;
Set Statistics IO On;

Declare @LastName VarChar(50) = NewId();
Go

Print '
BVTs
'

-- BVT case 1
Exec Case1GetUserByLastName @LastName = '$(LASTNAME)';
Go
Exec Case1GetUserByLastAndFirstNames @LastName = '$(LASTNAME)', @FirstName = '$(FIRSTNAME)';
Go
Exec Case1GetUserByEmail @Email = '$(EMAIL)';
Go

-- BVT case 2
Exec Case2GetUser @LastName = '$(LASTNAME)';
Go
Exec Case2GetUser @LastName = '$(LASTNAME)', @FirstName = '$(FIRSTNAME)';
Go
Exec Case2GetUser @Email = '$(EMAIL)';
Go

-- BVT case 3
Exec Case3GetUser @LastName = '$(LASTNAME)';
Go
Exec Case3GetUser @LastName = '$(LASTNAME)', @FirstName = '$(FIRSTNAME)';
Go
Exec Case3GetUser @Email = '$(EMAIL)';
Go

-- BVT case 4
Exec Case4GetUser @LastName = '$(LASTNAME)';
Go
Exec Case4GetUser @LastName = '$(LASTNAME)', @FirstName = '$(FIRSTNAME)';
Go
Exec Case4GetUser @Email = '$(EMAIL)';
Go

-- Test unforeseen situation
Print '
Trick me
';
Go
Exec Case2GetUser @FirstName = '$(FIRSTNAME)';
Go
Exec Case3GetUser @FirstName = '$(FIRSTNAME)';
Go
Exec Case4GetUser @FirstName = '$(FIRSTNAME)';
Go

Set Statistics IO Off;

2. Activate SQLCMD mode if it is not already active.
3. Execute the query in the comment on the second line to get values from the table, and replace mine in the ":setvar" statements.
4. Activate execution plans, depending on whether you want to see them or not.
5. Run the script.

Looking at the execution plans, we observe that:
1. Method 2 has a distinct execution plan per combination of parameters, while methods 3 and 4 always have the same one.
2. Because methods 3 and 4 each have only ONE execution plan, it is easy to review. For method 2, you have to test every combination of parameters and look at each generated plan.
3. The only terrible SQL statement and execution plan come from method 2. It is an illustration of an unexpected code path leading to the execution of something that has neither been reviewed nor predicted.


Looking at the I/O statistics, you will observe that:
1. All methods give the same performance when running in the use case they were designed for.
2. In the bogus situation, method 2 is the only one that turns into a full scan, while the others do not put the server in danger.



B. Showing that OPTION RECOMPILE has a cost

1. Open the Windows performance monitor (perfmon.msc)
Option Explicit

Dim con, com, constr
Dim I

constr = "Provider=SQLNCLI;Server=localhost\sql2008;Initial Catalog=DynamicQueries;Integrated Security=SSPI;"
Set con = CreateObject("ADODB.Connection")
Set com = CreateObject("ADODB.Command")

con.Open constr
Set com.ActiveConnection = con

'=================================================
'WITH OPTION RECOMPILE (CASE 4), NOT PREPARED

con.Execute "DBCC FREEPROCCACHE"
WScript.Sleep 5000
com.CommandText = "Declare @Id VarChar(50); Set @Id = NewId(); Exec Case4GetUser @LastName = @Id;"

For I = 1 To 3000
com.Execute
Next

'=================================================
'WITH OPTION RECOMPILE (CASE 4), PREPARED

con.Execute "DBCC FREEPROCCACHE"
WScript.Sleep 5000

com.CommandText = "Declare @Id VarChar(50); Set @Id = NewId(); Exec Case4GetUser @LastName = @Id;"
com.Prepared = True

For I = 1 To 3000
com.Execute
Next

'=================================================
'WITHOUT OPTION RECOMPILE (CASE 3), NOT PREPARED

con.Execute "DBCC FREEPROCCACHE"
WScript.Sleep 5000

com.CommandText = "Declare @Id VarChar(50); Set @Id = NewId(); Exec Case3GetUser @LastName = @Id;"

For I = 1 To 3000
com.Execute
Next

con.Execute "DBCC FREEPROCCACHE"

'=================================================
'REFERENCE (CASE 1), NOT PREPARED

con.Execute "DBCC FREEPROCCACHE"
WScript.Sleep 5000

com.CommandText = "Declare @Id VarChar(50); Set @Id = NewId(); Exec Case1GetUserByLastName @LastName = @Id;"

For I = 1 To 3000
com.Execute
Next

' Clean so Cache Hit Ratio graph falls to zero
con.Execute "DBCC FREEPROCCACHE"

con.Close

2. Remove all counters.
3. Add the counters "SQL Statistics::SQL Compilations/sec", "SQL Statistics::SQL Re-Compilations/sec", "Plan Cache::Cache Hit Ratio" and "Batch Requests/sec". Make "Batch Requests/sec" bolder, put it in the background, and give it a different colour from the compilation counters.
4. Set the scale of the compilation counters to 1 and the cache hit ratio to 10, and set the maximum of the vertical axis to 1000. You may have to adjust these depending on the performance of your machine.
5. Open a command-line window and run "cscript TestDynamicQueries.vbs".
6. The script runs 3000 executions of each of the following use cases:
- Method 4 without prepare
- Method 4 with prepare
- Method 3 without prepare
- Method 1 without prepare



Reading the graph, it is easy to see that:
1. OPTION (RECOMPILE) really HURTS!
2. Using "Prepared = True" solves the compilation problem for the SQL command sent by the VBS script, and thus lets the plan cache work. We drop from two compilations to one per execution.
3. But there is still one compilation per call, and this makes the 3000 executions last much longer.


==========================================================================
Q&A

1. I use method 2, and it is impossible for me ever to have a problem with an unexpected combination of parameters, because I totally control how the stored procedure is executed. So I can't see the point of changing to a would-be "more predictable" method.
Well, happy for you. My experience with method 2 was different: the stored procedures were exposed through APIs to other teams and, in some cases, eventually to external customers or users. It was not rare to see a stored procedure with 20 optional parameters that could all be set independently, or not. The database team had no control over how they were executed, and we frequently faced outages because of such unexpected conditions. It was very difficult to explain that SQL Server has its own logic that we cannot predict before testing, and this did not help the reputation of the database team.

2. I tried to use method 3, but my execution plans were poor. Full scans, or big arrows; I can't achieve something clean.
Most probably, you have not tried hard enough. I usually note two paths of improvement when facing such a problem:
a) You jumped directly into the code without first thinking about how the query should be solved: which indexes will trigger the best reduction in early results, and when each filter condition should be applied (this usually shows up as big arrows).
b) The SQL compiler sees a potential combination you have not thought about and makes the plan ready to execute it. Search for it, and decide whether you want to support it or disable it using a scalar expression. This usually happens when you see big scans in the execution plan that you can't get rid of.

3. The SQL Server team has never said their engine is designed to understand intents within method-3-like code. As a result, one day they may issue, without warning, a patch or release that makes this method stop working.
This is indeed the weak point of this approach. I hope to have a response from a member of the SQL Server team on this subject one day. However, I have strong reasons to believe this is not going to change overnight. First, they document how execution plans are cached and reused. Second, they have to ensure backward compatibility. But with what? It is well known, for instance, that if we put NEWID() in a SELECT statement, it will be executed once per row. Other functions and expressions, such as GetUtcDate(), won't. This shows how functions and expressions are evaluated, and how big the mess would be if they broke this behavior. So the day method 3 stops working, a ton of other code won't behave the same either. The SQL Server team can't make expressions that are evaluated once per query suddenly be evaluated once per row.
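The per-row versus per-query evaluation mentioned above can be seen with a pair of simple queries (sys.objects is used here just as a convenient row source):

```sql
-- NEWID() is evaluated once per row: every row gets a distinct value.
SELECT TOP (3) NEWID() AS PerRow FROM sys.objects;

-- GETUTCDATE() is evaluated once per query: every row shows the same value.
SELECT TOP (3) GETUTCDATE() AS PerQuery FROM sys.objects;
```

Changing either behavior would silently alter the results of any code that depends on it, which is the backward-compatibility pressure described above.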

4. I don't get the point of what method 3 is. In one post a CTE is used, in another "OR" keywords... in what way is it the same?
Method 3 is not about using keyword X or Y. You can use any instruction in the SQL dictionary. The point of method 3 is to use scalar expressions to dynamically enable or inhibit portions of the same SQL statement, while the table you want to access appears in the statement once for each way of accessing it. So if you plan, for instance, to have 3 ways to access the table MyTable, on indexes IndexA, IndexB and ClustIndex, you will find "MyTable" at least three times in the code and in the output of compilation, while at execution time only one of these operations is actually executed. When SQL Server is running the query (so, after compilation), pure scalar expressions such as "GetDate(), A=B, C>1+2" are evaluated before any object (table or index) is opened. Other expressions referring to columns (LastName LIKE 'A%') are evaluated for every row if they have not been transformed into a SEEK operation, i.e. if SQL Server is not using an index to get the rows that match them.
An expression evaluated as FALSE inhibits all the work bound to it. A very simple example is to look at the execution plan of "SELECT * FROM [User] WHERE 1=0".
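To make this concrete, here is a minimal sketch of what a method-3-style statement can look like (parameter and index names are illustrative, not from the article's test scripts). The table appears once per access path, and a pure scalar guard expression decides at execution time which branch actually runs:

```sql
SELECT p.*
FROM Product AS p
WHERE @Name IS NOT NULL      -- scalar guard: branch active only when @Name is supplied
  AND p.[Name] = @Name       -- can be resolved as a SEEK on an index on [Name]
UNION ALL
SELECT p.*
FROM Product AS p
WHERE @Name IS NULL;         -- scalar guard: fallback branch, scan
```

The compiled plan contains both branches, but the guards are evaluated before the table is opened, exactly like the "WHERE 1=0" case, so only one branch does any actual work per execution.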

==========================================================================
CONCLUSION

I wanted to show that concatenating SQL strings is not the only way to get dynamic execution; there are other methods which are also viable in terms of stability, predictability and performance.
I also wanted to show that you must be very careful when using concatenation of SQL strings, because you and your testers may not think of all the code paths your code allows, and each one of them produces a different execution plan. Depending on how your application is used and by whom, those execution plans can be activated in production by a new combination of parameters without ever being reviewed or predicted by the database team.

From my own team's experience, we made terrific progress in stability and predictable performance when we took as a principle that *all* execution plans and combinations of variables have to be reviewed, even when the stored procedures allowed many input parameters. The application was using dynamically generated SQL code, and this was a very difficult goal to achieve because it would have meant testing a lot of different combinations.
When we switched to only one execution plan per stored procedure, we could adopt a methodology in which we classify the 2 or 3 use cases we really want to optimize best and the other use cases for which moderate performance is acceptable, and, most important, ensure there is no "leak" or possibility to trigger a very, very bad execution plan leading to an immediate outage.


  Post Attachments 
OptionRecompile.jpeg (112 views, 103.43 KB)
InCaseOfBug.jpeg (113 views, 39.70 KB)
DynamicQueries_Databases.txt (3 views, 1.04 KB)
DynamicQueries_Procedures.txt (2 views, 3.15 KB)
DynamicQueries_BVTs.txt (2 views, 1.45 KB)
Post #928468