
Use Dynamic SQL to Improve Query Performance
Posted Saturday, May 22, 2010 10:32 AM
Grasshopper
Group: General Forum Members
Points: 14, Visits: 73
I have experimented with the approach above for a couple of years in a production system.
While it seems to simplify development at first, I personally no longer recommend dynamic SQL in heavy OLTP environments, for these reasons:
1) The execution plans are not easily predictable and there are many possible code paths. Developers will usually test 2 or 3 use cases, not a hundred. It's easy to miss a combination of parameters that is very slow, contains an index scan, or does something you haven't thought about. This lack of predictability has to be compensated for by extensive testing or by very close monitoring of the production machines. But the server is a shared resource most of the time; even a single slow corner-case combination of parameters executed once every 5 seconds can choke the whole system.
2) Since an application has to remain stable through time and modifications, the problem above gets worse over time, especially taking developer turnover into account. The natural fate of such an application is to accumulate so many "IF"s that you no longer really know which ones are used.
3) Since some of the SQL lives inside quoted strings, there is always a chance that, after later modifications, it fails to compile in some corner case you never saw in tests.
4) It becomes difficult to design a good index structure to handle all those use cases. The tendency is then to create too many or too-wide indexes.
5) It should not be a deciding factor, but I have seen production SQL Servers crash because they could not compile an execution plan for some corner case of such dynamic queries. I have opened cases with Microsoft SQL Server customer support for these.
6) Often the only reason people use dynamic SQL is that they were unsuccessful with the static approach, without really understanding why the optimizer was not giving them what they wanted. If they understood what was actually problematic, I expect they would prefer to solve it in a more stable way.
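For reference, the concatenated style of dynamic SQL under discussion typically looks something like this (a sketch only; parameter and column names follow the example query in this thread):

```sql
-- Hypothetical sketch of the string-building approach; only the non-NULL
-- criteria are appended, so each parameter combination compiles its own plan.
DECLARE @sql nvarchar(max) = N'
SELECT [Name], [Color], [ListPrice], [ModifiedDate]
FROM [Production].[Product]
WHERE 1 = 1';

IF @Name IS NOT NULL         SET @sql += N' AND [Name] LIKE @Name';
IF @Color IS NOT NULL        SET @sql += N' AND [Color] LIKE @Color';
IF @ListPriceMin IS NOT NULL SET @sql += N' AND [ListPrice] >= @ListPriceMin';
IF @ListPriceMax IS NOT NULL SET @sql += N' AND [ListPrice] <= @ListPriceMax';
-- ... and likewise for @ModifiedDateMin / @ModifiedDateMax ...

EXEC sp_executesql @sql,
    N'@Name nvarchar(100), @Color nvarchar(100),
      @ListPriceMin money, @ListPriceMax money',
    @Name, @Color, @ListPriceMin, @ListPriceMax;
```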

Let's take the original query as an example:

SELECT [Name], [Color], [ListPrice], [ModifiedDate]
FROM [Production].[Product]
WHERE [Name] LIKE COALESCE(@Name, [Name])
AND [Color] LIKE COALESCE(@Color, [Color])
AND [ListPrice] >= COALESCE(@ListPriceMin, [ListPrice])
AND [ListPrice] <= COALESCE(@ListPriceMax, [ListPrice])
AND [ModifiedDate] >= COALESCE(@ModifiedDateMin, [ModifiedDate])
AND [ModifiedDate] <= COALESCE(@ModifiedDateMax, [ModifiedDate])

If you look at the execution plan, you will see a clustered index scan on this table, and you may be puzzled as to why. The reason is that SQL Server can't choose an index: a query optimized for one criterion would perform very poorly with another. In addition, in this example the COALESCE function forces a computation for every row, which prevents the use of indexes. As a SQL developer, you have to understand all this.
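The point about COALESCE can be seen by contrasting a non-sargable predicate with an equivalent branch that leaves the column bare (a sketch; it assumes an index on [Name]):

```sql
-- Non-sargable: COALESCE must be evaluated against every row's [Name],
-- so an index on [Name] cannot be used for a seek; expect a scan.
SELECT [Name]
FROM [Production].[Product]
WHERE [Name] LIKE COALESCE(@Name, [Name]);

-- Sargable: the bare column can be matched to the index, so when @Name is
-- supplied the optimizer can seek; the cost is branching in your code.
IF @Name IS NULL
    SELECT [Name] FROM [Production].[Product];
ELSE
    SELECT [Name] FROM [Production].[Product] WHERE [Name] LIKE @Name;
```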

You can help SQL Server choose its indexes by adding the CTE below:
WITH Filtering AS
(
SELECT ProductId
FROM [Production].[Product]
WHERE @Name IS NOT NULL AND @Name = [Name]
UNION ALL
SELECT ProductId
FROM [Production].[Product]
WHERE @Color IS NOT NULL AND @Color = [Color]
UNION ALL
SELECT ProductId
FROM [Production].[Product]
WHERE NOT (@ListPriceMin IS NULL AND @ListPriceMax IS NULL)
AND (@ListPriceMin IS NULL OR [ListPrice] >= @ListPriceMin)
AND (@ListPriceMax IS NULL OR [ListPrice] <= @ListPriceMax)
-- ...
)
SELECT [Name], [Color], [ListPrice], [ModifiedDate]
FROM [Production].[Product]
WHERE ProductID IN (SELECT ProductID FROM Filtering)
AND [Name] LIKE COALESCE(@Name, [Name])
AND [Color] LIKE COALESCE(@Color, [Color])
AND [ListPrice] >= COALESCE(@ListPriceMin, [ListPrice])
AND [ListPrice] <= COALESCE(@ListPriceMax, [ListPrice])
AND [ModifiedDate] >= COALESCE(@ModifiedDateMin, [ModifiedDate])
AND [ModifiedDate] <= COALESCE(@ModifiedDateMax, [ModifiedDate])

As an outcome, you have all the possible combinations of indexes in one single execution plan, and it becomes much more convenient to decide which indexes you want, which you don't, which are still useful, and so on.
In other environments, where compilation time is not really a problem and the query is not exposed to users directly, I advise keeping the original statement and using the RECOMPILE option; this is usually more accessible to developers, though it forces DBAs to look at slow queries more often.
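The RECOMPILE variant mentioned here would look roughly like this (a sketch): with OPTION (RECOMPILE) the optimizer can see the actual parameter values on every execution and prune the always-true branches, at the price of compiling each call and caching no plan.

```sql
SELECT [Name], [Color], [ListPrice], [ModifiedDate]
FROM [Production].[Product]
WHERE ([Name] LIKE @Name OR @Name IS NULL)
  AND ([Color] LIKE @Color OR @Color IS NULL)
  AND ([ListPrice] >= @ListPriceMin OR @ListPriceMin IS NULL)
  AND ([ListPrice] <= @ListPriceMax OR @ListPriceMax IS NULL)
OPTION (RECOMPILE);  -- plan is built per call with known parameter values
```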

The key tricks to understand here are:
- Having one statement per optimized case (in the CTE, separated by UNION ALL) allows SQL Server to choose the proper index for each one. A quick look at the execution plan is enough to see it.
- It's important to have the "WHERE @Name IS NOT NULL" predicates in the Filtering CTE, because if @Name happens to be NULL, SQL Server will not perform any reads on that part of the union (scalar expressions are evaluated as startup filters, before any fetching). So even if the execution plan looks bigger, the actual number of reads stays low. It is written as a static query, but it executes like a dynamic one.

Having used it for two years in a team, I saw a serious improvement in the quality of deliverables from developers. After some training, the first attempt was usually good enough to go to production.
Post #926437
Posted Sunday, May 23, 2010 2:09 PM


Old Hand
Group: General Forum Members
Points: 338, Visits: 1,431
James Dingle,
I've tried your style of query to obtain the execution plan, and it is the worst performing of all the attempts so far.
Please find details below. I created a table with 1 million rows to test it:

Create test data:
IF OBJECT_ID('Product','U') IS NOT NULL
DROP TABLE [Product]
IF OBJECT_ID('Colors','U') IS NOT NULL
DROP TABLE [Colors]

CREATE TABLE [Product]
(
[ProductId] int IDENTITY(1,1) NOT NULL,
[Name] nvarchar(100) NOT NULL,
[Color] nvarchar(100) NOT NULL,
[ListPrice] money NOT NULL,
[ModifiedDate] datetime NOT NULL,
CONSTRAINT [PK_Product_ProductID] PRIMARY KEY CLUSTERED ([ProductID] ASC)
)

CREATE TABLE Colors(Color nvarchar(100))
INSERT INTO Colors(Color) VALUES
('Amber'),('Beige'),('Black'),('Blue'),('Brown'),('Dark blue'),('Dark brown'),
('Dark coral'),('Gold'),('Green'),('Indigo'),('Ivory'),('Light blue'),
('Light brown'),('Pale blue'),('Pale brown'),('Pastel blue'),('Pastel brown'),
('Pastel green'),('Red'),('Rose'),('White'),('Yellow')


INSERT INTO Product
(
Name,
Color,
ListPrice,
ModifiedDate
)
SELECT TOP 1000000
A.name + '-' + CAST( ROW_NUMBER() over(order by NEWID()) as nvarchar(100)),
B.Color,
ABS(CHECKSUM(NEWID())) % 5000,
Cast('2000-01-01' as datetime) + ABS(CHECKSUM(NEWID())) % 3000
FROM master.sys.columns A
CROSS JOIN Colors B
CROSS JOIN master.sys.columns C

CREATE NONCLUSTERED INDEX Product_X1 ON Product(Name)
CREATE NONCLUSTERED INDEX Product_X2 ON Product(ModifiedDate)
CREATE NONCLUSTERED INDEX Product_X3 ON Product(Color)
GO

Run the queries (yours are Query 7 & 8):
DECLARE @Name nvarchar(100)
SET @Name = 'Hello'
-- Query 1
SELECT *
FROM [Product]
WHERE (@Name IS NULL OR [Name] = @Name)

-- Query 2
SELECT *
FROM [Product]
WHERE ([Name] = @Name OR @Name IS NULL)

-- Query 3
SELECT *
FROM [Product]
WHERE [Name] = Coalesce(@Name, [Name])

-- Query 4
SELECT *
FROM [Product]
WHERE [Name] = CASE WHEN @Name IS NULL THEN [Name] ELSE @Name END

-- Query 5
SELECT *
FROM [Product]
WHERE 1 = (CASE WHEN @Name IS NULL THEN 1
WHEN @Name IS NOT NULL AND [Name] = @Name THEN 1
ELSE 0
END)

-- Query 6
SELECT *
FROM [Product]
WHERE [Name] = ISNULL(@Name, [Name])

DECLARE @Color nvarchar(100),
@ListPriceMax money,
@ListPriceMin money,
@ModifiedDateMin datetime,
@ModifiedDateMax datetime

SET @Color = NULL
SET @ListPriceMax = NULL
SET @ListPriceMin = NULL
SET @ModifiedDateMin = NULL
SET @ModifiedDateMax = NULL

-- Query 7
;
WITH Filtering AS
(
SELECT ProductId
FROM [Product]
WHERE @Name IS NOT NULL AND @Name = Name
UNION ALL
SELECT ProductId
FROM [Product]
WHERE @Color IS NOT NULL AND @Color = Color
UNION ALL
SELECT ProductId
FROM [Product]
WHERE NOT (@ListPriceMin IS NULL AND @ListPriceMax IS NULL)
AND (@ListPriceMin IS NULL OR ListPrice >= @ListPriceMin)
AND (@ListPriceMax IS NULL OR ListPrice <= @ListPriceMax)
)
SELECT *
FROM [Product]
WHERE ProductID IN (SELECT ProductID
FROM Filtering)
AND [Name] = Coalesce(@Name, [Name])
AND [Color] LIKE Coalesce(@Color, [Color] )
AND [ListPrice] >= Coalesce(@ListPriceMin, [ListPrice])
AND [ListPrice] <= Coalesce(@ListPriceMax, [ListPrice])
AND ModifiedDate >= Coalesce(@ModifiedDateMin, ModifiedDate)
AND ModifiedDate <= Coalesce(@ModifiedDateMax, ModifiedDate)

-- Query 8
;
WITH Filtering AS
(
SELECT ProductId
FROM [Product]
WHERE @Name IS NOT NULL AND @Name = Name

)
SELECT *
FROM [Product]
WHERE ProductID IN (SELECT ProductID
FROM Filtering)
AND [Name] = Coalesce(@Name, [Name])


-- Query 9
SELECT *
FROM [Product]
WHERE [Name] = @Name

And here is a picture of all the execution plans:



Your query without any other columns included (Query 8) is as good as the dynamic SQL, but as soon as some of the other columns are added (Query 7) it becomes the worst performing of all of them. The dynamic SQL (Query 9) would only include WHERE [Name] = @Name when the other parameters are null, but yours lists all the columns as it is not dynamic.


Post Attachments: ExecutionPlan1.gif, ExecutionPlan2.gif
Post #926555
Posted Sunday, May 23, 2010 4:02 PM


Old Hand
Group: General Forum Members
Points: 338, Visits: 1,431
James Dingle,
In fact your query performs nearly twice as badly as the original non-dynamic SQL that would have been in the initial stored procedure.
See SQL and execution plan below.

DECLARE @Name nvarchar(100),
@Color nvarchar(100),
@ListPriceMax money,
@ListPriceMin money,
@ModifiedDateMin datetime,
@ModifiedDateMax datetime

SET @Name = 'Hello'
SET @Color = NULL
SET @ListPriceMax = NULL
SET @ListPriceMin = NULL
SET @ModifiedDateMin = NULL
SET @ModifiedDateMax = NULL

--Query 1
SELECT *
FROM [Product]
WHERE [Name] = Coalesce(@Name, [Name])
AND [Color] LIKE Coalesce(@Color, [Color] )
AND [ListPrice] >= Coalesce(@ListPriceMin, [ListPrice])
AND [ListPrice] <= Coalesce(@ListPriceMax, [ListPrice])
AND ModifiedDate >= Coalesce(@ModifiedDateMin, ModifiedDate)
AND ModifiedDate <= Coalesce(@ModifiedDateMax, ModifiedDate)

--Query 2
;
WITH Filtering AS
(
SELECT ProductId
FROM [Product]
WHERE @Name IS NOT NULL AND @Name = Name
UNION ALL
SELECT ProductId
FROM [Product]
WHERE @Color IS NOT NULL AND @Color = Color
UNION ALL
SELECT ProductId
FROM [Product]
WHERE NOT (@ListPriceMin IS NULL AND @ListPriceMax IS NULL)
AND (@ListPriceMin IS NULL OR ListPrice >= @ListPriceMin)
AND (@ListPriceMax IS NULL OR ListPrice <= @ListPriceMax)
)
SELECT *
FROM [Product]
WHERE ProductID IN (SELECT ProductID
FROM Filtering)
AND [Name] = Coalesce(@Name, [Name])
AND [Color] LIKE Coalesce(@Color, [Color] )
AND [ListPrice] >= Coalesce(@ListPriceMin, [ListPrice])
AND [ListPrice] <= Coalesce(@ListPriceMax, [ListPrice])
AND ModifiedDate >= Coalesce(@ModifiedDateMin, ModifiedDate)
AND ModifiedDate <= Coalesce(@ModifiedDateMax, ModifiedDate)

Execution plans:


Query 1 uses 35% of total cost and query 2 uses 65% of total cost.


Post Attachments: ExecutionPlan3.gif
Post #926564
Posted Sunday, May 23, 2010 4:29 PM


SSC-Dedicated
Group: General Forum Members
Points: 35,366, Visits: 31,906
Paul White NZ (5/21/2010)
clementhuge (5/21/2010)
I feel the article is actually pretty poor. All the dbas know this kind of dynamic sql. Someone suggested recompile and with (optimize) option and he is spot on! I was able to improve performance on parameterized query with the optimize option.

Many DBAs *think* they know - big difference.

The best you can do with any form of static code is recompile every time, and even that only works satisfactorily in SQL Server 2008 SP1 with at least Cumulative Update 5 applied, and with correct coding. Even so, recompiling on every execution (and not caching any plan!) is not always a good 'fix'.

If I could, I'd go back and give this article 5 stars now, just to make up for those that think they know better.

Paul


Heh... I actually forgot to mark the article before. 5 stars it is.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #926570
Posted Sunday, May 23, 2010 6:15 PM


Old Hand
Group: General Forum Members
Points: 338, Visits: 1,431
And just one more before anyone else suggests it:

-- Query 10
SELECT *
FROM [Product]
WHERE NOT([Name] <> @Name AND @Name IS NOT NULL)


The WHERE clause above is actually logically identical to:
WHERE ([Name] = @Name OR @Name IS NULL)


It gives the same full table scan as the other queries.


Post Attachments: ExecutionPlan4.gif
Post #926580
Posted Monday, May 24, 2010 2:08 AM


SSCarpal Tunnel
Group: General Forum Members
Points: 4,421, Visits: 10,738
josh shilling (5/21/2010)
Try flipping the

(Field LIKE @Param OR @Param IS NULL)

To

(@Param IS NULL OR FIELD Like @Param)

See this post by Mark Cohen...

http://beingmarkcohen.com/?p=62





That page contains some quite questionable statements. It's a shame it still comes up on the first page of Google results for "sql short-circuit".
I'll try to finish my article on short-circuiting soon and (hopefully) get it published here.


--
Gianluca Sartori

How to post T-SQL questions
spaghettidba.com
@spaghettidba
Post #926654
Posted Monday, May 24, 2010 2:51 AM
Grasshopper
Group: General Forum Members
Points: 14, Visits: 73
Jonathan AC Roberts (5/23/2010)
James Dingle,
In fact your query performs nearly twice as badly as the original non-dynamic SQL that would have been in the initial stored procedure.

[...]

See SQL and execution plan below.
Execution plans:


Query 1 uses 35% of total cost and query 2 uses 65% of total cost.


Hi Jon,

Quick remarks:
- There is a clustered index scan on the ID. Obviously, this way it won't work. The snippet of code I wrote assumed that ProductID is the clustered unique index (usually the PK) of the table.
- You have to add indexes on the columns on which the filtering is done. In the execution plan above there is apparently no index on "Name" and so on, so it is not possible to see how and why it works.
- Don't trust the percentages too much when deciding whether a query is good or bad; trust CPU time and the number of reads. I never knew exactly what those percentages measure; by experience they help focus attention on bottlenecks, but what really matters is bottom-line CPU and IO.
- My point was not really about performance, but about what dynamic queries based on concatenated SQL give in a real environment, over the lifetime of a project. I used it once and, with that experience and other techniques, I moved away from dynamic SQL. As usual in the database world, "it depends", so if some people feel comfortable with this approach, let them use it. My point was to show that it is possible to rely on scalar variable resolution to make parts of the query execute or not, and thus to get dynamic execution without putting code inside quotes.
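A quick way to get the CPU-time and read figures recommended here (assuming the queries are run in SSMS against the test table from earlier in the thread):

```sql
SET STATISTICS IO ON;    -- reports logical/physical reads per table
SET STATISTICS TIME ON;  -- reports parse/compile and execution CPU time

SELECT * FROM [Product] WHERE [Name] = @Name;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```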
Post #926670
Posted Monday, May 24, 2010 4:26 AM


Old Hand
Group: General Forum Members
Points: 338, Visits: 1,431
James Dingle-651585 (5/24/2010)

Hi Jon,

Quick remarks:
- There is a clustered index scan on the ID. Obviously, this way it won't work. The snippet of code I wrote assumed that ProductID is the clustered unique index (usually the PK) of the table.
ProductID is the clustered PK, see the script to populate the table.


- You have to add indexes on the columns on which the filtering is done. In the execution plan above there is apparently no index on "Name" and so on, so it is not possible to see how and why it works.
There is an index on Name, see the table population script.


- Don't trust the percentages too much when deciding whether a query is good or bad; trust CPU time and the number of reads. I never knew exactly what those percentages measure; by experience they help focus attention on bottlenecks, but what really matters is bottom-line CPU and IO.
I agree, you can't trust these percentages too much.


- My point was not really about performance, but about what dynamic queries based on concatenated SQL give in a real environment, over the lifetime of a project. I used it once and, with that experience and other techniques, I moved away from dynamic SQL.

I think there are other possible ways to make maintenance of dynamic SQL nearly as easy as it is for static procedures. For example, you could list all the tables and columns used in the dynamic SQL in a query at the end of the procedure that would never be executed, so the procedure would no longer compile if any of those columns were changed. I started by just adding a non-executing SELECT of the table name so the dependency would be detected by sp_depends, letting you check which stored procedures would need changing if a table were changed.


My point was to show that it is possible to rely on scalar variable resolution to make parts of the query execute or not, and thus to get dynamic execution without putting code inside quotes.

I'm not sure that you have shown that; also, to me, it seems a more complicated method than using dynamic SQL.
Post #926706
Posted Monday, May 24, 2010 4:51 AM
Grasshopper
Group: General Forum Members
Points: 14, Visits: 73
It is difficult for me to demonstrate right now because I am not at the office and the computer I am using does not have SQL Server installed... sorry if it takes me some time to get there :)

Still, this is what the final execution plan looks like with such a query. Note that the syntax is quite different from the CTE version, but the principle is the same.

Select *
From [User] u
Where
(
Id In (Select Id From [User] Where @FirstName = FirstName And @FirstName Is Not Null)
Or Id In (Select Id From [User] Where @LastName = LastName And @LastName Is Not Null)
Or Id In (Select Id From [User] Where @Email = Email And @Email Is Not Null)
Or Id = @Id
)

Actually, in this very specific example, the "And @FirstName Is Not Null" conditions are optional, because nothing can be equal to NULL. But I encourage developers to write them anyway, so the dynamic triggering condition is explicit and thus more readable.
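The "nothing can be equal to NULL" remark follows from standard three-valued logic; a minimal illustration against the [User] table from the example above:

```sql
-- Under default ANSI_NULLS, any comparison with NULL yields UNKNOWN, which a
-- WHERE clause treats as not-true, so this branch returns no rows anyway.
DECLARE @FirstName nvarchar(50) = NULL;
SELECT Id
FROM [User]
WHERE @FirstName = FirstName;  -- matches nothing when @FirstName is NULL
```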

This is the execution plan:

The actual execution plan will differ between implementations (merge join, loop joins, etc.). The principle is that no unintended large scan remains.

At least, even if we can still discuss the ways to get there :), it shows the value of having everything in a single glance. It shows all the possible execution paths SQL Server can take, keeping in mind that, thanks to scalar resolution, most of the time only one path will actually execute, while a single plan in the cache handles all the possibilities.
This is helpful when driving a project, because it is simpler for the developer, the DBA, and the team leader to review the query as a whole, look at what has been optimized with indexes (maybe we don't want to optimize everything, for disk-space reasons), and pinpoint the nasty corner cases ("ugh! did you see you have an index scan when no parameter is given? Is this what you want?"). The predictability is immediate, whereas using dynamically concatenated SQL statements forces you to review each generated statement, and with the number of possible combinations, one day some use case gets missed and performs badly in production.


Post Attachments: DynamicQuery.jpeg
Post #926722
Posted Monday, May 24, 2010 5:17 AM
Grasshopper
Group: General Forum Members
Points: 14, Visits: 73
By the way Jon, I think I understand why you have scans everywhere when trying the method in my post above: it's a bug in my code. Indeed, there is still the possibility that all of the parameters are NULL, and the SQL compiler takes it into account: in that case, the "WHERE ProductID IN (...)" subquery gets inhibited, so all the rows are fetched, and then the scan is the most stable execution plan across all combinations, including that one.
Try adding this in the main WHERE clause:

... AND NOT (@Name IS NULL AND @Color IS NULL AND /* rest of the parameters */ )

This explicitly tells the compiler that "among those parameters, at least one is mandatory".
I observed that the compiler really understands it and removes the scan when this is the case.
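Applied to the CTE-based query from earlier in the thread, the guard would sit in the outer WHERE clause, roughly like this (a sketch; the trailing predicates are abbreviated):

```sql
SELECT *
FROM [Product]
WHERE ProductID IN (SELECT ProductID FROM Filtering)
  -- the guard: at least one parameter must be supplied, so the compiler can
  -- rule out the "all NULL" case that otherwise forces a full scan
  AND NOT (@Name IS NULL AND @Color IS NULL
           AND @ListPriceMin IS NULL AND @ListPriceMax IS NULL
           AND @ModifiedDateMin IS NULL AND @ModifiedDateMax IS NULL)
  AND [Name] = COALESCE(@Name, [Name])
  -- ... remaining COALESCE predicates as before ...
```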

If it changes your execution plan the way you expect, I hope I will have shown how scalar expressions can really drive execution plans, and consequently what you can do with that ability. At least you will have seen the disastrous result of one very specific use case immediately, whereas with dynamic queries you would have to hit that case to see it; and that is what I am trying to explain. ;)
Post #926734