This is the fifth post in my Parameterization series. In previous posts I wrote about parameter sniffing and when it is good or bad, and also about how to work around the problems associated with parameter sniffing when data is not uniformly distributed. In this post I would like to focus on two common mistakes that developers do, which make parameter sniffing a really bad thing, even when it would be a great thing otherwise.
The reason I’m focusing on these two mistakes is because I see them so often, and they usually have such a dramatic impact on performance. There is a good chance that if you check the code in your system, you’ll find a few occurrences of these mistakes, and you might suffer from performance problems due to these mistakes without even knowing about it.
OK, I hope I made you curious enough, so let’s get to business. In order to demonstrate the two mistakes, I’m going to use the following use case… An online retail company has a back office application, in which there is a report that allows the marketing department to see all the customers who placed orders since a specific date. For example, they might want to generate a list of all the customers who placed orders in the last month and send them a coupon or something.
The report includes a date picker control, where the user can pick a specific date and then generate a report of all the customers who placed orders since the chosen date. When the user submits the report, the back office application calls a stored procedure in the database that look like this:
CREATE PROCEDURE Marketing.usp_CustomersByLastPurchaseDate ( @Date AS DATE ) AS SELECT Id , Name , Country , LastPurchaseDate FROM Marketing.Customers WHERE LastPurchaseDate >= @Date; GO
This worked fine for a while, but then one day the main users of this report said that they almost always interested in customers who placed orders in the last week, and only once in a while they would like to generate the report for different periods of time. So having to pick the specific date from last week every time is annoying and also error-prone. The user asked that the report will present customers who placed orders in the last week by default, and only if a user wants to try another date, then she will choose the desired date using the date picker.
The First Mistake
That sounds like a reasonable requirement. There are several ways to support this requirement. One of the options is that in the default case (in which the user hasn’t picked any date), the application will not pass a value to the @Date parameter of the stored procedure. In this case, the parameter will have a default value of the date last week. Something like that:
ALTER PROCEDURE Marketing.usp_CustomersByLastPurchaseDate ( @Date AS DATE = DATEADD (WEEK , -1 , SYSDATETIME ()) ) AS SELECT Id , Name , Country , LastPurchaseDate FROM Marketing.Customers WHERE LastPurchaseDate >= @Date; GO
But this code doesn’t work, because you can’t use an expression for a default value of a parameter. You can only use a constant value or NULL. So here is how many developers implement this requirement:
ALTER PROCEDURE Marketing.usp_CustomersByLastPurchaseDate ( @Date AS DATE = NULL ) AS IF @Date IS NULL BEGIN SET @Date = DATEADD (WEEK , -1 , SYSDATETIME ()) END; SELECT Id , Name , Country , LastPurchaseDate FROM Marketing.Customers WHERE LastPurchaseDate >= @Date; GO
This code works just fine and returns the desired results. If a user asks for the default report, the default value of NULL is used, and then the parameter changes its value to last week’s date. If a user asks for a specific date, then the parameter receives the specific value overriding the default value, and the IF statement is bypassed. Everybody’s happy!
Except for the users… It’s true that now they don’t have to pick the date each time, but now they say that since the change was made, the report is much slower than before, and they have to wait for a long time before they get the results. One of the users even said that he prefers to pick the date each time and get fast results rather than not having to pick the date but get such poor performance.
So what’s going on?
We can assume that after the change was implemented, the first time a user used the report was with the default behavior. It means that the stored procedure was optimized with the value of NULL for the @Date parameter. Remember that parameter sniffing works at compile-time, and the value of each parameter as it is passed by the caller is used for optimization. Any changes to the values of parameters inside the stored procedure happen at run-time, and the optimizer is not aware of them when optimizing the stored procedure.
So the optimizer needs to estimate the number of rows for the following predicate:
LastPurchaseDate >= NULL
Assuming you’re using the default ANSI_NULLS setting, which is ON, the optimizer doesn’t even need to look up the statistics for the “LastPurchaseDate” column. The predicate above returns False by definition, so the optimizer estimates 0 rows. Based on this estimation, it chooses to use an index seek on the “LastPurchaseDate” column with a key lookup to fetch the relevant columns.
But the actual execution is a completely different story. Because the parameter value changes at run-time, the actual value that is used in the predicate is last week’s date, and the query might return many rows. Let’s say – 20,000. So now SQL Server has to perform 20,000 key lookups, and most probably a scan would have been a better choice.
From now on, no matter what date the users use with this report, this stored procedure has a plan that estimates zero rows, and this plan is going to be reused every time.
This is the first common mistake that developers make, and in most cases they are not aware of the implications on their application’s performance. The message here is that you should avoid changing the values of parameters inside the stored procedure. If you change parameter values, then there is going to be a difference between the value used by the optimizer to generate the plan and the actual value used at run-time.
The Second Mistake
Some developers implement this change request in a different way, either because they are aware of the first mistake or simply because this is the way they think is best. These developers know they shouldn’t change parameter values inside the stored procedure, so instead they copy the values into local variables, and use the local variables instead. Something like this:
ALTER PROCEDURE Marketing.usp_CustomersByLastPurchaseDate ( @Date AS DATE = NULL ) AS DECLARE @CalculatedDate AS DATE = ISNULL (@Date , DATEADD (WEEK , -1 , SYSDATETIME ())); SELECT Id , Name , Country , LastPurchaseDate FROM Marketing.Customers WHERE LastPurchaseDate >= @CalculatedDate; GO
This is the second common mistake, but it’s a mistake for a completely different reason. In this case, parameter sniffing is effectively disabled, because the parameter is not used anywhere in the stored procedure. The query doesn’t reference the parameter at all, so when the optimizer needs to optimize the query, it doesn’t need to use parameter sniffing in order to consider the parameter value.
Instead, the optimizer now needs to estimate how many rows will be returned for a predicate that is based on a local variable. The problem with local variables is that they are declared and assigned at run-time, so at compile-time they are always treated as unknown values. The optimizer has specific rules for handling unknown values. In this case, since this is an inequality predicate, the optimizer estimates 30% of the total number of rows in the table. Remember that we are trying to retrieve rows from the last week only. If the table contains data for the last 5 years, then 30% is probably a very wrong estimation.
So this method is also wrong, and the message here is to avoid local variables in query predicates, because their values are unknown to the optimizer at compile-time.
The Right Way
So what is the right way to implement this change request? There are several options:
- The best thing to do is to pass the logic to the application. It means that the application will calculate last week’s date and pass this value to the stored procedure in the default case. The stored procedure’s parameter won’t even have a default value, and the correct value will be used during optimization thanks to parameter sniffing.
- If this is not possible for some reason, then you can use the OPTION (RECOMPILE) query hint in order to recompile the query at run-time. This will ensure that the optimizer uses the correct values at run-time. This method works well whether you’re changing the value of the parameter or using a local variable, but its drawback is that it performs a recompile in each execution.
- If you want to avoid the recompile in each execution, then you can split the stored procedure into two stored procedures. The first procedure will receive the parameter with the default value of NULL and calculate the last week’s date, either by changing the parameter value or by using a local variable. Then, instead of running the query, it will call the second stored procedure and pass the calculated value as a parameter. This procedure will receive the parameter with the correct value (no need for default value here), and use this value to run the query. This way, parameter sniffing will work well in the second stored procedure, which is the one that actually runs the query.
- Another option is to make two copies of the stored procedure. When a user asks for the default report, the application will call stored procedure A, which will always return rows for the last week. This stored procedure won’t even have a parameter, and it will have an optimized plan with predictable performance. When a user picks a specific date, the application will call stored procedure B, which will have a parameter without a default value. This stored procedure will perform a recompile in each execution in order to generate the best plan for every date chosen. Since this type of report is rarely used, the overhead of recompile is acceptable.
If you are already aware of these mistakes and you know for sure that your code is “safe”, then this is great. Go have a beer or something. But if you’re not, then go ahead and check the code in your stored procedures. There is a good chance you’ll find these mistakes somewhere in there, and there is a good chance that you already have performance issues because of that. Once you fix these mistakes, you can go and have a beer with the others…
image “doh” courtesy of hobvias sudoneighm
The post Parameterization Part 5: Two Common Mistakes appeared first on .