SQLServerCentral is supported by Red Gate Software Ltd.
Multiple Execution Paths

It’s not uncommon to find stored procedures that have multiple IF statements controlling the flow of execution within the procedure. Now this seems to be a fairly logical thing to do, but there can be a subtle performance problem with this, one that may be hard to identify.

Let’s have a look at a simple example (using AdventureWorks)

CREATE PROCEDURE MultipleExecPaths (
@TransactionType char(1) = NULL
)
AS

IF @TransactionType IS NULL
SELECT max(transactionDate) from Production.TransactionHistory
ELSE
SELECT max(transactionDate) from Production.TransactionHistory where TransactionType = @TransactionType

GO

Nice and simple. If the parameter is passed, get the latest date for that transaction type; if the parameter is not passed, i.e. it is NULL, get the latest date over all transaction types. So what’s wrong with this?

The problem goes back to parameter sniffing. When the procedure is executed for the first time, all queries in the procedure are parsed, bound and optimised. When the optimiser processes each statement to generate an execution plan, it uses the values passed for the various parameters to estimate the number of rows affected. The number of rows that the optimiser thinks the queries will process affects the choice of operators for the plan. Operators that are optimal for small numbers of rows are not always optimal for large numbers of rows, and sometimes the difference can be astounding.

Let’s see how the example above plays out to understand what’s happening here.

First thing I want to do is to run the two queries in that procedure separately to see how they behave in isolation. After that I’ll run the procedure and see how it behaves, both when the first call passes the parameter and when the first call doesn’t pass a parameter.
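The execution statistics quoted throughout this article are in the format produced by SET STATISTICS IO and SET STATISTICS TIME; to follow along, enable both first:

```sql
-- Enable the reads and timing output shown in the statistics blocks below
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
```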

SELECT max(transactionDate) from Production.TransactionHistory
GO
SELECT max(transactionDate) from Production.TransactionHistory where TransactionType = 'W'

I’m hardcoding the parameter value so that we get much the same effect as we would with a parameter in a stored proc. If I used a variable, SQL wouldn’t be able to see the value of the variable at compile time, and hence we wouldn’t necessarily get the same plan as we would in the procedure.
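To see that difference for yourself, compare with the variable form. This is just a sketch: with a variable, the optimiser can’t see the value at compile time, so it estimates the row count from the average density of the column rather than from the statistics histogram.

```sql
-- With a variable instead of a literal, the value isn't visible at compile time,
-- so the optimiser estimates from average column density, not the histogram
DECLARE @TransactionType char(1);
SET @TransactionType = 'W';

SELECT max(transactionDate)
FROM Production.TransactionHistory
WHERE TransactionType = @TransactionType;
```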

The first query, the one without a where clause, executes with a clustered index scan. This isn’t surprising; there’s no index on the TransactionDate column. The execution statistics are:

Table 'TransactionHistory'. Scan count 1, logical reads 3165,
physical reads 0.

SQL Server Execution Times:
  CPU time = 125 ms,  elapsed time = 131 ms.

The second query, with the filter on TransactionType, also executes with a clustered index scan, even though there’s an index on TransactionType. The execution statistics are:

Table 'TransactionHistory'. Scan count 1, logical reads 3165,
physical reads 0.

SQL Server Execution Times:
  CPU time = 110 ms,  elapsed time = 111 ms.

Note that I have more data in my copy of AdventureWorks than is normal, so the execution times and IO statistics are higher than they would be with a normal copy of AW.

Why the clustered index scan when there’s an index? The index on TransactionType is not covering; it doesn’t have the TransactionDate column in it, and the filter on TransactionType returns 125000 rows out of the total of 450000 in the table. That’s 27% of the table, far too high for the optimiser to consider a seek on a non-covering index and a whole load of key lookups.
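As an aside, if this query were important enough, a covering index would remove the scan-versus-lookup dilemma entirely. One possible form (the index name is hypothetical):

```sql
-- A covering index for this query: seek to the TransactionType value,
-- with TransactionDate in the key so max() needs no lookups at all
CREATE INDEX idx_TransactionHistory_TypeDate
ON Production.TransactionHistory (TransactionType, TransactionDate);
```

With TransactionDate as the second key column, the maximum date for a given type is the last row of the seek range, so the query becomes a cheap seek regardless of row counts.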

So when run separately the two queries both execute in around 100ms and do just over 3000 reads. Now let’s see how they behave together in the stored procedure.

The first test is going to be calling the procedure for the first time with a parameter:

EXEC MultipleExecPaths @TransactionType = 'W'
GO

EXEC MultipleExecPaths

-- First Procedure call
Table 'TransactionHistory'. Scan count 1, logical reads 3165,
physical reads 0.

 SQL Server Execution Times:
 CPU time = 109 ms,  elapsed time = 117 ms.

-- Second procedure call
Table 'TransactionHistory'. Scan count 1, logical reads 3165,
physical reads 0.

 SQL Server Execution Times:
 CPU time = 109 ms,  elapsed time = 111 ms.

This looks fine. The execution statistics and execution plan are the same as when I ran the two statements separately. For the next test, I’m going to clear the procedure cache and run the two procedure calls again, this time in the other order.

DBCC FREEPROCCACHE
GO

EXEC MultipleExecPaths
GO

EXEC MultipleExecPaths @TransactionType = 'W'

-- First Procedure call
Table 'TransactionHistory'. Scan count 1, logical reads 3165,
physical reads 0.

 SQL Server Execution Times:
 CPU time = 109 ms,  elapsed time = 111 ms.

-- Second procedure call
Table 'TransactionHistory'. Scan count 1, logical reads 372377,
physical reads 0.

 SQL Server Execution Times:
 CPU time = 265 ms,  elapsed time = 266 ms.

Oops. What happened here? The reads have gone through the roof and the query duration has more than doubled. Let’s have a look at the execution plan and see if there’s a hint there.

[Execution plan showing a clustered index seek and key lookup]

Remember I said earlier that since the filter on transaction type affected a large percentage of the table, the optimiser wouldn’t choose an index seek with a key lookup? Well, that’s exactly what it’s done here. The question is why. There’s a partial answer in the properties of the index seek.

[Properties of the index seek, showing estimated and actual row counts]

Estimated rows: 1. Actual rows: 124008. Yeah, that’s going to mess up the choice of plan. A quick dig into the XML plan gives a very large clue as to what’s going on here.

<ParameterList>
<ColumnReference Column="@TransactionType" ParameterCompiledValue="NULL" ParameterRuntimeValue="'W'" />
</ParameterList>

ParameterCompiledValue="NULL", ParameterRuntimeValue="'W'". Since no rows will ever satisfy the condition TransactionType = NULL (assuming the default ANSI_NULLS setting), the optimiser compiled the plan for 1 row.
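If you want to see those attributes yourself, one easy way is to request the actual plan as XML when running the procedure:

```sql
-- Returns the actual showplan XML alongside the results;
-- the <ParameterList> element appears near the end of the plan
SET STATISTICS XML ON;
EXEC MultipleExecPaths @TransactionType = 'W';
SET STATISTICS XML OFF;
```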

What’s happened here is that when the procedure was run the first time and the optimiser generated the execution plan, it optimised all of the queries in the procedure based on the values of the parameters passed for that execution, regardless of whether the query could be executed with that particular set of parameters. So in this case, when the procedure was first executed and the execution plan was generated, it was generated for both queries based on a parameter value of NULL, even though the second branch of the IF could not be reached with that parameter value.

The important thing to take away from this is that when a procedure is compiled, the entire thing is compiled and all queries in it are optimised based on the parameter values for that call. If some queries will only be executed for certain parameter values then it can be that those queries will get very sub-optimal plans.

Great. We know the cause of the problem. What’s the fix?

Well, there are several options. The usual fixes for parameter sniffing can work here; the OPTION (RECOMPILE) and OPTION (OPTIMIZE FOR …) hints are certainly useful. If a query has OPTION (RECOMPILE) then its plan is never cached, and so it will always be compiled with an appropriate parameter value. OPTIMIZE FOR can be used to override the optimiser’s parameter sniffing: regardless of what the actual parameter value is, the optimiser will use the hinted value instead.
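Applied to the filtered query in this procedure, the two hints would look something like this (a sketch; the OPTIMIZE FOR value of 'W' is just an example):

```sql
-- OPTION (RECOMPILE): the statement's plan is never cached, so it is
-- compiled fresh, with the actual parameter value, on every execution
SELECT max(transactionDate) FROM Production.TransactionHistory
WHERE TransactionType = @TransactionType
OPTION (RECOMPILE);

-- OPTION (OPTIMIZE FOR ...): always compile as if the parameter were 'W',
-- regardless of the value actually passed at runtime
SELECT max(transactionDate) FROM Production.TransactionHistory
WHERE TransactionType = @TransactionType
OPTION (OPTIMIZE FOR (@TransactionType = 'W'));
```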

There’s also another solution for this problem, one I personally favour: sub-procedures. Because a stored procedure is only compiled when it’s executed, moving the contents of the branches of the IF statement into separate procedures and calling those procedures from the branches of the IF completely eliminates the chance of an inappropriate plan caused by this problem. It also prevents the optimiser from doing unnecessary work (optimising the entire procedure when only part of it will run). Using this solution, the original procedure could be modified to this.

CREATE PROCEDURE MaxDateNoFilter AS
SELECT max(transactionDate) from Production.TransactionHistory
GO

CREATE PROCEDURE MaxDateWithFilter (
@TransactionType char(1) -- doesn't need a NULL default, because it will never be called with NULL
)
AS
SELECT max(transactionDate) from Production.TransactionHistory where TransactionType = @TransactionType
GO

CREATE PROCEDURE MultipleExecPaths (
@TransactionType char(1) = NULL
)
AS

IF @TransactionType IS NULL
EXEC MaxDateNoFilter
ELSE
EXEC MaxDateWithFilter @TransactionType

GO

Now each sub-procedure gets its own separate cached plan, and no matter how the outer procedure is called the first time, the plans will be optimal for the parameters that the sub-procedures are actually executed with.
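To confirm, repeat the earlier worst-case test against the modified procedure. With the sub-procedures in place, both calls should show the cheap plan (just over 3000 reads) no matter which order they run in:

```sql
DBCC FREEPROCCACHE
GO
-- Call with NULL first; previously this poisoned the plan for the filtered query
EXEC MultipleExecPaths
GO
EXEC MultipleExecPaths @TransactionType = 'W'
```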
