I’ve grown up reading Tom Clancy and probably most of you have at least seen Red October, so this book caught my eye when browsing used books for a recent trip. It’s a fairly human look at what’s involved in sailing on a Trident missile submarine…
This requires an immediate caveat. You should absolutely be using sp_executesql over any type of non-parameterized execution of T-SQL. You must parameterize your T-SQL because the lack of parameters in building up and executing strings is a classic SQL Injection attack vector. Using straight ad hoc T-SQL is an extremely poor coding choice because of SQL Injection, not because there is something that makes one method faster than the other.
Yet, I see in performance checklists that you should be using sp_executesql over straight ad hoc T-SQL because it will perform faster. That statement is incorrect.
Let me reiterate the caveat before we continue. I 100% advocate for the use of sp_executesql. This function is preferred over ad hoc SQL because, used properly (and isn’t that usually one of the main problems, always), you can both build an ad hoc query and use parameters in order to avoid SQL Injection. The security implications of SQL Injection are kind of hard to over-emphasize. SQL Injection has been a primary vector for hacking for close on to twenty years now. We know the best way to avoid it is to use parameterized queries with data validation around the parameters. Why this is continually ignored is hard for me to understand.
However, despite the importance of using sp_executesql, I’m not advocating for it’s use as a performance improvement mechanism. I’m unclear as to how this comes to be on a performance checklist, with no discussion of taking advantage of Parameter Sniffing and/or plan reuse (possible performance advantages). I can only assume this is yet another example of Cargo Cult Programming. People know that they are supposed to use sp_executesql (and yes, you are supposed to use it), but don’t really understand why, so they start guessing.
The tests are going to run primarily from T-SQL in order to compare a straight EXECUTE of a query string to sp_executesql. However, for the sake of protecting against SQL Injection, let me also mention that calling to your database strictly through code, you can use two approaches (well, several, but we’ll focus on two in order to keep this blog post to a minimal size, I can’t caveat and explain every single possible permutation of all possible database access methods while still making anything approaching a coherent point), building up ad hoc T-SQL and executing that against the server directly, or, using a mechanism to parameterize your queries. You absolutely should be using the parameterized methods in order to validate your input and avoid SQL Injection.
The Simplest Test
Let’s start with a very simple, and simplified, query in order to illustrate the point:
DECLARE @adhocquery NVARCHAR(max) SET @adhocquery = N'SELECT si.StockItemName, i.InvoiceDate, il.Description FROM Sales.Invoices AS i JOIN Sales.InvoiceLines AS il ON il.InvoiceID = i.InvoiceID JOIN Warehouse.StockItems AS si ON si.StockItemID = il.StockItemID;' EXEC (@adhocquery); DECLARE @sqlquery NVARCHAR(max) SET @sqlquery = N'SELECT si.StockItemName, i.InvoiceDate, il.Description FROM Sales.Invoices AS i JOIN Sales.InvoiceLines AS il ON il.InvoiceID = i.InvoiceID JOIN Warehouse.StockItems AS si ON si.StockItemID = il.StockItemID;' EXEC sys.sp_executesql @stmt = @sqlquery;
That’s the same query executed using the two methods in question. The results are an identical execution plan and exactly the same number of reads. If I execute either of them thousands of times then the execution times don’t vary. They have matching query hash and plan hash values. These are identical queries in every possible way. Even if I compare the performance across thousands of executions and include the compile time there is no difference in the outcome.
At the simplest possible level, these are identical mechanisms for executing a basic query. The only performance difference comes about because of parameters.
Test With Parameters
Instead of just running the query over and over again, I really want to test actual, meaningful, behavior this time. I’m going to load all the values for the BillToCustomerID column of the Invoices table using a PowerShell script. Then, I’ll execute the queries once for each of these values, using the two different execution methods.
To make aggregating the results easier, I put each query into a procedure:
CREATE PROCEDURE dbo.AdHoc (@BillToCustomerID INT) AS DECLARE @Query NVARCHAR(MAX); SET @Query = N'SELECT si.StockItemName, i.InvoiceDate, il.Description FROM Sales.Invoices AS i JOIN Sales.InvoiceLines AS il ON il.InvoiceID = i.InvoiceID JOIN Warehouse.StockItems AS si ON si.StockItemID = il.StockItemID WHERE i.BillToCustomerID = ' + CAST(@BillToCustomerID AS NVARCHAR(10)) + ';'; EXEC (@Query); GO CREATE PROCEDURE dbo.ExecSQL (@BillToCustomerID INT) AS DECLARE @sqlquery NVARCHAR(MAX); SET @sqlquery = N'SELECT si.StockItemName, i.InvoiceDate, i.SalespersonPersonID FROM Sales.Invoices AS i JOIN Sales.InvoiceLines AS il ON il.InvoiceID = i.InvoiceID JOIN Warehouse.StockItems AS si ON si.StockItemID = il.StockItemID WHERE i.BillToCustomerID = @BillToCustomerID;'; DECLARE @parms NVARCHAR(MAX); SET @parms = '@BillToCustomerID int'; EXEC sys.sp_executesql @stmt = @sqlquery, @params = @parms, @BillToCustomerID = @BillToCustomerID; GO
The results are fun.
|Execution Type||Average Duration|
|Ad Hoc||AVG: 14788.8924302789|
What’s going on? Is the conclusion that, in fact, ad hoc queries are faster than sp_executesql?
I intentionally picked a data set with a pretty interesting distribution. Depending on the value passed for BillToCustomerID there is the possibility of one of three different execution plans:
In fact, the data is such that the first value that would be called is going to generate the worst possible plan for all the other data sets because it leads to the plan that simply consists of three scans. Even if I choose to force one of the other plans first, something I did several times while testing, the fact that the ad hoc queries will always generate the best plan for the data set results in better overall performance for ad hoc, in this instance.
Please don’t mistake me. I could skew the data in another direction in order to make sp_executesql into the better performing mechanism. The initial premise was that you should use sp_executesql over ad hoc because it will be faster. That’s not the case. In fact, it completely depends on a number of factors as to which of these methods will be faster. That said, my preferred mechanism is to use sp_executesql because it creates parameterized queries where I can ensure, with a certainty, that I’m avoiding SQL Injection. To achieve parity on execution times, I could simply include the WITH RECOMPILE hint and then I would have the same speed as the ad hoc approach while still ensuring my security.
Oh, and to add another wrinkle, you could always turn on ‘Optimize For Ad Hoc’. That shaves a few more milliseconds off the ad hoc approach over the sp_executesql approach in this example.
I know I’ve over-emphasized this throughout this discussion, but I’m going to repeat it again, SQL Injection is dangerous and a purely ad hoc approach to queries leads to unsafe servers. You must validate your inputs and use a querying mechanism that ensures that SQL Injection won’t be an issue. This is accomplished by using parameterized queries, which includes sp_executesql.
However, should you be using sp_executesql over ad hoc queries because performance will improve? No. Clearly that’s not the case. Instead you should be using sp_executesql because it’s a safer, saner approach to writing your queries.
I love talking performance tuning. In fact, I’ll be doing an all day seminar on execution plans and query tuning before SQLSaturday Providence in Rhode Island, December 2016, therefore, if you’re interested, sign up here.