Disclaimer: The title is my assumption because I saw it in the past happening this way.
This blog post aims to make you remember something: something that is obvious to you, might not be obvious to others.
A client has a process which consists of a stored procedure that wraps a bunch of other stored procedures.
The process runs for about 10 hours.
Taking a look…what is running right now?
I was curious about the process, I’ve seen this running before but never explored the code. After a couple of days of seeing it running for so long, I decided to see what would be the random query I would get executing.
I had some luck and saw one with a pattern that I knew clearly that could be rewritten it and make it faster.
sp_WhoIsActive to the rescue
If you don’t know what
sp_WhoIsActive (Adam Machanic‘s creation) stored procedure is, let me copy the short definition from the whoisactive.com website:
sp_whoisactive is a comprehensive activity monitoring stored procedure that works for all versions of SQL Server from 2005 through 2017.
You can download it from the download page or if you use dbatools you can use the command that will download it for you and install it. You can read more about it in my previous blog post New Version Of sp_WhoIsActive (V11.20) Is Available – Deployed On 123 Instances In Less Than 1 Minute
sp_WhoIsActive to get the current running query
The query that was being run has the following structure
SELECT column1 FROM table1 WHERE EXISTS ( SELECT column2 FROM table2 WHERE table2.column1 = table1.column2 ) AND ( EXISTS (SELECT column1 FROM table3 WHERE table3.column1 = table1.column1 AND table3.column2 = @param1 and table3.column3 = 0) OR EXISTS (SELECT column1 FROM table3 WHERE table3.column1 = table1.column1 AND table3.column2 = @param2 and table3.column3 = 0) OR EXISTS (SELECT column1 FROM table3 WHERE table3.column1 = table1.column1 AND table3.column2 = @param3 and table3.column3 = 0) OR EXISTS (SELECT column1 FROM table3 WHERE table3.column1 = table1.column1 AND table3.column2 = @param4 and table3.column3 = 0) OR EXISTS (SELECT column1 FROM table3 WHERE table3.column1 = table1.column1 AND table3.column2 = @param5 and table3.column3 = 0) OR EXISTS (SELECT column1 FROM table3 WHERE table3.column1 = table1.column1 AND table3.column2 IN (@param6, @param7) and table3.column3 = 0) )
However this default call does not bring the execution plan, for that you need to run using the
EXEC sp_WhoIsActive @get_plans = 1
I have copied the XML that contains the sqlplan to SentryOne Plan Explorer and this was what I saw
Within the red circle we can see a concatenation operator (first one on top left) which will get the result of each EXISTS sub query (8 in total).
If we want to know which outer batch or stored procedure call was issued by the application or user we can use the
EXEC sp_WhoIsActive @get_plans = 1, @get_outer_command = 1
However, if we have nested calls, this will not show the current batch or stored procedure where the code of the current query belongs.
To get that we need to use the
EXEC sp_WhoIsActive @get_plans = 1, @get_outer_command = 1, @get_full_inner_text = 1
This way, the
sql_text column will contain the whole batch or stored procedure where the query belongs.
Back to the query – The pattern
Can you see the pattern? A lot of
OR EXISTS() conditions. That is odd indeed, it wouldn’t be so odd if each
OR EXISTS() was accessing different tables…oh, wait…they are not and that is where the problem is.
Easy to improve
We can easily re-write the query without changing the logic or affecting the output data.
I have re-written the code in the following way:
SELECT column1 FROM table1 WHERE EXISTS ( SELECT column2 FROM table2 WHERE table2.column1 = table1.column2 ) AND EXISTS ( SELECT column1 FROM table3 WHERE table3.column1 = table1.column1 AND ( table3.column2 = @param1 OR table3.column2 = @param2 OR table3.column2 = @param3 OR table3.column2 = @param4 OR table3.column2 = @param5 OR table3.column2 IN (@param6, @param7) ) AND table3.column3 = 0 ) )
This way we will just hit the
table3 once instead of one time per
The actual plan seems to have a much better shape:
A different approach would be a single
IN () condition with all variables comma separated. However, I preferred this way as it’s easy to show to the developement team the differences between now and before.
In fact, when we use the
IN operator the optimizer will expand it to various
OR conditions. Example:
With this change, I have improved the query by 99%.
Query went down from ~4340 seconds to less than 30 seconds.
NOTE: The table had ~46M records.
Also, the number of logical reads for the table dive-bombed!
Before (optimizer used a Worktable):
Table ‘Worktable’. Scan count 924230, logical reads 1045662012
Table ‘table3’. Scan count 6, logical reads 4967238
SQL Server Execution Times:
CPU time = 3824887 ms, elapsed time = 4344872 ms.
Table ‘table3’. Scan count 9, logical reads 829734,
SQL Server Execution Times:
CPU time = 86345 ms, elapsed time = 26074 ms.
This means that on the whole process we have saved 1h!
How’d this happen?
As said in my title and initial disclaimer, this smells like a copy & paste pattern. Maybe something similar to:
Client: “Hey can we have another validation for a different value?”
Dev: “Sure, it’s pretty easy to do so”
Also Dev: copy & paste existing OR EXISTS () change parameter, commit to source control and push it into QA test (with few data) and it’s good to go into PROD.
Client: “Thanks it’s working just a little bit slower but today things are slower in general”
When you find these kind of patterns, invest a couple of minutes to test it with better logic.
You may end saving a “couple” of CPU cycles and saving a lot of time.
Thanks for reading!