Folks, it’s time for a bit of a performance tuning exercise. Below is a real world situation where a stored procedure was taking greater than 22 minutes to complete. The stored procedure was retrieving a top 1000 and doing some joins with other tables. CPU usage was abnormally high on the host and we found that the CPU usage was related to this stored procedure. Below is what we found and what we fixed. Enjoy!
We have our stored procedure and are viewing the execution plan (see below) right off the bat we see some expensive joins and some clustered index scans. That is where we are going to start our troubleshooting and tuning.
The inner join was working with over 443 thousand rows gathered from an index on the table. A lazy table spool took the 4,943 rows from the index and ran through over 99 iterations of the data and stored 443 thousand rows in tempdb. The join was working with the data in tempdb (443 thousand rows) and joining on data from the 2 other tables. Table spools are good and bad, bad as they use resource heavy, good as they allow for quick access to data that is reused. SQL uses lazy table spools as a way to optimize queries by avoiding reading same data multiple times. With lazy spools SQL works with just 1 chunk of data (in this case 443 thousand rows) instead of fetching data with each iteration. This is supposed to have a performance gain on the query, but comes with a cost with the resources.
So we have our problem defined and our resolution define, now its time to actually put our theories in place and resolve the issue.
The clustered index/table scans and I need to figure out how to reduce the resource impact of the joins. One way we can reduce the join impact is reduce the amount of data that is being joined. First we will look at the left outer join towards the end of the query. The cost is 37% and it is joining data from the primary key index and data that has previously been joined and worked with. We cannot reduce the impact much on this join because the data/columns the application is joining makes up the entire table.
Second I need to figure out how to reduce the amount of data in the inner join with 40.7% cost, by reducing the amount of data in the table spool. Hovering over the clustered index scan on the index prior to the table spool you can view the output list. The output list is a good indication of the columns that should be included in an index, once you get the columns in the output list create the index.
As we look further in the execution plan looking at each index scan and gathering the output list columns. I created indexes for each of the tables involved with index scans and the tables before the two table spools.
Below is what the execution plan looks like after the indexes were created:
As I mentioned above due to the columns and data the application is interacting with, we cannot resolve the remaining index scans, however the stored procedure code can be modified and tuned to resolve even further.
I was able to capture a historical view of CPU usage for the query involved. Step 1 was before the issue was addressed, Step 2 and Step 3 were after indexes were being added.
The query went from taking 22 mins or more to complete to 12 seconds!