Blog Post

The Curiously Large CASE Statement

,

 

A powerful little expression in SQL Server is the CASE statement. The CASE statement has the ability to “transform” data per business rules to help better understand the data in a given query. In the simplest form, the CASE statement is a statement that evaluates a list of conditions (in order) and returns the first matching result expressions.

Most uses of the CASE statement are benign enough that one wouldn’t even bat an eye at them. Every now and again, along comes a nifty little query that contains over 300 conditions in the CASE statement. Something that extreme certainly deserves a second (third or even fourth) look. In this article, I am going to explore one such CASE statement in an effort to squeeze more efficiency out of it.

300!

300+ conditions may be a bit hard to imagine. I know I had to double take the list more than once. Immediately upon seeing such an awful piece of code (not pulling punches there), I really wanted to dump the conditions/evaluations into a table and join to it. Eventually, I did put it into a table but not before catching some metrics. Let’s take a look at what this thing looked like from the execution plan properties perspective.

In the preceding image, note the query map in the scrollbar. I have the case statement circled in green. Yes – it is really that large. Next, I have “cached plan size”, “estimated number of rows”, and “estimated subtree cost” circled in red. Each of these shows something that is a little alarming to me. First, the plan size is 48MB! Next, the cost of the query is a little elevated at 51 and change. Last is what seems to be a high value for the estimated number of rows to be returned (only ~6k actually get returned).

On the plus side, the query only takes 381ms, or does it? The query actually takes 3 seconds to complete despite the timing in the properties window. Sometimes, we have to take some of these values in the properties window with a grain of salt.

What if I try to run this query on SQL Server 2017 instead of SQL Server 2014? Let’s take a look at what that might look like.

Firstly, we have a few properties that seem to indicate that we have moved in a slightly better direction with this particular query. The cached plan size has dropped all the way down to just 37MB. The estimated cost dropped to 36 and change and the elapsed time dropped to 353ms. Nothing terribly earth shattering there – but just changing to SQL Server 2017, in this case, we see a bit of an improvement.

That said, take a look at the compile memory and compile time. Memory is 30,544kb and the time is 1,324ms. Aha – we see why this query takes 3 seconds on SQL 2014 and roughly 2 seconds on SQL 2017. High compile times. That memory for the compile is too high for my liking as well so let’s go ahead and take a look at why that might be a little high. It’s time to look at the plan XML and the converted tree for the query (because it is fun).

This is where we get a little geeky with the CASE statement. In the execution plan XML (and the converted tree we will see in a bit), every evaluation criteria for the case statement must have some sort of instruction on how and what should be done. In the case of the XML, we have multiple scalar operators to denote the CASE statement and then each of the evaluations for the CASE statement.

Once we enter the CASE statement, the first thing we see is that there is an “OR” operation (circled in red). Then for each of the values to be compared, we will see an “Equality” operation (circled in green) followed by the value to be evaluated. For just the three values I have in the image, notice how that immediately bloats the execution plan XML for this query.

In the preceding image, we see the converted tree for this query. We see the same pattern evolve here. There is a scalar operator for each value to be evaluated. In this image, I just have the equality comparisons. However, there is also an OR operator that is defined further up in this converted tree. Could this possibly be the reason for the bloated compile time and compile memory?

Let’s evolve that query just a bit and use a table and a join for that particular case statement to see what happens.

Immediately, we can see that the cached plan size improves dramatically! The cached plan size is down to 1.1MB. The compile memory is down to ~4MB and the compile time is down to 174ms. These are all significant improvements but now we see an interesting problem with the memory grant. This query is requesting over 400MB for the memory grant. That should be sounding sirens everywhere. Also of note is the estimated number of rows has dropped to ~28k (a significant improvement in estimate). This query is completing in under 1 second at this point compared to the 3 seconds otherwise but that memory grant is worrying me (and hopefully you too).

For giggles, I went ahead and added a columnstore index to this table (for good reasons beyond the scope of this article). Let’s see what that did.

Notice how the memory grant has dropped to just over 28MB? This is an acceptable improvement for this query. Also note the query execution time improved by a decent amount. This query timing for this query is similar with both a warm cache or cold cache at this point.

Remember that note about the XML and converted tree bloat? With the case statement, the query plan was in excess of 37MB and the revised version is sitting at about 1.1MB. Looking at the quantity of text in each, I see 5819 lines of XML in the version with the case statement and only 3755 lines of XML in the table version of the query (and more room for optimization). The converted tree is more dramatic in that the case statement version of the converted tree requires 3157 lines of instruction and the version that uses a table instead only has 1353 lines of instruction.

What does any of that matter? It helps to illustrate that this 300+ condition filled CASE statement is causing bloat and wasting space along with execution resources.

Wrap

If you ever find yourself writing a query that requires an excessive amount of conditions to be evaluated in a CASE statement, you should reconsider the query design. Too many conditions can lead to plan bloat and excessive memory grants. Keep an eye out for stuff like that!

Thanks for reading! This has been a pseudo deep dive into a really ugly CASE statement. If you would like to read other articles that are less of a deep dive, You can read check out the Back to Basics articles here, or the Extended Events articles here.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating