Hello Dear Reader! It's been a while. I was working with a friend and we came across an interesting problem. They had a large amount of skewness/data skew, and it led to some performance issues for them. It showed up in a set of queries that normally ran quickly, within seconds, but occasionally ran much longer. To be precise, they ran about 800 times longer.

As you can imagine, this is a less than ideal situation for a production environment. Their solution was to add OPTION (RECOMPILE) to all of their stored procedures. This solved the issue with their data skew, but it caused additional side effects. CPU usage increased, as every stored procedure now had to recompile before execution. No stored procedure could benefit from plan reuse. Using DMVs to track stored procedure utilization and statistics no longer worked.

*"So Balls"*, you say, *"What is the alternative? Is there an alternative? And what in the name of King and Country is skewness/data skew!"*

Ahh, great question, Dear Reader! Dear Reader, why are you English today?

*"Because you watched The Imitation Game last night, and Benedict Cumberbatch's voice is still stuck in your head."*

Right as always, Dear Reader! Good point. Let's explain it and then do a demo!

__SKEWNESS/DATA SKEW__

Skewness is a term from statistics and probability theory that refers to the asymmetry of the probability distribution of a real-valued random variable about its mean. This could get complicated quickly. In simpler terms that I can understand, it means that there are patterns in variables that have real values assigned to them. From those values skewness can be determined, and it describes how far the distribution departs from the normal.

How does this affect our query plans? With data skew we have an overabundance of data that fits one statistical model and does not fit others. This means that, based on statistics, the SQL Server Cardinality Estimator's estimate for one value may be very different from its estimate for another.

Here's a quick example. I have a school with 100,000 students. Every student has one of 10 different last names. On average, one could assume that each last name is shared by about 10,000 students. If we randomly assign these values, there will be a slight skewness, but most of the counts will be similar. For this example I'll use my students table from my college database.

Now we move a new student to the area. This one student will give us quite a bit of data skew, and will be extremely asymmetrical to the other results.
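If you want to play along at home, here is a minimal sketch of that setup. It is not my real students table: the temp table `#Students`, its columns, the generated last names, and the new student's name are all made up for illustration, and I'm assuming the new student arrives with a last name nobody else has.

```sql
-- Hypothetical stand-in for the students table; table and column names are assumptions.
CREATE TABLE #Students
(
    StudentID int IDENTITY(1,1) PRIMARY KEY,
    FirstName varchar(50) NOT NULL,
    LastName  varchar(50) NOT NULL
);

-- 100,000 students, each given one of 10 made-up last names at random.
-- The modulo is applied before ABS so the value always lands in 0..9
-- and ABS never sees the minimum int value.
WITH Numbers AS
(
    SELECT TOP (100000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
INSERT INTO #Students (FirstName, LastName)
SELECT CONCAT('Student', n),
       CONCAT('LastName', ABS(CHECKSUM(NEWID()) % 10))
FROM Numbers;

-- The new arrival: one student with a last name no one else shares (made-up value).
INSERT INTO #Students (FirstName, LastName)
VALUES ('Alan', 'Turing');

-- How many students share each last name?
SELECT LastName, COUNT(*) AS StudentCount
FROM #Students
GROUP BY LastName
ORDER BY StudentCount DESC;
```

The ten generated names should each come back with a count somewhere near 10,000, and the new name with a count of 1. That single-row value sitting next to ten values of roughly 10,000 rows each is the skew we are talking about.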

In order to show this in action we'll make a stored procedure that returns the First Name, Last Name, and Course Name of students by last name. Remember, some students will have multiple courses. This means we will get more rows back than we have students with that last name.
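A procedure along those lines might look like the sketch below. The schema is an assumption on my part for illustration: `dbo.Students`, `dbo.Enrollments`, `dbo.Courses`, their columns, the procedure name, and the sample last names are all hypothetical, so adjust them to your own database.

```sql
-- Hypothetical schema: dbo.Students, dbo.Enrollments, and dbo.Courses are assumed names.
CREATE PROCEDURE dbo.GetStudentCoursesByLastName
    @LastName varchar(50)
AS
BEGIN
    SET NOCOUNT ON;

    -- One row per student per course, so a student with several
    -- courses appears several times in the result.
    SELECT s.FirstName,
           s.LastName,
           c.CourseName
    FROM dbo.Students AS s
    JOIN dbo.Enrollments AS e ON e.StudentID = s.StudentID
    JOIN dbo.Courses AS c ON c.CourseID = e.CourseID
    WHERE s.LastName = @LastName;
END;
GO

-- Call it first for the rare last name, then for a common one (made-up values).
EXEC dbo.GetStudentCoursesByLastName @LastName = 'Turing';
EXEC dbo.GetStudentCoursesByLastName @LastName = 'LastName3';
```

The order of the calls matters. SQL Server compiles the plan using the parameter value from the first execution and caches it, so a plan built for the one-row last name gets reused for a last name with thousands of rows.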

**8 minutes and 42 seconds** instead of a sub-second query.

__WRAP IT UP__