Blog Post

More On Overlapping Statistics

,

In my last post I provided a script to identify overlapping statistics but realized afterwards that I left out a few points that complete the big picture about why overlapping statistics matter.

What Causes Overlapping Statistics
Boiling this down to basics, here's how it happens: AUTO_CREATE_STATISTICS is set to ON, SQL Server creates some column statistics based on queries that get executed, and then an index is created which creates an overlapping set of statistics on the same column. One real world example of how this can happen is when applying a missing index suggested by SQL Server to help improve performance. Looking at missing indexes is a very common performance tuning practice and I suspect that many people have unknowingly created overlapping statistics but applying missing index suggestions.

Why Overlapping Statistics Can Cause Problems
Paul White (Blog | Twitter) pointed me at Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005 and Plan Caching in SQL Server 2008, two Microsoft whitepapers which contain an (easily overlooked?) snippet about how the query optimizer decides which set of statistics to use when there are overlapping statistics on a column. The answer, in a nutshell, is "it's complicated". There are guidelines, e.g. preference is given to statistics computed with FULLSCAN, but neither resource goes into any level of depth about how it really works. I'm not a Microsoft insider and I don't have access to the specific rules so my way of thinking is why leave it to chance? If I know I've got only one set of statistics on a column and the query plan goes wonky because of something statistics related I have more control over dealing with it. Otherwise leaving overlapping statistics in place can lead to the odd behavior I've previously blogged about.

I hope the additional information helps.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating