Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

A Tale Of Two Execution Plans – Solved!

Earlier this week I wrote about a perplexing problem I was having where identical servers were producing different execution plans for the same ad-hoc query. I got a lot of great feedback, and Wes Brown (Blog | Twitter) even contacted Conor Cunningham (Blog), query optimizer team lead, for me. But it was this suggestion from Paul Randal (Blog | Twitter) and an IM from Jonathan Kehayias (Blog | Twitter) that helped me uncover what was going on:

image

Quick Recap
Recall that my problem started out like this: A simple ad-hoc query executed against 9 servers with (more or less) identical hardware, identical schema, identical data, and identical index statistics produced different execution plans. I looked at a variety of things to try and figure it out: SET options on the connection, collations, compatibility levels, server configuration options, build versions, and index statistics. Ultimately everyone who looked at it figured that the root cause of the problem was probably related to this: In Plan "A" the estimated number of rows for an index seek differed from Plan "B", and that in turn lead the query optimizer to choose an index seek + key lookup vs. an index scan on a different table later in the execution plan.

Plan "A" Plan "B"

Doesn't That Look Like A Problem With Statistics?
Yes, you'd think that…except that I updated statistics for the index in question (well, for every index on the table actually) with FULLSCAN specified. That should have taken care of the problem, right? Not exactly. One thing I didn't think about was that AUTO_CREATE_STATISTICS was set to ON for every database (read more about this option here). With this option on the query optimizer will create statistics on individual columns, as necessary, to improve estimates for query plans. You can see which columns have had statistics created for them by using sp_helpstats (BOL entry here). 

A Eureka Moment
A check against the 9 servers revealed that the 4 servers currently with plan "B" had statistics auto created for the column used in the query's join and the 5 servers with plan "A" did not. I picked one of the servers using plan "B", dropped the auto created statistics, and…the query optimizer was now picking plan "A"! I did the same against every server using plan "B", and every server switched to using plan "A". Ahh, I love repeatability! (Note that dropping auto created statistics isn't something you want to casually do – the query optimizer created them for a reason, after all)

So The Answer Is…
In the end, it had nothing to do with build versions, slight variations in CPU models, fragmentation, SET options, server options, or any of the multitude of things I looked at…it was the auto created column statistics that affected which plan the query optimizer choose. The takeaway? Add that as one of the things to check when a query isn't behaving like you think it should.

Thanks again to Paul Randal and Jonathan Kehayias for pointing me in the right direction. I can once again sleep at night now knowing that there's a reasonable explanation for what was happening.

Comments

Posted by Pei Zhu on 14 February 2010

It is interesting. Is there any alternative such as dropping/disabling that particular statistics auto-generated or even more aggressive to disable the statistics generation for columns with indexes while I could have AUTO_CREATE_STATISTICS ON?

It does not make sense for the columns with index, sql engine still generate stats on them, "as necessary" since sql engine has the stats after creating indexes(make things simple assuming indexes are single column indexes).

Thanks.

Posted by steve Schneider on 17 February 2010

Great point!  Thanks for the information.  This kind of blogging helps us all progress.

Posted by Scott Russell on 21 March 2010

Kendall,

Great post on an elusive problem.

As an alternative to deleting the problem non-index column statistics, could you get the same effect by manually updating that column’s statistics (possibly using full scan instead of a sampled subset scan) – either on demand or as part of a periodic DB maintenance job that updates statistics or rebuilds indexes (indirectly updating statistics)?

Put another way: You have identified a gap between some DB statistics that were auto-created with sampled (less than full scan) lower quality levels and those DB statistics that are well maintained with periodic scheduled, manually created, higher quality level statistics updates (but may not be updating non-index column statistics).

Ben Nevarez had a great article on this topic a while back, regarding the Alter Index … Rebuild and Update Statistics options (Rebuilding Indexes vs. Updating Statistics – link: sqlblog.com/.../rebuilding-indexes-vs-updating-statistics.aspx) – a real enlightenment for me.  His post didn’t cover the DBCC DBReindex option, which I later found out also rebuilds all indexes and all statistics (index and non-index columns) for a given table in a single command (much easier to use versus many separate commands per table).  Even though the DBCC DBReindex command is deprecated, I still use it for the simplicity, and wish there was a non-deprecated command that accomplished the same tasks as succinctly (without extra code – Alter Index All … Rebuild followed by Update Statistics … With FullScan, Columns).

I found it useful to query the DB statistics meta-data (date / time updated, etc.) to confirm which statistics got updated by a given command / option, using queries that Ben demonstrated in his post.

Outdated DB statistics, application sensitivity to outdated statistics, inclusion of all appropriate DB statistics in periodic manual updates, and determining reasonable manual statistics / index update schedules (based on application requirements) are very thorny issues.  Thanks for shining a new light on these issues for all to benefit.

Thanks,

Scott R.

Leave a Comment

Please register or log in to leave a comment.