Update: Scroll to the bottom for more information added after originally posting.
Earlier this week a coworker asked me for help figuring out why an execution plan was different for a query run against two identically configured servers. I've since spent the better part of the week trying to figure out why, and after several suggestions from folks on Twitter and two MVPs I still don't have an answer.
First, A Little Background
My company has 9 SQL Servers that are used by web servers to render pages for a web application. Each SQL server is more or less identical to one another – Dell 2950, 4 GB RAM, Windows 2003 R2 x86, and SQL 2005 Standard Edition. The /3GB switch is set to let SQL use 3GB of RAM, and these are dedicated SQL servers with no other processes running on them. Every server is a subscriber to transactional replication so they've got identical data and schema. There's no SSIS, SSAS, or SS-anything else-S at play on these boxes. The only difference is that a few servers have dual-dual CPUs and a few have dual-quad CPUs, but as you'll read in a minute that doesn't seem to matter.
Now here's the problem…At the beginning of the week, for this simple query 1 of the 9 servers resulted in execution plan "A" and the other 8 used plan "B". Things have shifted; now 7 of the 9 are using plan "A", but 2 servers are still using plan "B".
The first thing that came to mind was out of date or updated statistics. So I isolated one server using plan "A" and one using plan "B" and ran an UPDATE STATISTICS with FULLSCAN on all the indexes for the two tables included in the query. I verified the statistics from both servers matched by reviewing the results of DBCC SHOWSTATISTICS. No change in execution plans. (For what it's worth, AUTO_UPDATE_STATISTICS is enabled and we have an agent job that runs sp_updatestats throughout the day to keep statistics up to date)
I looked at SET options for connections. Identical.
I looked at collations. Identical.
I looked at the data itself. Identical.
I looked at server configuration options (sp_configure). Identical.
I looked at database compatibility levels. Identical.
I looked at memory available to SQL Server. Identical as far as I can tell.
I looked at SQL server versions. A-ha!
- Servers with plan "A":
- 9.00.4226.00 (6 of these)
- Servers with plan "B":
So maybe there's something there with the version difference. Except that the server that started the week on plan "A" is the one running 9.00.4230.00. The 6 servers running 9.00.4226.00? They were using plan "B" and have since switched to plan "A". And what about the server running 9.00.4262.00? That's a later version than the one running 9.00.4230.00, yet it's using plan "B". That makes me doubt that it's something related to the version number.
Just for fun and to add to the confusion, I changed the table variable in the query to a temporary table…and the plans matched!
Think You Can Figure It Out?
I'm stumped, and I'd love to hear what you think the reason for the difference in plans might be. I've linked to a the schema for the tables, the output of DBCC SHOWSTATISTICS for the index that I think is the root cause of the problem, the query itself, and execution plans "A" and "B" below. Object names and servers have been changed to protect the innocent. If there's something I'm missing that you want to know more about, contact me on Twitter (@SQLDBA) or leave a comment.
I'd love to figure out what's going on!
UPDATE: OK, as some people were quick to point out, the XML execution plans I included showed two different builds of SQL. So I went poking around and found that one of the 9.00.4266.00 builds has gone back to using plan "B" again. That means I have two different servers on the same build with different execution plans. To be sure it wasn't obviously statistics related I ran another UPDATE STATISTICS with FULLSCAN (and verified that statistics matched across servers), cleared the buffer cache, and the plans were still different. I also realized that in the course of changing object names I ended up with a duplicate column. I have fixed the schema script and updated the download with the plan "A" and plan "B" from the 9.00.4266.00 builds.