Need help comparing two execution plans

  • Hi all,

    We're testing an upgrade from SQL 2005 to SQL 2008 R2 in our data warehouse environment. I've run into an interesting situation with a change in behavior to one query, and I'm looking for advice on how to interpret this.

    Here's the scenario: this is an insert query which populates a fact table from a staging table, replacing natural keys with surrogate keys from a set of dimension tables along the way. The query is somewhat complex due to the number of joins, but the structure is pretty simple. The source staging table has 268,628,160 rows and is about 40 GB in size. The joins to the dimension table are all outer joins, so after the insert completes, the destination fact table also has 268,628,160 rows. The fact table is much wider than the source staging table, and is about 130 GB in size.

    The tables, indexes, data, and code are identical between the SQL 2005 and SQL 2008 systems. However, we're seeing a massive increase in tempdb utilization on the new SQL 2008 server. On the old SQL 2005 server, this insert statement doesn't consume more than about 50 GB space. (It may actually be less than that -- 50 GB is simply the size of the autogrowing tempdb database on that server.) On the SQL 2008 server, this insert statement consumes about 300 GB of temp space. That's the size of our tempdb drive, so you can guess how we found this problem. 😉

    I've attached the two query plans. They're not identical (unsurprisingly), but they're pretty close in structure. The major changes seem to be simply the order in which SQL is choosing to apply the joins. However, there are two things that caught my eye.

    1) The SQL 2008 plan begins with the correct Estimated Number of Rows of 268 million, but partway through the plan, a hash match bumps up the estimate to 355 million, and that incorrect estimate remains through the final insert operator. The SQL 2005 plan has the correct estimate of 268 million from start to finish.

    2) Under SQL 2005, the Table Insert operation at the end of the plan is only 3% of the overall cost. In the SQL 2008 plan, that operation accounts for 85% of the cost.

    Nearly every table joined in this query is truncated and rebuilt as part of the ETL process, so I can confirm that the statistics are brand new on every table in both systems. DBCC SHOW_STATISTICS even shows very similar heuristics for the columns I've analyzed, which I'd expect, given that the data is the same.

    What might account for this massive difference in tempdb utilization? Could these execution plan differences be related, or are they red herrings? The hardware is completely different between the two environments, but the SQL 2008 server is superior in every way -- double the memory, double the processors, newer hardware, better SAN, etc.

    Any help or ideas will be much appreciated -- thanks in advance!

  • First question, after the upgrade, did you manually update all the statistics. Further, did you update them with a full scan? Statistcs are stored differently from one version to the next. You can wait for auto updates to fix them or you can update them yourself. Second, the ANSI settings are different between these two plans. That could lead to some of the differences you're experiencing. Possibly not, but it's there.

    Basically, we're looking at moving everything. Every operator to access the data on both plans is a scan. I'd be hard-pressed to suggest how to tune the query. What's causing the differences? I'd first suggest statistics. After that, the optimizer changes from one version to the next. Some queries that behaved well enough in the old version will no longer behave well because of those changes. You might be hitting that case. But to really fix it, maybe find a way to break down the load process into smaller steps so that the query can actually be optimized.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Mr. Fritchey! Thank you sincerely for your reply. I'm honored -- your execution plan e-book has been my go-to reference on this stuff for years. 🙂

    I updated all the statistics with fullscan this morning, but it doesn't seem to have altered the execution plan. On the new system, the estimated row count is still changing mid-plan. That's confusing the heck out of me. Could that incorrect estimated row count be caused by anything other than statistics?

    More importantly, does any of this have to do with the tempdb utilization? The plans are so similar in structure -- why would the tempdb utilization differ so dramatically? Should I be looking for some other cause? Could some system misconfiguration cause this kind of behavior?

    I'm also unclear about the ANSI settings. SELECT SESSIONPROPERTY ('ANSI_NULLS') shows me that the option is turned on in both environments, and yet the execution plans show different ANSI settings, as you noticed. Do you know what controls those settings within an execution plan, if they're not adopted from the current session environment?

    I'm going to work on your suggestion of breaking the insert down into smaller chunks, if only to determine where and when the behavior starts to diverge. Thanks again!

  • Thanks for the kind words.

    The differences in the tempdb usage are probably just down to the variations within the plan. As to exactly why you're getting what you're getting... I'm not sure. I'd have to step through the plan in detail, might take hours. Sorry, but ain't going to happen.

    The ANSI settings are a connection setting from wherever you're running the query. But that difference might (big word, might) be leading to some of what you're seeing. I suspect it's more likely to just be differences in the optimizer.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I've done some further digging on this, but I still haven't been able to determine why we're seeing this increased tempdb utilization. I pointed out in my original post that the cost percentage of the Table Insert operation itself is much higher under SQL2008 than 2005. I believe this is because the "Estimated I/O Cost" of that operation is many times higher -- 148,764 under SQL2008, and only 1,994 under SQL2005.

    So, the question of the day: how does SQL Server determine this "Estimated I/O Cost"? I would think it would be based on the estimated number of rows and estimated row size, but those aren't different enough between the environments to account for such a massive discrepancy. (In fact, the incorrect row estimate under SQL2008 is a red herring -- I generated the execution plan on another SQL2008 server and got the correct estimated row count, but still with this massive I/O cost estimate. I can upload that plan if anyone would like to see it.)

    I read a few articles stating that the Estimated I/O Cost is based on the number of pages. Is that correct? Could the number of pages involved be radically different between these two environments somehow? Is there any way to capture what SQL thinks is the number of pages here? Or alternatively, is SQL Server giving such a huge I/O cost estimate under SQL 2008 because it's factoring in its tempdb usage?

  • Refer back to the number of pages in the indexes. You can use sys.dm_db_index_physical_stats (I think that's right, typing from memory) to see the size of the index or table.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I compared the page counts for each index in the database between these two environments. The tables are identical in both, so the page counts are nearly identical as well. They differ by only a fraction of a percent (if at all) for each index. So does that tell us that SQL 2008 is using something other than page counts to come up with an estimated I/O cost? And, more to the point, is that high estimated I/O cost even related to the tempdb usage I'm fighting?

  • Estimated IO cost is only partly based on page count. But, the fact that you're getting discrepancies is the beginning of what might be the issue. Have you looked at the statistics? They are without a doubt different too. I'll bet the data distribution within the histogram portion of the statistics is different too. The variations may be small, but they can add up quickly.

    Now, there probably are differences between the versions of SQL Server how they measure things. Microsoft is always making changes, big & small, to the query optimizer. You're in a situation where the amount of data is different, the versions are different, and I'll bet the data distribution is different. So, as to why the plans are different, you've largely answered the question.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (2/27/2013)


    Estimated IO cost is only partly based on page count. But, the fact that you're getting discrepancies is the beginning of what might be the issue. Have you looked at the statistics? They are without a doubt different too. I'll bet the data distribution within the histogram portion of the statistics is different too. The variations may be small, but they can add up quickly.

    Now, there probably are differences between the versions of SQL Server how they measure things. Microsoft is always making changes, big & small, to the query optimizer. You're in a situation where the amount of data is different, the versions are different, and I'll bet the data distribution is different. So, as to why the plans are different, you've largely answered the question.

    I wish that were the case! The data is *identical* between these environments; there are no table or index differences whatsoever. I've updated statistics with fullscan on both servers, and the histograms are *identical* for every column I've examined. (That surprised me -- I expected to see at least some minor differences in the histograms, but apparently that code hasn't changed between versions.)

    So really, I'm in a situation where the data is identical and the data distribution is identical. If there are differences in the execution plans, they come down to SQL version or hardware differences. And really, the execution plans aren't all that different except for the order of the joins. The only difference that I find suspicious in the plans is that estimated i/o cost on the final insert. I'm completely at a loss to explain why it's so vastly higher under SQL 2008 (nor am I sure it's even related to my actual problem of tempdb usage).

    I may need to hit up Microsoft for help. The only "resolution" I can think of is to try rewriting all the affected queries, which seems ridiculous given how straightforward these queries are (especially my problem comes down to something fixed in a service pack).

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply