Very frustrating performance tuning!

  • So I have run into this most aggravating of issues 😀

    Here is what I have; I am loading sales order data from staging data. I have this in production, and I have it in test. Our production server is well over 2x the resources (16 cores, 50gb memory, several raid 5's etc) vs test (8 cores, 24gb memory, 1 raid 5)

    I have been having some performance issues with said load script in production, so I restored to test and began tuning. One thing I noticed right away is that test was running the same exact data and script 2x faster! So diving into the first script which basically removes records from the destination that no longer exist in the source, I noticed in the execution plan that it is using a merge join instead of a hash join to join the 2 large tables (~13 million records). On the test system it is using the hash join. So just to test, I used the hash join option on the production server and it runs the same speed as the test server.

    I rebuilt all the indexes, created new statistics, messed with maxdop and recompile options to no avail.

    So my question is...what the heck am I missing? I have not come across such a conundrum in all my years as a DBA!

    Link to my blog http://notyelf.com/

  • Merge is, generally, better than hash.

    That said...

    You mentioned you created new statistics, but did you update with fullscan? On BOTH systems? I'm wondering if the Test system is actually the one that's out of date.

    After that, it may be due to the data volume of difference on the two systems. Not really sure without being able to see the two .sqlplans.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Default ANSI connection settings maybe?

    Take a look at the SELECT operator in both plans. Compare every value. What's different?

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • That is odd that the merge should be faster than the hash, since if I add hash to the join (i.e LEFT HASH JOIN) it executes in half the speed...

    Both systems have the exact same data (well the test system is about a week old, but only 1% different?) I am restoring to test now to test the exact same data, but nothing has change in the last week to warrant any performance issue changes.

    Grant, I checked the server default options and they are both the same for each server.

    Link to my blog http://notyelf.com/

  • Also, to make sure I was using the exact same script, I actually took the script from production and pasted it into a query in test and ran it, and achieved the same results!

    Link to my blog http://notyelf.com/

  • shannonjk (3/13/2013)


    That is odd that the merge should be faster than the hash, since if I add hash to the join (i.e LEFT HASH JOIN) it executes in half the speed...

    Both systems have the exact same data (well the test system is about a week old, but only 1% different?) I am restoring to test now to test the exact same data, but nothing has change in the last week to warrant any performance issue changes.

    Grant, I checked the server default options and they are both the same for each server.

    I don't know, merge is the one of the most efficient join operations if the data supports it. Different data means different statistics. That can lead to different plans. What about the SELECT properties on each. Are there differences?

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • shannonjk (3/13/2013)


    That is odd that the merge should be faster than the hash, since if I add hash to the join (i.e LEFT HASH JOIN) it executes in half the speed...

    This is one of those it depends. The data needs to support the physical join operator. There are times hash is faster and times that merge is faster. It boils down to the data and query.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • I finally figured it out!

    So to the previous questions first and foremost...yes the Select statements were exact.

    I did run the new stats with FULLSCAN on both systems. I even rebuilt the indexes on all tables on both systems just in case 🙂

    So in the end the problem WAS due to a change I had made in test I had forgotten about over a week ago. I won't get into the gritty details but suffice to say, the sales order table has a clustered index on it that was not unique. I had change it to unique as I thought that was rather odd that I had put a clustered index on there without the unique identifier (which I understood improves performance as well). So the end result was...the queries run faster on the sales order table WITHOUT the clustered index being unique. The test did not have it as unique, and production did. Which is why test was running faster. I did some testing and moved the changes to production and all was well again. Now I have to figure out WHY that is because that goes against what common sense would dictate lol.

    SQLRNNR (3/13/2013)


    shannonjk (3/13/2013)


    That is odd that the merge should be faster than the hash, since if I add hash to the join (i.e LEFT HASH JOIN) it executes in half the speed...

    This is one of those it depends. The data needs to support the physical join operator. There are times hash is faster and times that merge is faster. It boils down to the data and query.

    In regards to this SQLRNNR, I have seen a lot of 'it depends' and 'the data needs to support the physical join operator' in quite a few places without an actual explanation. Is there anywhere I can go to get a more in depth analysis on this? For instance what are common cases for a merge and a hash working better? Also, when you say data needs to support it, does that mean data types? Data volume? Structure?

    Sorry to pester with so many questions, but that is one aspect I do feel that I do not understand very well. I just know most of my queries run faster with the hash operator unless they are against much smaller tables.

    Link to my blog http://notyelf.com/

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply