Strange Performance Issue

  • Need help with a really strange performance issue.

    We have a query that runs within an SSIS package. I created a SQL Agent job to run the package, and it completes in 30 minutes.

    Next, we have a full warehouse job, and this package runs somewhere in the middle of that warehouse SQL Agent job. This time it takes over 2.5 hours to execute the same package (same query, same server, same databases).

    Looking at the reads/writes, everything looks the same except the execution time/duration of the query.

    Looking at the execution plans, the first run uses a Hash Match join (which looks good), while in the second run it's all Nested Loops joins and the plan is completely different.
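
    In case it helps, this is roughly how I'm pulling the two plans out of the plan cache to compare them (the LIKE filter is just a placeholder for some distinctive text from our query):

      SELECT st.text, qp.query_plan, cp.usecounts
      FROM sys.dm_exec_cached_plans AS cp
      CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
      CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
      WHERE st.text LIKE '%distinctive text from the query%';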

    I looked at the Connection Properties to confirm they look exactly the same.

    Also, before each run, I run DBCC FREEPROCCACHE to clear the plan cache so a new plan gets generated, and I'm comparing the same scenario.
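
    For reference, this is the exact cache-clearing step (I understand a single plan can also be evicted by its plan_handle; that form is shown commented out, with an illustrative handle value):

      -- Flush the entire plan cache (heavy-handed: affects every database on the instance)
      DBCC FREEPROCCACHE;

      -- Narrower alternative: evict a single plan by its handle, taken from
      -- sys.dm_exec_cached_plans (the handle value below is illustrative only)
      -- DBCC FREEPROCCACHE (0x060006001ECA270EC0215D05000000000000000000000000);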

    Any thoughts/ideas on why this is causing so much pain are highly appreciated...

    Thanks!

  • If you force a new plan every time (which might or might not be a good idea, but using the DBCC FREEPROCCACHE sledgehammer is definitely not right), then the most probable cause for the switch from hash match to nested loops is that the cardinality estimates have changed. If the actual rowcounts correspond with the changed cardinality, then this is okay and the nested loops plan probably was better that second time. But if the cardinality estimate is wrong, the nested loops join is a bad plan.
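
    A quick way to see which case you are in is to run the query with per-operator statistics on, which lists the estimated row count beside the actual one for every operator:

      -- Outputs one row per plan operator, with EstimateRows beside the actual Rows
      SET STATISTICS PROFILE ON;
      -- ... run the query here ...
      SET STATISTICS PROFILE OFF;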

    The most probable cause for such a wrong estimate is what is known as the "Ascending Key Problem". For instance, your query selects orders placed in the last week, but the statistics were last updated eight days ago. Based on the statistics, SQL Server expects no rows to match the WHERE clause, so it uses the lowest possible cardinality estimate of 1 row. But in reality your webshop had a bucketload of traffic and there were a million orders in the last week. Bam - terrible performance.
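
    To check whether that is what is happening here, compare the last statistics update date, and the top of the histogram, against the newest data in the table. Something along these lines works; the table and statistics names are placeholders for your own:

      -- Last update date for every statistics object on the table
      SELECT s.name, STATS_DATE(s.object_id, s.stats_id) AS last_updated
      FROM sys.stats AS s
      WHERE s.object_id = OBJECT_ID('dbo.Orders');

      -- The histogram's highest step (RANGE_HI_KEY); rows beyond it are the
      -- ones that get the too-low ascending-key estimate
      DBCC SHOW_STATISTICS ('dbo.Orders', 'IX_Orders_OrderDate') WITH HISTOGRAM;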

    Google for "ascending key problem" to read more about the background, and to find what can and cannot be done about it. Forcing a statistics update before your query will almost certainly help but is once more a sledgehammer; there are more subtle ways. Which one to use depends on the SQL Server version you are using, and you still need to understand it - that's why I tell you to go forth and Google, instead of just providing some code to run.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Thanks for the suggestion of the "ascending key problem", which I'm going to dig into.

    But at this point, we don't mind using a sledgehammer approach. Also, there is no time gap between the two runs; I run them back to back. There is no change in the data/statistics on the tables/database that the query SELECTs from.

    Not sure if the warehouse runs pollute the buffer pool and constrain the available resources, which might be causing the issue. Just a guess at this point...

  • The ascending key problem: http://sqlinthewild.co.za/index.php/2011/03/22/statistics-row-estimations-and-the-ascending-date-column/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

  • Another thought that comes to mind is:

    I understand there could be a problem with "ascending keys" in the tables that are loaded before being used in the SELECT query that takes 2.5 hours. I still need to test this theory, either by rebuilding the clustered indexes on the upstream tables in the warehouse or by updating statistics on those tables.

    But when we run the query/package in isolation, we don't run any index rebuilds or statistics updates on those tables, and the query/package still completes in about 30 minutes.

    I hope I'm explaining the scenarios correctly...

    Unfortunately, due to month-end warehouse activities on the build server, I won't be able to test until the first week of Feb.

    Thanks Gail and Hugo for the pointers.

  • Resolved! Yes, it was indeed a case of the indexes not being rebuilt before the table was used. Rebuilding the indexes helped.

    THANK YOU, Hugo & Gail for the suggestions!

  • You don't need to rebuild the indexes; doing so is a massive waste of time. Fragmentation's not the problem. The stats are out of date; that's the problem.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

  • Thanks for the article, Gail. I learned something new today - and learning something new every day is a good thing.

  • So, Hitesh, I guess the question is about statistics. Do you have a statistics maintenance routine scheduled to run regularly? Yes, it can take a long time if your tables are large, but I think it'll be worth it if you can spend the time. Read Gail's article for examples that show how.

  • Yes, we do have weekly maintenance jobs, but in this particular case it's a completely different scenario. The server in question is a "dedicated warehouse" build server used for monthly warehouse builds and ad-hoc SSIS testing for the warehouse.

    The tables that are used get loaded as part of the DW build, and we have to rebuild the indexes once the tables are populated. Unfortunately, we were not rebuilding the clustered indexes, and that was throwing the statistics way off.

  • hitesh jain (2/1/2016)

    Yes, we do have weekly maintenance jobs, but in this particular case it's a completely different scenario. The server in question is a "dedicated warehouse" build server used for monthly warehouse builds and ad-hoc SSIS testing for the warehouse.

    The tables that are used get loaded as part of the DW build, and we have to rebuild the indexes once the tables are populated. Unfortunately, we were not rebuilding the clustered indexes, and that was throwing the statistics way off.

    If you add an UPDATE STATISTICS step at the end of the procedure, you'll be fine.

    On the other hand, if you ever change the script to rebuild the indexes, then remove the UPDATE STATISTICS. Statistics are automatically updated with fullscan when rebuilding indexes; updating them again is at best a waste of resources, and at worst you end up with worse statistics (if the explicit update does not use fullscan).
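
    As a sketch of the two alternatives (the table name is just an illustration):

      -- Option 1: leave the load script as-is and refresh statistics afterwards
      UPDATE STATISTICS dbo.FactOrders WITH FULLSCAN;

      -- Option 2: rebuild the indexes after the load; the rebuild already refreshes
      -- the index statistics with the equivalent of a full scan, so do NOT follow
      -- it with another UPDATE STATISTICS
      ALTER INDEX ALL ON dbo.FactOrders REBUILD;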


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/
