Execution Plan turned "Bad"

  • DB version: Microsoft SQL Server 2008 (SP1) - 10.0.2573.0 (X64)
    RCSI enabled

    Hi,

    last week one of our production job was hanging during several hours. I killed it at some point but I was able to monitor the situation in detail in the meantime.
    One specific table refresh query (TRUNCATE / INSERT INTO SELECT) which usually takes 20 seconds was in "runnable" state and was doing nothing during several hours.
    I thought it was due to some contention so I looked everywhere but it appeared there was no lock, no wait, no cpu pressure, no tempdb issues.
    When I saw that there were several exec plans for this sql, I assumed that something was up and indeed after running FREEPROCCACHE and I restarted the job, it went back to normal speed, and with a single exec query plan.

    The refresh query is old and stable for a few years. Stats are updated every week. Query is raw and does not involve procedures or parameters.

    Could you help me to understand what happened ? I'm usually for letting SQL Server decides by itself and against query hints but I don't know how to prevent this in the future.

    I understand that strong table data size variations can influence such behavior. But source table volume is stable but there is indeed weird information in the execution plan I captured :

    "Hanging" Plan properties:

    Cached plan size: 56 KB
    CompileCPU: 7
    Estimated Number of Rows: 1
    Estimated Operator Cost: 0%
    Estimated  Subtree Cost: 0,031225
    Optimization Level: FULL
    Reason for Early Termination of Statement Optimization: Good Enough Plan Found

    "Speedy" Plan  properties

    Cached plan size: 136 KB
    CompileCPU: 16
    Estimated Number of Rows: 2407180
    Estimated Operator Cost: 0%
    Estimated Subtree Cost: 245,896
    Optimization Level: FULL
    > There is also a lot of Parallelism (Gather Streams) tasks in this plan

    The source tables for the INSERT INTO SELECT are indeed large but are never empty. Why was 1 row count estimated and what is this "Early Termination of Statement Optimization" ?
    Thanks for your help and let me know if you need any additional info.

  • Was it in a "RUNNABLE" or "RUNNING" state or both.  The estimated rows does look odd.  It's a little hard to say for certain without looking at the plan what might have happened.  I would have guessed parameter sniffing, stale stats, of something along those lines, but you said there's no parameters and you update stats regularly.

    You do have one option to make sure the speedy plan always gets taken. You can use sp_create_plan_guide_from_handle to pin the plan in the cache.  You'll need to get the plan_handle during the next run.  The plan will have to be in the cache when you do this.  All this does under the covers is create an XML plan guide.

  • Even though stats are updated "every week", it could still be a stats problem.  It could also be that the data finally reached a "tipping point" and caused code that doesn't necessarily scale well (despite previous "good performance") to choose a less than optimal plan.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • sqlgrease - Friday, December 15, 2017 9:53 AM

    Was it in a "RUNNABLE" or "RUNNING" state or both.  The estimated rows does look odd.  It's a little hard to say for certain without looking at the plan what might have happened.  I would have guessed parameter sniffing, stale stats, of something along those lines, but you said there's no parameters and you update stats regularly.

    You do have one option to make sure the speedy plan always gets taken. You can use sp_create_plan_guide_from_handle to pin the plan in the cache.  You'll need to get the plan_handle during the next run.  The plan will have to be in the cache when you do this.  All this does under the covers is create an XML plan guide.

    thanks for your response. To be precise, statuses were a bit confusing: query status was "running" in sys.sysprocesses and "runnable" in sys.dm_exec_requests.
    Thanks for your tip about sp_create_plan_guide_from_handle. I will look into it.

  • Jeff Moden - Friday, December 15, 2017 10:07 AM

    Even though stats are updated "every week", it could still be a stats problem.  It could also be that the data finally reached a "tipping point" and caused code that doesn't necessarily scale well (despite previous "good performance") to choose a less than optimal plan.

    Hi, I agree, this is the most probable scenario. But I'm confused about the plan properties (Estimated Number of Rows: 1 / Reason for Early Termination of Statement Optimization: Good Enough Plan Found) as being symptoms or cause...

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply