Execution plan flips from "good" to "bad" at a certain threshold

  • We've got a particular stored procedure that's been a perennial performance problem since it was written (in 2009). It's in my crosshairs once again, and I've at least found out what appears to be going on, but I'm at a loss in terms of bending it to my will.

    Here's the query:

    select
        CTR.WorkingID,
        CTR.WorkingID2,
        DTIRC.Quantity,
        DTIRC.Cost,
        D.[Date],
        CTR.[ProcessSuspensionsForCpliID],
        CTR.InTerm
    from dbo.TTemp_CalculateTimetabledValue CTR -- 97,972 rows.
    join dbo.T_DeliveryTimetableItem DTI on CTR.DeliveryTimetableID = DTI.DeliveryTimetableID -- 16,150,921 rows.
    join dbo.T_DeliveryTimetableItemResolvedCost DTIRC on DTI.DeliveryTimetableItemID = DTIRC.DeliveryTimetableItemID -- 21,780,327 rows.
    join dbo.TSys_Lookup_Date D on D.[Date] between CTR.StartDate and CTR.EndDate
        and D.[Date] >= DTIRC.StartDate and (DTIRC.EndDate is null or D.[Date] <= DTIRC.EndDate)
        and DTI.DayOfWeekID = D.DayOfWeekID

    I've included the row counts of the main tables. TSys_Lookup_Date has enough dates to cover the ranges in the TTemp_CalculateTimetabledValue table.

    I've been studying what's going on by replacing "from dbo.TTemp_CalculateTimetabledValue CTR" with "from (select top 2 percent * from dbo.TTemp_CalculateTimetabledValue) CTR" and have found that there's a change that starts at 9 percent and is complete at 10 percent. See attachments.
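
    Concretely, the test variant is just the same query with that one substitution made (the percentage is the only thing that changes between runs):

    -- Same query as above, but driven from a TOP n PERCENT derived table;
    -- vary n between runs to find the threshold where the plan flips.
    select
        CTR.WorkingID,
        CTR.WorkingID2,
        DTIRC.Quantity,
        DTIRC.Cost,
        D.[Date],
        CTR.[ProcessSuspensionsForCpliID],
        CTR.InTerm
    from (select top 2 percent * from dbo.TTemp_CalculateTimetabledValue) CTR
    join dbo.T_DeliveryTimetableItem DTI on CTR.DeliveryTimetableID = DTI.DeliveryTimetableID
    join dbo.T_DeliveryTimetableItemResolvedCost DTIRC on DTI.DeliveryTimetableItemID = DTIRC.DeliveryTimetableItemID
    join dbo.TSys_Lookup_Date D on D.[Date] between CTR.StartDate and CTR.EndDate
        and D.[Date] >= DTIRC.StartDate and (DTIRC.EndDate is null or D.[Date] <= DTIRC.EndDate)
        and DTI.DayOfWeekID = D.DayOfWeekID;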

    If you scale the time taken at 8% up to 100%, you get 77.5k CPU units, whereas if you scale up the 10% figure you get 1140k units - which is to say the query has suddenly become nearly 15 times slower.

    The obvious difference is the number of rows read from the T_DeliveryTimetableItemResolvedCost table, which goes from 20k to 21 million (and a seek to a scan, probably unsurprisingly). The table does have an index on the DeliveryTimetableItemID column.

    My initial thought was that the structure of the plans must be flipping, which might have suggested an "option (force order)" type of approach, but the structure appears to be unchanged across all percentages.

    All that leaves me with is some sort of chunking approach, which is obviously a pretty desperate option. So I was wondering if the very clever people at SSC might have a better idea. I've also included the execution plans.

    Thanks for your time.

  • The only attachments that survived are just pictures, which aren't much use for looking at execution plans. Try posting them as .txt files. I think that works.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Ahh! It doesn't seem to like .txt files but thank goodness for good old definitely safe as houses .zip files!

  • julian.fletcher wrote:

    The obvious difference is the number of rows read from the T_DeliveryTimetableItemResolvedCost table, which goes from 20k to 21 million (and a seek to a scan, probably unsurprisingly). The table does have an index on the DeliveryTimetableItemID column.

    How is that table clustered? A non-clustered index opens you up to the classic "tipping point" problem, which means you'll be frustratingly futzing with this for the rest of your life.

    If possible, see if you can cluster the table on how it's (almost) always looked up. (Yes, even at the cost of a bit of fragmentation (gasp!).) Typically you'll see a gain in performance across many queries, and you can usually drop multiple non-clus indexes on the table (those that start with the now-clustering key).

    That also tends to increase the effectiveness of page compression, which you should evaluate as well if you're not already using it. The proc sys.sp_estimate_data_compression_savings can give you a good idea of whether, and by how much, it would help.
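
    For example, a quick sketch of that check against the big cost table from the query above (PAGE compression; the NULLs mean "all indexes / all partitions"):

    -- Estimate how much space PAGE compression would save on the cost table.
    exec sys.sp_estimate_data_compression_savings
        @schema_name      = 'dbo',
        @object_name      = 'T_DeliveryTimetableItemResolvedCost',
        @index_id         = NULL,
        @partition_number = NULL,
        @data_compression = 'PAGE';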

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • T_DeliveryTimetableItemResolvedCost has its clustered index on the DeliveryTimetableItemResolvedCostID column, not on DeliveryTimetableItemID.

    Unfortunately, we have a bit of a standard, which is that tables called "Thing" have a clustered index on ThingID, which will be an int identity column. That's because we usually access rows via ThingID.

    T_DeliveryTimetableItemResolvedCost is a derived table so perhaps it's just not appropriate for it to follow this standard.

    I'll change it and see if that helps. Thanks very much for your suggestion.

  • julian.fletcher wrote:

    Unfortunately, we have a bit of a standard, which is that tables called "Thing" have a clustered index on ThingID, which will be an int, identity column. That's because we usually access rows via ThingID.

    T_DeliveryTimetableItemResolvedCost is a derived table so perhaps it's just not appropriate for it to follow this standard.

    Hugely "unfortunate", aka horribly misguided.  The single biggest myth in table design is that (almost) every table should be clustered on an identity column.  It's disastrous for overall performance.  The big majority of tables, in fact, should be clustered first on something other than an identity column.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • A couple of things to consider. First, in SSMS 2019 the actual execution plan will show you how much time each portion of the query took. You don't need that here because the problem is pretty obvious.

    Note that the 10% plan is performing a merge join where the earlier plans use a nested loop join. This is doubly bad because not only is the merge join wrecking things, but the optimizer also has to perform a small sort to prepare for it. I generally avoid query hints, but the fastest fix might be to force a nested loop join like this:

    Old:

    join dbo.T_DeliveryTimetableItemResolvedCost DTIRC on DTI.DeliveryTimetableItemID = DTIRC.DeliveryTimetableItemID

    New:

    INNER LOOP join dbo.T_DeliveryTimetableItemResolvedCost DTIRC 
    on DTI.DeliveryTimetableItemID = DTIRC.DeliveryTimetableItemID

    Note, too, that I'm pretty sure the join above is the culprit but you can confirm with a little testing - e.g. check the estimated execution plan before executing the query.
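
    If it helps, a small scripted way to get just the estimated plan (in SSMS you can also simply use "Display Estimated Execution Plan"):

    -- Return only the estimated XML plan for each following batch, without executing anything.
    set showplan_xml on;
    go
    select count(*) from dbo.T_DeliveryTimetableItem;  -- placeholder: put the original and hinted queries here
    go
    set showplan_xml off;
    go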

    Lastly, if the 10% problem is related to cardinality estimation, you can also consider dbo.many by Adam Machanic.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • You can use the Query Store feature to get stats on when the query plan switches. You can also force a good plan using Query Store.

    https://docs.microsoft.com/en-us/sql/relational-databases/performance/monitoring-performance-by-using-the-query-store?view=sql-server-ver15
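
    For example (a minimal sketch; the database name and the query_id/plan_id values are placeholders you'd look up in the Query Store views):

    -- Make sure Query Store is collecting data (database name is a placeholder).
    alter database YourDatabase set query_store = on;

    -- Find the query and its plans, with average CPU per plan.
    select qsq.query_id, qsp.plan_id, qsrs.avg_cpu_time
    from sys.query_store_query qsq
    join sys.query_store_query_text qsqt on qsqt.query_text_id = qsq.query_text_id
    join sys.query_store_plan qsp on qsp.query_id = qsq.query_id
    join sys.query_store_runtime_stats qsrs on qsrs.plan_id = qsp.plan_id
    where qsqt.query_sql_text like '%TTemp_CalculateTimetabledValue%';

    -- Force the plan that performs well (the IDs below are placeholders).
    exec sys.sp_query_store_force_plan @query_id = 42, @plan_id = 7;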

  • Thanks Alan. I was very excited to try your suggestions - particularly the OPTIMIZE FOR one - but unfortunately, neither seems to help.

    So for the time being, I'm having to go with the chunking option as nothing else is as fast. Urgh.

  • Just curious - but what if you switched the join to the date lookup table from an inner join to a cross apply?

    cross apply (
        Select *
        From dbo.TSys_Lookup_Date LD
        Where LD.[Date] Between CTR.StartDate and CTR.EndDate
            And LD.[Date] >= DTIRC.StartDate
            And LD.[Date] <= coalesce(DTIRC.EndDate, '99991231')
            And LD.DayOfWeekID = DTI.DayOfWeekID
    ) As D

     

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Try better clustering in a test system somewhere.  You might be pleasantly astounded by the results.

    Table dbo.TSys_Lookup_Date D should almost certainly be clustered first on [Date].

    Try clustering dbo.T_DeliveryTimetableItem and dbo.T_DeliveryTimetableItemResolvedCost first on DeliveryTimetableID. You can add the identity column as the last key in order to make the index UNIQUE (SQL really likes UNIQUE clus indexes).

    That clustering also wouldn't hurt for dbo.TTemp_CalculateTimetabledValue, but it's not likely to be as critical there. Still, why not? If you're doing the others, you may end up with MERGE joins to the other two tables.
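
    A hedged sketch of what that might look like (index names are made up, I'm assuming ThingID-style identity columns as the trailing keys, and the existing clustered indexes would have to be dropped or converted to nonclustered first):

    -- Cluster the timetable item table on the column the query drives through.
    create unique clustered index CIX_T_DeliveryTimetableItem
        on dbo.T_DeliveryTimetableItem (DeliveryTimetableID, DeliveryTimetableItemID);

    -- For the cost table this uses the join column visible in the query;
    -- substitute DeliveryTimetableID if that column actually exists on the table.
    create unique clustered index CIX_T_DeliveryTimetableItemResolvedCost
        on dbo.T_DeliveryTimetableItemResolvedCost (DeliveryTimetableItemID, DeliveryTimetableItemResolvedCostID);

    -- Date lookup table, assuming one row per date (drop UNIQUE if not).
    create unique clustered index CIX_TSys_Lookup_Date
        on dbo.TSys_Lookup_Date ([Date]);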

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Thanks for the suggestion Jeffrey. Doesn't seem to make that much difference - see attached estimated execution plans (original on left, new on right).

    While either of those is running, there's little disk access on the SQL Server - it's all CPU. But I suppose that's to be expected; SQL has grabbed all it needs from T_DeliveryTimetableItemResolvedCost and then just needs to work out whether "D.[Date] >= DTIRC.StartDate and (DTIRC.EndDate is null or D.[Date] <= DTIRC.EndDate)" is true for all the records.

  • Scott, thanks for your suggestions. You're obviously a fan of clustered indexes! I've put clustered indexes on the columns being used in the query and it doesn't seem to have made much of a difference.

    So I'll be going with the chunking option. That works and doesn't require changes to the schema that might be difficult for us to fit in with the rest of the system.
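
    For anyone curious, a minimal sketch of what a chunking approach like this could look like (purely illustrative - the batch count, the #Results definition and the column types are guesses, not our actual code):

    -- Split the ~98K driver rows into batches and run the join once per batch.
    if object_id('tempdb..#Driver') is not null drop table #Driver;
    if object_id('tempdb..#Results') is not null drop table #Results;

    select ntile(20) over (order by CTR.WorkingID) as BatchNo, CTR.*
    into #Driver
    from dbo.TTemp_CalculateTimetabledValue CTR;

    create table #Results
    (
        WorkingID int, WorkingID2 int, Quantity decimal(18, 4), Cost decimal(18, 4),
        [Date] date, ProcessSuspensionsForCpliID int, InTerm bit
    );

    declare @Batch int = 1, @MaxBatch int = (select max(BatchNo) from #Driver);

    while @Batch <= @MaxBatch
    begin
        insert into #Results
        select
            CTR.WorkingID, CTR.WorkingID2, DTIRC.Quantity, DTIRC.Cost,
            D.[Date], CTR.ProcessSuspensionsForCpliID, CTR.InTerm
        from #Driver CTR
        join dbo.T_DeliveryTimetableItem DTI on CTR.DeliveryTimetableID = DTI.DeliveryTimetableID
        join dbo.T_DeliveryTimetableItemResolvedCost DTIRC on DTI.DeliveryTimetableItemID = DTIRC.DeliveryTimetableItemID
        join dbo.TSys_Lookup_Date D on D.[Date] between CTR.StartDate and CTR.EndDate
            and D.[Date] >= DTIRC.StartDate and (DTIRC.EndDate is null or D.[Date] <= DTIRC.EndDate)
            and DTI.DayOfWeekID = D.DayOfWeekID
        where CTR.BatchNo = @Batch;

        set @Batch += 1;
    end;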

  • Hmm, unless your data involves most of the table, I would think SQL could use a MERGE join rather than the HASH join for all those rows.  Would you be willing to show me the DDL for the new tables you built?

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • For the ~100K rows in the tmp table, what percentage of the DeliveryTimetableID range in the larger tables do they represent?

    That is, if the min/max values of DeliveryTimetableID in the tmp table were, say, 2150000/2285000, what percent does that range of values represent in the DTI table, with ~16M rows?  Does that range spread over 5% of the table, 10%, 20%, what?
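
    For instance, something along these lines would answer it (illustrative only):

    -- What fraction of T_DeliveryTimetableItem lies inside the range of
    -- DeliveryTimetableID values covered by the tmp table?
    declare @MinID int = (select min(DeliveryTimetableID) from dbo.TTemp_CalculateTimetabledValue),
            @MaxID int = (select max(DeliveryTimetableID) from dbo.TTemp_CalculateTimetabledValue);

    select
        @MinID as MinDeliveryTimetableID,
        @MaxID as MaxDeliveryTimetableID,
        count(*) as RowsInRange,
        100.0 * count(*) / (select count(*) from dbo.T_DeliveryTimetableItem) as PercentOfTable
    from dbo.T_DeliveryTimetableItem
    where DeliveryTimetableID between @MinID and @MaxID;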

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.
