More problems with SQL 2014's new Cardinality Estimator

  • We have an established SQL-based product which has been running just fine (with the occasional problem!) on SQL 2005 ... 2012 for some time. However, tests on SQL 2014 showed an immediate performance car crash; a slightly complicated insert into a table went from approximately 0 seconds to 20 minutes or more. It seems that getting the dataset to insert isn't a problem - if the code is doctored to simply return the dataset (all 15 rows of it!), it performs just fine. But if you try to put than in to a table (either normal or a # table), performance dives.

    I've attached two scripts to illustrate the problem. (They're really sql files but I've had to change the type to upload.) If you create a new, empty database on a 2014 server, then run the first script (which makes a few tables and puts a bit of data in) and then the second, you should experience the slowness. (If you do the same on a 2012 server, everything is OK).

    The execution plan (if you have enough time to wait for one) includes a lot of warnings on the final insert (for which the cost is apparently 0%) of type PlanAffectingConvert with details along the lines of:
    ConvertIssue = CardinalityEstimate
    Expression = CONVERT(varchar(3500),[D].[FundingTypeID],0)
    (The plan is 8MB, which is probably too large to upload.)

    I know it's a bit of a tall order, but I was hoping this might be a common sort of thing with a relatively simple solution (which doesn't involve "OPTION (QUERYTRACEON 9481)"!)

    Thanks in advance.

  • It runs on my 2016 instance in 19 seconds, and I don't have a 2014 instance around. Should I see if I can fix it on 2016?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • julian.fletcher - Wednesday, June 28, 2017 10:17 AM

    The execution plan (if you have enough time to wait for one) includes a lot of warnings on the final insert (for which the cost is apparently 0%) of type PlanAffectingConvert with details ....

    The 'final' operator doesn't  do anything, so its cost will be 0, the warnings on it are for the entire plan in general.
    In the 2016 plan, 100% of the cost is in the insert into the temp table, which is probably not wrong, though 19 seconds to insert 15 rows is a bit off.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I took a quick look at the tables themselves and I always hesitate when I see a complex computed field on which a unique index has been imposed.  That could very well be where the cause of the problem stems from.   I avoid that kind of table design because it's a rather painful dependency on the nature of the data in the table.   All I can say to it, though, is, "just because something is possible, doesn't necessarily mean it's a good idea".

    Steve (aka sgmunson) πŸ™‚ πŸ™‚ πŸ™‚
    Rent Servers for Income (picks and shovels strategy)

  • GilaMonster - Wednesday, June 28, 2017 11:57 AM

    julian.fletcher - Wednesday, June 28, 2017 10:17 AM

    The execution plan (if you have enough time to wait for one) includes a lot of warnings on the final insert (for which the cost is apparently 0%) of type PlanAffectingConvert with details ....

    The 'final' operator doesn't  do anything, so its cost will be 0, the warnings on it are for the entire plan in general.
    In the 2016 plan, 100% of the cost is in the insert into the temp table, which is probably not wrong, though 19 seconds to insert 15 rows is a bit off.

    If you could, that would be bloomin' marvellous! I still can't see why there's such a problem putting the 15 rows in to a table where there's apparently no problem in making the 15 row dataset in the first place. Anyway, I eagerly await your words of wisdom ...

  • OPTION(RECOMPILE)

    On SQL 2016:

    What  lead me to try that was the estimated rows on the insert. 350 000 rows, which would probably lead SQL to allocate extra memory, or other resources to account for the estimated 3.6GB of data that it was expecting.

    Probably a result of the ORs in the WHERE clause, maybe mixed with the FULL OUTER JOIN. Not completely sure.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • sgmunson - Wednesday, June 28, 2017 12:04 PM

    I took a quick look at the tables themselves and I always hesitate when I see a complex computed field on which a unique index has been imposed.  That could very well be where the cause of the problem stems from.   I avoid that kind of table design because it's a rather painful dependency on the nature of the data in the table.   All I can say to it, though, is, "just because something is possible, doesn't necessarily mean it's a good idea".

    Do you mean the {Thing}IDForDeletionValidation and {Thing}IDForIncomeValidation ones? I don't think they can be the cause of this particular problem, but if you would like to suggest an alternative design I would be very interested in hearing it. The {Thing}IDForDeletionValidation ones are a way of ensuring that in a parent-child relationship, in which both parent and child can be soft-deleted by means of a Deleted flag, we can have:

    • Neither soft-deleted.
    • Both soft-deleted.
    • Only Child soft-deleted.
    • But not only Parent soft-deleted.

    We used to enforce this via triggers (where the logic has to be repeated in the triggers on both tables) but, in an attempt to reduce the amount of code in triggers, we refactored using FKs between persisted (obviously) computed columns.

    If there's a better way, we'd love to do it!

  • GilaMonster - Wednesday, June 28, 2017 12:18 PM

    OPTION(RECOMPILE)

    On SQL 2016:
    

    What  lead me to try that was the estimated rows on the insert. 350 000 rows, which would probably lead SQL to allocate extra memory, or other resources to account for the estimated 3.6GB of data that it was expecting.

    Probably a result of the ORs in the WHERE clause, maybe mixed with the FULL OUTER JOIN. Not completely sure.

    Thanks very much for that. I'm now even more befuddled than usual. What exactly are we recompiling, if the code has never previously run? Or are we forcing a recompile of some sort of preliminary execution plan?

    I've tried on the real database (of which the attached scripts are just an excerpt) and this does result in an execution time of about 4 minutes, down from about 24. (Reminder; in 2012 compatibility mode, it's well under a second.)

    Unfortunately, I've been unable to create an excerpt which takes 24 minutes which suggests there's something in the real database I'm missing which might be The Important Thing. I'll keep looking but in the meantime, we're going to have to stick with our policy that customers with 2014 or later SQL Servers have to run our database at 2012 compatibility level, which is a bit embarrassing.

    Thanks very much for your help.

  • julian.fletcher - Thursday, June 29, 2017 3:27 AM

    GilaMonster - Wednesday, June 28, 2017 12:18 PM

    OPTION(RECOMPILE)

    On SQL 2016:
    

    What  lead me to try that was the estimated rows on the insert. 350 000 rows, which would probably lead SQL to allocate extra memory, or other resources to account for the estimated 3.6GB of data that it was expecting.

    Probably a result of the ORs in the WHERE clause, maybe mixed with the FULL OUTER JOIN. Not completely sure.

    Thanks very much for that. I'm now even more befuddled than usual. What exactly are we recompiling, if the code has never previously run? Or are we forcing a recompile of some sort of preliminary execution plan?

    Recompile does a few things.

    The first, and most obvious is recompile the execution plan. It doesn't get cached and so gets created fresh on every execution.
    It also allows better row estimations as the estimations are made right as the statement starts to run, not at the start of the batch as is normal (plans are generated for a batch). It also allows for variables' values to be sniffed
    A less obvious thing is that it relaxes some of the optimiser's rules for generating plans. Normally plans MUST be safe for reuse (must return correct results for any input/parameters/row counts). With recompile, because the plans will never be reused, they don't have to be safe for reuse. This can result in more optimal plans in some cases.

    For you, it's going to be either 2 or 3 that's helping here.

    Can you post the actual execution plan from the real database?

    Another option, if SQL 2016 is on the cards, is that SP1 introduced the LEGACY_CARDINALITY_ESTIMATOR hint that, unlike QUERYTRACEON, does not require sysadmin permissions. I've used that for a couple cases that proved rather recalcitrant.
    .

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Ahh, that does make sense; I hadn't realised those aspects, particularly the reuse one.

    I've attached the real execution plans, both for 2012 and 2014. These are for the code from which I extracted the previous attachments, so there are a few differences, probably the most important of which is that we're inserting in to a table called TAudit_History rather than #Final. (Since my original post, I have made new versions of the attached scripts which create, pre-populate and write to this table. The performance is still in the tens of seconds that the original scripts gave and not the tens of minutes in the real database. So there must be something in the real database I'm missing which is important. Still looking!)

    The TAudit_History initially starts with 44096 rows in it. As I mentioned above, if you comment out the

    Insert Into dbo.TAudit_History (TableID, FunctionalAreaID, EntityID, PrimaryKeyID, EntityDescription, ColumnID, [User], [Date], OldValue, OldValueDescription, NewValue, NewValueDescription)

    line, it runs in no time at all. The 2014 plan has some eager spools which I wonder might be inappropriate and also wildly inflated number of row estimates (which the 2012 doesn't have).

    Anyway, thanks again for your help. I'll buy you a drink next time you're in Oxford.

  • It's the elevated row estimations that will likely be the cause. The spools show up when the optimiser thinks it's better to cache intermediate results and read from the spool, rather than re-calculate.
    They shouldn't be around for small row counts, as there's an overhead to write data to the spool.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • julian.fletcher - Thursday, June 29, 2017 2:06 AM

    sgmunson - Wednesday, June 28, 2017 12:04 PM

    I took a quick look at the tables themselves and I always hesitate when I see a complex computed field on which a unique index has been imposed.  That could very well be where the cause of the problem stems from.   I avoid that kind of table design because it's a rather painful dependency on the nature of the data in the table.   All I can say to it, though, is, "just because something is possible, doesn't necessarily mean it's a good idea".

    Do you mean the {Thing}IDForDeletionValidation and {Thing}IDForIncomeValidation ones? I don't think they can be the cause of this particular problem, but if you would like to suggest an alternative design I would be very interested in hearing it. The {Thing}IDForDeletionValidation ones are a way of ensuring that in a parent-child relationship, in which both parent and child can be soft-deleted by means of a Deleted flag, we can have:

    • Neither soft-deleted.
    • Both soft-deleted.
    • Only Child soft-deleted.
    • But not only Parent soft-deleted.

    We used to enforce this via triggers (where the logic has to be repeated in the triggers on both tables) but, in an attempt to reduce the amount of code in triggers, we refactored using FKs between persisted (obviously) computed columns.

    If there's a better way, we'd love to do it!

    I guess I just don't understand the use case that requires this.   Foreign Keys exist to enforce a relationship, which you are effectively over-riding and creating a means of  soft-deleting.  Why bother to have the Foreign Key in the first place?   What would you do if you made a mistake in tying one record to another, and needed to change that relationship to not actually exist?  If knowing the history of the data changes is important, you can always create a history table and use an AFTER DELETE trigger to populate it.   That lets the foreign key constraints do what they need to do, and doesn't require any calculated fields.   Soft-deleting records tends to gum up the table with no longer needed data that is better kept in a history table.   Just my two cents...

    Steve (aka sgmunson) πŸ™‚ πŸ™‚ πŸ™‚
    Rent Servers for Income (picks and shovels strategy)

  • No time to dig into the details here, but every single time I have seen soft-deletes (or active flags, et al) at clients over the years if said fields weren't included in the vast majority of nonclustered indexes many queries that had previously been covered were now crazy inefficient because they were forced to go to the base table just to get said flag(s). Another thing to consider when you have them in play.

    Also, I see talk of ORs and bad estimates. Is it possible to use a temp table to cut the rows to said low number early then join to that for the remainder of the process?

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Both of those execution plans indicate that SQL Server baulked at the task of generating a decent plan from such a complex query (Reason For Early Termination Of Statement Optimization = Time Out) so I'd vote for a reconfiguration of the query, something more likely to result in a stable plan - which very often equates to a more efficient plan. Something like this, perhaps:

    SELECT
     d.*,
     [AgeBandPricingScheme] = old10.[AgeBandPricingScheme],
     [CostCentre] = Old30.[CostCentre],
     [VATRate] = Old90.[VATRate],
     [FinancialReturnSubcategory] = Old110.[FinancialReturnSubcategory],
     [Frequency] = Old120.[Frequency],
     [FundingType] = Old130.[FundingType],
     [Organisation] = Old150.[Organisation],
     [ServiceGroup] = Old220.[ServiceGroup],
     [Subjective] = Old250.[Subjective],
     [Unit] = Old260.[Unit]
    INTO #AugmentedDeleted
    FROM #Deleted d
    Left Join dbo.TRef_AgeBandPricingScheme Old10 With (NoLock) On (Old10.[AgeBandPricingSchemeID] = D.[AgeBandPricingSchemeID])
    Left Join dbo.TRef_CostCentre Old30 With (NoLock) On (Old30.[CostCentreID] = D.[CostCentreID])
    Left Join dbo.TRefSys_VATRate Old90 With (NoLock) On (Old90.[VATRateID] = D.[DefaultVATRateID])
    Left Join dbo.TRefSys_FinancialReturnSubcategory Old110 With (NoLock) On (Old110.[FinancialReturnSubcategoryID] = D.[FinancialReturnSubcategoryID])
    Left Join dbo.TRefSys_Frequency Old120 With (NoLock) On (Old120.[FrequencyID] = D.[FrequencyID])
    Left Join dbo.TRef_FundingType Old130 With (NoLock) On (Old130.[FundingTypeID] = D.[FundingTypeID])
    Left Join dbo.T_Organisation Old150 With (NoLock) On (Old150.[OrganisationID] = D.[OrganisationID])
    Left Join dbo.TRef_ServiceGroup Old220 With (NoLock) On (Old220.[ServiceGroupID] = D.[ServiceGroupID])
    Left Join dbo.TRef_Subjective Old250 With (NoLock) On (Old250.[SubjectiveIDForIncomeValidation] = D.[SubjectiveID])
    Left Join dbo.TRef_Unit Old260 With (NoLock) On (Old260.[UnitID] = D.[UnitID])

     

    SELECT
     i.*,
     [AgeBandPricingScheme] = New10.[AgeBandPricingScheme],
     [CostCentre] = New30.[CostCentre],
     [VATRate] = New90.[VATRate],
     [FinancialReturnSubcategory] = New110.[FinancialReturnSubcategory],
     [Frequency] = New120.[Frequency],
     [FundingType] = New130.[FundingType],
     [Organisation] = New150.[Organisation],
     [ServiceGroup] = New220.[ServiceGroup],
     [Subjective] = New250.[Subjective],
     [Unit] = New260.[Unit]
    INTO #AugmentedInserted
    FROM #Inserted i
    Left Join dbo.TRef_AgeBandPricingScheme New10 With (NoLock) On (New10.[AgeBandPricingSchemeID] = i.[AgeBandPricingSchemeID])
    Left Join dbo.TRef_CostCentre New30 With (NoLock) On (New30.[CostCentreID] = i.[CostCentreID])
    Left Join dbo.TRefSys_VATRate New90 With (NoLock) On (New90.[VATRateID] = i.[DefaultVATRateID])
    Left Join dbo.TRefSys_FinancialReturnSubcategory New110 With (NoLock) On (New110.[FinancialReturnSubcategoryID] = i.[FinancialReturnSubcategoryID])
    Left Join dbo.TRefSys_Frequency New120 With (NoLock) On (New120.[FrequencyID] = i.[FrequencyID])
    Left Join dbo.TRef_FundingType New130 With (NoLock) On (New130.[FundingTypeID] = i.[FundingTypeID])
    Left Join dbo.T_Organisation New150 With (NoLock) On (New150.[OrganisationID] = i.[OrganisationID])
    Left Join dbo.TRef_ServiceGroup New220 With (NoLock) On (New220.[ServiceGroupID] = i.[ServiceGroupID])
    Left Join dbo.TRef_Subjective New250 With (NoLock) On (New250.[SubjectiveIDForIncomeValidation] = i.[SubjectiveID])
    Left Join dbo.TRef_Unit New260 With (NoLock) On (New260.[UnitID] = i.[UnitID])

    SELECT
    ...
    FROM #AugmentedInserted i
    FULL OUTER JOIN #AugmentedDeleted i On (i.ServiceLevelID = i.ServiceLevelID)
    JOIN sys.columns SC On (SC.Object_ID = @ObjectID)
    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work - Thursday, June 29, 2017 8:43 AM

    Both of those execution plans indicate that SQL Server baulked at the task of generating a decent plan from such a complex query (Reason For Early Termination Of Statement Optimization = Time Out) so I'd vote for a reconfiguration of the query, something more likely to result in a stable plan - which very often equates to a more efficient plan. Something like this, perhaps:

    SELECT
     d.*,
     [AgeBandPricingScheme] = old10.[AgeBandPricingScheme],
     [CostCentre] = Old30.[CostCentre],
     [VATRate] = Old90.[VATRate],
     [FinancialReturnSubcategory] = Old110.[FinancialReturnSubcategory],
     [Frequency] = Old120.[Frequency],
     [FundingType] = Old130.[FundingType],
     [Organisation] = Old150.[Organisation],
     [ServiceGroup] = Old220.[ServiceGroup],
     [Subjective] = Old250.[Subjective],
     [Unit] = Old260.[Unit]
    INTO #AugmentedDeleted
    FROM #Deleted d
    Left Join dbo.TRef_AgeBandPricingScheme Old10 With (NoLock) On (Old10.[AgeBandPricingSchemeID] = D.[AgeBandPricingSchemeID])
    Left Join dbo.TRef_CostCentre Old30 With (NoLock) On (Old30.[CostCentreID] = D.[CostCentreID])
    Left Join dbo.TRefSys_VATRate Old90 With (NoLock) On (Old90.[VATRateID] = D.[DefaultVATRateID])
    Left Join dbo.TRefSys_FinancialReturnSubcategory Old110 With (NoLock) On (Old110.[FinancialReturnSubcategoryID] = D.[FinancialReturnSubcategoryID])
    Left Join dbo.TRefSys_Frequency Old120 With (NoLock) On (Old120.[FrequencyID] = D.[FrequencyID])
    Left Join dbo.TRef_FundingType Old130 With (NoLock) On (Old130.[FundingTypeID] = D.[FundingTypeID])
    Left Join dbo.T_Organisation Old150 With (NoLock) On (Old150.[OrganisationID] = D.[OrganisationID])
    Left Join dbo.TRef_ServiceGroup Old220 With (NoLock) On (Old220.[ServiceGroupID] = D.[ServiceGroupID])
    Left Join dbo.TRef_Subjective Old250 With (NoLock) On (Old250.[SubjectiveIDForIncomeValidation] = D.[SubjectiveID])
    Left Join dbo.TRef_Unit Old260 With (NoLock) On (Old260.[UnitID] = D.[UnitID])

     

    SELECT
     i.*,
     [AgeBandPricingScheme] = New10.[AgeBandPricingScheme],
     [CostCentre] = New30.[CostCentre],
     [VATRate] = New90.[VATRate],
     [FinancialReturnSubcategory] = New110.[FinancialReturnSubcategory],
     [Frequency] = New120.[Frequency],
     [FundingType] = New130.[FundingType],
     [Organisation] = New150.[Organisation],
     [ServiceGroup] = New220.[ServiceGroup],
     [Subjective] = New250.[Subjective],
     [Unit] = New260.[Unit]
    INTO #AugmentedInserted
    FROM #Inserted i
    Left Join dbo.TRef_AgeBandPricingScheme New10 With (NoLock) On (New10.[AgeBandPricingSchemeID] = i.[AgeBandPricingSchemeID])
    Left Join dbo.TRef_CostCentre New30 With (NoLock) On (New30.[CostCentreID] = i.[CostCentreID])
    Left Join dbo.TRefSys_VATRate New90 With (NoLock) On (New90.[VATRateID] = i.[DefaultVATRateID])
    Left Join dbo.TRefSys_FinancialReturnSubcategory New110 With (NoLock) On (New110.[FinancialReturnSubcategoryID] = i.[FinancialReturnSubcategoryID])
    Left Join dbo.TRefSys_Frequency New120 With (NoLock) On (New120.[FrequencyID] = i.[FrequencyID])
    Left Join dbo.TRef_FundingType New130 With (NoLock) On (New130.[FundingTypeID] = i.[FundingTypeID])
    Left Join dbo.T_Organisation New150 With (NoLock) On (New150.[OrganisationID] = i.[OrganisationID])
    Left Join dbo.TRef_ServiceGroup New220 With (NoLock) On (New220.[ServiceGroupID] = i.[ServiceGroupID])
    Left Join dbo.TRef_Subjective New250 With (NoLock) On (New250.[SubjectiveIDForIncomeValidation] = i.[SubjectiveID])
    Left Join dbo.TRef_Unit New260 With (NoLock) On (New260.[UnitID] = i.[UnitID])

    SELECT
    ...
    FROM #AugmentedInserted i
    FULL OUTER JOIN #AugmentedDeleted i On (i.ServiceLevelID = i.ServiceLevelID)
    JOIN sys.columns SC On (SC.Object_ID = @ObjectID)

    Definitely worth trying.
    The last query, the one that does the FULL OUTER JOIN then gets the WHERE clause that checks for changes, and probably still the recompile hint to be safe (though do test without as well).

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply