High sensitivity of SQL Server 2016 to statistics?

  • Hello all,

    We have recently upgraded from MS SQL 2012 SP3 to MS SQL 2016 SP2 (Standard). We use compatibility level 130 and Legacy Cardinality Estimator = ON (we have also tried setting it to OFF, which is worse in our system from a performance point of view).
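
    For reference, a minimal sketch of that configuration (the database name [OurDb] is a placeholder, not our real one):

        -- Compatibility level 130 plus the legacy CE, as described above.
        -- The scoped configuration statement must run in the target database.
        ALTER DATABASE [OurDb] SET COMPATIBILITY_LEVEL = 130;
        ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;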

    Since the upgrade, we have been facing extreme sensitivity to the accuracy of statistics, even for small tables with few rows (mostly <200), in our OLTP system. Most calls are made through stored procedures, and plan rebuilds are also not an issue; the issue is wrong execution plans for some queries. That was not the case with SQL 2012.

    We join to these few tables quite often, and if statistics are not up to date, the query is very slow (around 2 minutes vs. 5 seconds).

    We use the same maintenance job (from Ola Hallengren), with some parameters, including statistics maintenance; we recalculate statistics every Sunday (we don't run 24x7, Sunday is free). With SQL 2012 that was really sufficient, while with SQL 2016 it is not. Auto update statistics is on, and in these small tables there are only a few changes a week.
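
    For illustration, the weekly run looks roughly like this (a sketch using Ola Hallengren's IndexOptimize; the parameter values here are assumptions, not our exact job):

        -- Statistics-only maintenance: skip index work, update all statistics,
        -- but only those modified since the last update.
        EXECUTE dbo.IndexOptimize
            @Databases = 'USER_DATABASES',
            @FragmentationLow = NULL,
            @FragmentationMedium = NULL,
            @FragmentationHigh = NULL,
            @UpdateStatistics = 'ALL',
            @OnlyModifiedStatistics = 'Y';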

    Have you seen the same issue with SQL 2016 and higher?

    Regards

  • If you have tables with fewer than 200 rows, the statistics won't be changing very often unless you're doing lots of replacements of that data, so I don't see how those estimates would be impacting your plans so much.

    The statistics are different from 2012 to 2016. The cardinality estimation engine changed in 2014. For most people, and in most circumstances, the change was either neutral or for the better. In a few cases, it was for the worse. You don't need to both set the compatibility mode down AND use the old cardinality estimation setting if all you're going for is the cardinality estimation engine. The compatibility mode turns off all the new functionality (you know, the reason you upgraded).

    This isn't the kind of behavior I've seen in systems, so I wonder if something else is going on. Weekly statistics updates might not be adequate for some of your tables (not the little ones; a few changes a week just should not have the impact you're seeing). The larger tables with volatile data sets might benefit from a more frequent statistics update. Stats updates (not index rebuilds, which are totally different) are relatively benign in terms of load for most people in most situations (note the weasel words). So a more frequent update of statistics can be done; I'd target specific tables, though, as in the sketch below.
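
    One way to pick those targets (a sketch against sys.dm_db_stats_properties, available in 2016; the table name is a placeholder):

        -- Show how stale each statistic on a table is; modification_counter
        -- counts changes to the leading statistics column since the last update.
        SELECT  OBJECT_NAME(s.object_id) AS table_name,
                s.name                   AS stats_name,
                sp.last_updated,
                sp.rows,
                sp.modification_counter
        FROM    sys.stats AS s
        CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
        WHERE   s.object_id = OBJECT_ID('dbo.VolatileTable')  -- placeholder name
        ORDER BY sp.modification_counter DESC;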

    The best thing would be to capture the plans before and after it's running slow and determine exactly which row estimates are causing you problems. In general, 2016 is a faster system than 2012, so if you're seeing radically different results, you have to investigate why.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey wrote:

    If you have tables with fewer than 200 rows, the statistics won't be changing very often unless you're doing lots of replacements of that data, so I don't see how those estimates would be impacting your plans so much.

    The statistics are different from 2012 to 2016. The cardinality estimation engine changed in 2014. For most people, and in most circumstances, the change was either neutral or for the better. In a few cases, it was for the worse. You don't need to both set the compatibility mode down AND use the old cardinality estimation setting if all you're going for is the cardinality estimation engine. The compatibility mode turns off all the new functionality (you know, the reason you upgraded).

    This isn't the kind of behavior I've seen in systems, so I wonder if something else is going on. Weekly statistics updates might not be adequate for some of your tables (not the little ones; a few changes a week just should not have the impact you're seeing). The larger tables with volatile data sets might benefit from a more frequent statistics update. Stats updates (not index rebuilds, which are totally different) are relatively benign in terms of load for most people in most situations (note the weasel words). So a more frequent update of statistics can be done; I'd target specific tables, though.

    The best thing would be to capture the plans before and after it's running slow and determine exactly which row estimates are causing you problems. In general, 2016 is a faster system than 2012, so if you're seeing radically different results, you have to investigate why.

    1. First, we use the old cardinality estimator; with the setting Legacy Cardinality Estimator = ON, Microsoft says it uses the old algorithm.

       Are you saying that MS isn't telling the truth? That the estimator works differently even though I use Legacy Cardinality Estimator = ON?

       https://blog.sqlauthority.com/2019/02/09/sql-server-enabling-older-legacy-cardinality-estimation/

    2. I have met several queries (one today) where the situation was about the same: joining tables with hundreds of thousands of rows to tables with few rows (let's say hundreds, or at most a few thousand). What helped? UPDATE STATISTICS SMalltable1, UPDATE STATISTICS SMalltable2 and UPDATE STATISTICS SMalltable3, and now the query runs in 5 seconds compared to two minutes before. But I didn't have to do that with SQL 2012.
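
    As runnable statements, that fix was essentially the following (the dbo schema and WITH FULLSCAN are my assumptions; on tables this small a full scan is cheap):

        UPDATE STATISTICS dbo.SMalltable1 WITH FULLSCAN;  -- FULLSCAN assumed
        UPDATE STATISTICS dbo.SMalltable2 WITH FULLSCAN;
        UPDATE STATISTICS dbo.SMalltable3 WITH FULLSCAN;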

     

  • Did I misread your compatibility mode? If so, my bad, apologies. Yes, you can have the new compatibility mode enabled and then disable the new cardinality estimator.

    As to the rest: you say hundreds, then you say thousands. Each is going to behave a little differently, so which is it? It matters. Two hundred rows and 3 changes in a week: the stats should be identical. Several thousand rows and 3 changes in a week: the stats should still be identical. However, with thousands of rows and more changes, auto stats will fire (the classic trigger is roughly 500 modifications plus 20% of the table's rows). Row estimates will change, and using the legacy cardinality estimator they should be roughly the same.

    The question is where and why the estimates are off. Without seeing the plans, I can't even guess. The optimizer is different from 2012 to 2016; it's not just the cardinality estimator that changed, and this could lead to differences in behavior. Also, the mechanisms used to gather statistics are a little different in 2016, which might also lead to some behavioral changes. However, without seeing the details of the setup, all I can do is speculate wildly. If you at least post a before & after plan, one of the ones that's especially egregious, so it's possible to evaluate what's going wrong and why, I could add more than vague suggestions. I mean, are you seeing the histogram used to arrive at row counts, or is it just using the selectivity of the statistics to arrive at them? This can change behaviors. It's just too hard to know based on too little data.
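
    If Query Store is enabled (it's new in 2016), one way to pull the captured plans for a before/after comparison (a sketch; the text filter is a placeholder):

        -- List the stored plans and average duration for one query so the
        -- fast and slow plans can be opened and compared side by side.
        SELECT  qt.query_sql_text,
                p.plan_id,
                TRY_CONVERT(xml, p.query_plan) AS query_plan,
                rs.avg_duration
        FROM    sys.query_store_query_text AS qt
        JOIN    sys.query_store_query AS q ON q.query_text_id = qt.query_text_id
        JOIN    sys.query_store_plan AS p ON p.query_id = q.query_id
        JOIN    sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
        WHERE   qt.query_sql_text LIKE N'%SMalltable1%';  -- placeholder filter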

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • The web page refuses to upload the plan.
  • We are prohibited from publishing code (our mother company uses some robots to search for it, but as usual they don't help us with anything else). But I can hopefully publish plans.

    I have checked the 3 tables from the last problematic query: currently about 1100, 1200 and 140 rows.

    Renamed to .txt and zipped.
  • Two things. First, the query is in the plan. I sure don't see anything that's even remotely proprietary, but if there's a concern, you should remove those from the post.

    Second, the "good" plan has some insanely high mismatches on estimates. The "bad" plan is very accurate on its estimates (at first blush; I haven't looked at everything). In short, the row counts are better in what's marked as the problematic plan. That's... interesting. I'm going to have to dig through them some more to fully understand what's happening and why the choices are being made. However, the one thing I'm seeing immediately is that both plans wait about 14 seconds on memory. Then, the bad plan has an additional wait on CPU (SOS_SCHEDULER_YIELD). However, I'm not seeing wildly disparate estimates on the tables in the bad plan. It's a 1339-row table and you return 1339 rows, for most of the stuff I've looked at. So whatever else is going on, I'm not convinced it's statistics, based on my initial look.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • If you want to, get a copy of SentryOne Plan Explorer. It's free. It has a way to anonymize the plans. Post those and no code will be visible.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I have no clue how to remove the attachments. Edit doesn't work, I can change only text.

  • martinreiser wrote:

    I have no clue how to remove the attachments. Edit doesn't work, I can change only text.

    I have admin access, so I went ahead and removed them for you.

    Try again with SentryOne and the anonymized plans. It's very safe.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Here it is, anonymized with SentryOne. The question is whether I did it properly; I don't know how random the anonymization is, since I did the two plans in different instances of SentryOne.

    Again: the plan (from a restore) after updating statistics on those 3 tables takes <10 s; before, almost 3 minutes.

    Attachments: (anonymized plans)
  • Any suggestions?

  • This needs pretty detailed analysis, and I'm sorry, I've been busy with work. I'll try to get into it again as soon as I can.

    Short answer: I suspect you're dealing more with changes to the optimizer than with changes to statistics. I assume you have a test system (please don't do what I'm about to suggest on production without testing it). Run the query with the compatibility level set to 110 or 120 instead of 130.
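
    A minimal sketch of that test (the database name is a placeholder; again, on a test system only):

        -- Drop the test copy back to the 2012-era (110) or 2014-era (120)
        -- optimizer behavior, then re-run the problem query.
        ALTER DATABASE [TestCopy] SET COMPATIBILITY_LEVEL = 110;  -- or 120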

    If you're trying to track stuff down on your own though, focus on the differences in the row estimates. Use the plan comparison tool in SSMS. It shows common operators and different operators, or even differences in operators that are the same overall. Look to the row counts. What I saw initially was that the row counts were more accurate on what you called the slower plan. I'm not sure why and I'm not sure I saw the whole picture.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • martinreiser wrote:

    Any suggestions?

    I think you missed the suggestion about reverting to the old cardinality estimator, which DID fix the same problems we were having when we made the same shift from 2012 to 2016. For a quick read, read about Trace Flag 9481 at the following link. I have this trace flag enabled in the startup parameters of my SQL Servers because there was just too much to fix.

    https://logicalread.com/3-sql-ssql-server-trace-flags-troubleshooting-pd01/#.XsaJH8B7mUk
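
    For testing, the flag can be enabled instance-wide or per query (both forms are documented; shown here as a sketch):

        -- Instance-wide until restart; the -T9481 startup parameter mentioned
        -- above makes it permanent.
        DBCC TRACEON (9481, -1);
        -- Per-query alternative: append OPTION (QUERYTRACEON 9481) to the statement.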

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I have searched the internet; from what I understand, the trace flag has no impact on SQL 2016. See https://stackoverflow.com/questions/42630320/compatibility-level-upgrade-from-100-to-130-query-optimizer-fixes-flag

    As for the article https://logicalread.com/3-sql-ssql-server-trace-flags-troubleshooting-pd01/#.XseBbsAzXmF

    there is no date, and SQL 2016 and higher are not mentioned.

    But I can test it.

    Regarding the compatibility level: our performance issues were mainly not related to the upgrade to 2016 itself but appeared after switching the database compatibility level to 130. Setting the Legacy Cardinality Estimator to ON removed some of the issues, but some have remained.
