Statistics issues

  • Hi All,

    Looking for some clarifications on UPDATING STATISTICS.

    In our production environment, we have a 3 TB database with the Auto_Create Statistics option set to true.

    Every Saturday at midnight we run UPDATE STATISTICS with FULLSCAN.

    However, in the middle of the week, application developers keep complaining that a few queries are running slowly.

    The reason for this could be that some tables are getting updated quite often; I still need to confirm with the application team whether that's true.

    Questions

    =========

    Q) Is there a way/query to pull a report showing all the indexes that have a high number of modifications since their statistics were last updated?

    Q) Assuming we have the list of tables (let's say tables T1 and T2) and they are around 20-25 GB in size, how do we tackle query performance issues in such cases?

    Running UPDATE STATISTICS with FULLSCAN at that moment is CPU intensive and might take some time to complete, while running it with sampling may not give better performance.

    How do we deal with such situations in order to guarantee better query performance for SELECTs?

    Q) Does UPDATE STATISTICS cause blocking?

    Q) In what scenarios is the row sampling option useful? I want to know the use cases for when we can go with SAMPLING and what value should be used for it.

    The other thing is, developers have db_owner permissions on the database, and they update stats whenever there is a performance issue.

    The db_owner permission was granted per the vendor's database recommendations.

    They update stats for some tables that are used for ETL (i.e., pulling data from source SQL Servers), but they do not use FULLSCAN.

    Can anyone explain what problems we could run into on a production environment by doing so? One thing I can see is that all the good work done over the weekend with FULLSCAN is wiped out.

    I also want to know what other side effects we could run into if they run update stats at will.

    Regards,

    Sam

  • There's a lot to unpack here, but I'll try to answer some of it.

    First question, can you identify which tables have been changed between two dates? Yes. The key to this is sys.dm_db_index_usage_stats. If you capture this one Saturday and then capture it again on a second date (the next Saturday, a Monday, a Tuesday, whatever), you'll see differences. That will show you how much a given index has changed. Now, this is subject to reset due to restarts, failovers, stuff like that, so it's not a flawless measure. However, it's a quick & easy way to make this determination without setting up, for example, an Extended Events session to capture every UPDATE/INSERT/DELETE and then aggregate that data. You could also look to Query Store to capture the updates and then search & aggregate from there. It may be more accurate, but it's going to be a giant pain. I'd stick with dm_db_index_usage_stats, but understand the limitations there.
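
    To make that concrete, here's a minimal sketch of the capture-and-compare approach. The table name dbo.IndexUsageSnapshot and the capture dates are purely illustrative placeholders:

        -- Illustrative snapshot table; populate it on each collection date.
        CREATE TABLE dbo.IndexUsageSnapshot
        (
            capture_date datetime2 NOT NULL DEFAULT SYSDATETIME(),
            database_id  int       NOT NULL,
            object_id    int       NOT NULL,
            index_id     int       NOT NULL,
            user_updates bigint    NOT NULL
        );

        -- Run this on each collection date (e.g., every Saturday).
        INSERT INTO dbo.IndexUsageSnapshot (database_id, object_id, index_id, user_updates)
        SELECT database_id, object_id, index_id, user_updates
        FROM   sys.dm_db_index_usage_stats
        WHERE  database_id = DB_ID();

        -- Compare two specific snapshots (substitute your own capture dates).
        SELECT OBJECT_NAME(curr.object_id)           AS table_name,
               curr.index_id,
               curr.user_updates - prev.user_updates AS updates_between_captures
        FROM   dbo.IndexUsageSnapshot AS curr
        JOIN   dbo.IndexUsageSnapshot AS prev
               ON  prev.object_id = curr.object_id
               AND prev.index_id  = curr.index_id
        WHERE  CAST(curr.capture_date AS date) = '20210116'
          AND  CAST(prev.capture_date AS date) = '20210109'
        ORDER  BY updates_between_captures DESC;

    Diff the user_updates column between any two captures to rank indexes by how heavily they've been modified, keeping in mind that the counters reset on restart or failover as noted above.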

    Second, how do you tackle performance issues on big tables updated frequently? Carefully. Seriously though, this is far too open-ended to give you a precise answer. I had tables that were updated so frequently that we had to run a statistics update once every 15 minutes to ensure we were getting good stats. Now, this led to lots of recompiles on procedures, so nothing is free. Updating statistics does use resources and it does block, a tiny amount. However, it's the recompiles that are the big pain here. So, for example, if your CPU is already maxing out, forcing a lot more recompiles on the server could be a serious problem.
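
    For the 20-25 GB tables from the question, one option is a targeted, sampled refresh between the weekend FULLSCAN runs. A hedged sketch follows; T1 is the placeholder table name from the question, IX_T1_SomeColumn is a hypothetical statistic/index name, and the 25 percent figure is only an example, not a recommendation:

        -- Refresh every statistic on one hot table with a sampled read.
        UPDATE STATISTICS dbo.T1 WITH SAMPLE 25 PERCENT;

        -- Or refresh just the one statistic that matters to the slow query.
        UPDATE STATISTICS dbo.T1 IX_T1_SomeColumn WITH SAMPLE 25 PERCENT;

    Either way, expect the affected plans to recompile on their next execution, which is exactly the cost described above.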

    Third, blocking: already addressed above. Yes, but not much; the recompiles are the real cost.

    Fourth, what are the scenarios where you sample differently? That one is not deterministic at all. For most people, most of the time, the automatic update of stats, which is sampled, is adequate. Some systems absolutely need a FULLSCAN on stats to ensure as much accuracy as possible. Other systems absolutely barf when the stats are updated using a FULLSCAN, because the extra accuracy picks up on just how badly skewed the data is, whereas a sampled scan, being less accurate, results in better behavior. This one is all about understanding what's happening with the optimizer and how it affects your queries.
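
    For reference, the sampling choices look like this in syntax. T2 is the placeholder table name from the question and the 10 percent figure is just an example:

        UPDATE STATISTICS dbo.T2 WITH FULLSCAN;           -- reads every row; most accurate, most expensive
        UPDATE STATISTICS dbo.T2 WITH SAMPLE 10 PERCENT;  -- reads roughly 10% of the pages; cheaper, less accurate
        UPDATE STATISTICS dbo.T2 WITH RESAMPLE;           -- reuses each statistic's most recent sampling rate

        -- On SQL Server 2016 SP1 CU4 / 2017 CU1 and later, PERSIST_SAMPLE_PERCENT keeps the chosen
        -- rate so later automatic updates don't silently fall back to the default sample.
        UPDATE STATISTICS dbo.T2 WITH FULLSCAN, PERSIST_SAMPLE_PERCENT = ON;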

    In short, there's not a single correct answer here. However, I do very much believe that having different processes do different things to your stats & indexes throughout the week is probably a recipe for disaster. While I wouldn't suggest a single 'do this and nothing else, everywhere' approach, I'd still suggest a controlled and thought-out approach as opposed to Dodge City on a Saturday night.

    I hope all this helps. If there's more stuff you need, let's focus on tiny aspects of it at a time. Statistics is a very broad & complicated topic, so asking ALL the questions at once gets really hard to answer.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I would suggest starting at the following bit of MS Documentation...

    https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-db-stats-properties-transact-sql?view=sql-server-ver15
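
    That DMF exposes modification_counter and last_updated per statistic, which is essentially the report asked about in the first question. A rough sketch of how it might be queried in the current database (the filter and ordering are just examples):

        -- Rows modified per statistic since it was last updated.
        SELECT OBJECT_NAME(s.object_id)  AS table_name,
               s.name                    AS stats_name,
               sp.last_updated,
               sp.rows,
               sp.rows_sampled,
               sp.modification_counter
        FROM   sys.stats AS s
        CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
        WHERE  OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
          AND  sp.modification_counter > 0
        ORDER  BY sp.modification_counter DESC;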

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you Grant & Jeff.
