When to update statistics manually, if at all?

  • So, I'm using my Google-fu to try to find some articles about why we should not be updating statistics and recompiling on a daily basis. It is pretty clear that with auto_update_statistics enabled, there is no reason whatsoever to recompile after the statistics update. However, I cannot find any information about how to tell when we should update the statistics. I would like to believe that we shouldn't have to at all, but I have to convince myself of that as well. Instead of just saying "well, statistics are created and updated automatically, so we don't need to update them daily," I need some references. Google is only showing me how to do it and what it does... not how to determine when I should do it. Any help appreciated.

    Jared
    CE - Microsoft

  • SQLKnowItAll (4/8/2013)


    So, I'm using my Google-fu to try to find some articles about why we should not be updating statistics and recompiling on a daily basis. It is pretty clear that with auto_update_statistics enabled, there is no reason whatsoever to recompile after the statistics update. However, I cannot find any information about how to tell when we should update the statistics. I would like to believe that we shouldn't have to at all, but I have to convince myself of that as well. Instead of just saying "well, statistics are created and updated automatically, so we don't need to update them daily," I need some references. Google is only showing me how to do it and what it does... not how to determine when I should do it. Any help appreciated.

    As a rule of thumb, a 20% change in data volume or data distribution on a particular table is a good threshold for updating performance statistics. All execution plans referencing such a table should be invalidated automatically (at least Oracle does it this way) at the time fresh statistics are gathered, so the next execution would trigger a compilation.
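    For anyone who wants to eyeball that threshold themselves, here is a minimal sketch (my own illustration, not from the post above; it assumes SQL Server 2008 R2 SP2 / 2012 SP1 or later for sys.dm_db_stats_properties) that lists statistics whose modification counter has passed roughly 20% of the table's rows:

    -- Statistics whose modification counter exceeds ~20% of the rows counted at the last update
    SELECT  OBJECT_SCHEMA_NAME(s.object_id) AS [Schema Name],
            OBJECT_NAME(s.object_id)        AS [Table Name],
            s.name                          AS [Stats Name],
            sp.last_updated,
            sp.rows,
            sp.modification_counter,
            CONVERT(decimal(8,2), 100.0 * sp.modification_counter / NULLIF(sp.rows, 0)) AS [Percent Modified]
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
      AND sp.modification_counter > 0.2 * sp.rows  -- the 20% rule of thumb
    ORDER BY [Percent Modified] DESC;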

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test, non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
  • For me, it depends on the size of the table:

    for huge tables, it takes a lot less than 20% of the rows to make the statistics stale enough that they need to be refreshed manually.

    I have a variation of the script below running throughout the day, and don't seem to have stats problems anymore on our busier tables.

    I put this together with some arbitrary limits based on the size of the tables, so tables with 10^8 rows have different rules than tables with 10^3 rows.

    I'm open to suggestions and improvements, of course!

    --The Analysis: my crappy assumptions:
    --UPDATE STATISTICS dbo.GMPVDET
    --tables under 1000 rows, I'll leave them at 20%
    --tables with more rows than that, I'll use an arbitrary sliding scale formula.
    --formula to be modified based on analysis
    SELECT X.*,
           ISNULL(CASE
                      WHEN X.[Total Rows] <= 1000
                      THEN CASE
                               WHEN [Percent Modified] >= 20.0
                               THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN --20% Small Table Rule'
                           END
                      WHEN [Percent Modified] = 100.00
                      THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN --100% No real Stats Rule'
                      --WHEN X.[Rows Modified] > 1000
                      --THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN --1000 Rows Modified Rule'
                      ELSE CASE
                               WHEN X.[Total Rows] > 1000000000 --billion rows
                               THEN CASE
                                        WHEN [Percent Modified] > 0.1
                                        THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN -- 1B Big Table Rule'
                                    END
                               WHEN X.[Total Rows] > 100000000 --hundred million rows
                               THEN CASE
                                        WHEN [Percent Modified] > 1.0
                                        THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN -- 100M Big Table Rule'
                                    END
                               WHEN X.[Total Rows] > 10000000 --ten million rows
                               THEN CASE
                                        WHEN [Percent Modified] > 2.0
                                        THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN -- 10M Big Table Rule'
                                    END
                               WHEN X.[Total Rows] > 1000000 --million rows
                               THEN CASE
                                        WHEN [Percent Modified] > 5.0
                                        THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN -- 1M Big Table Rule'
                                    END
                               WHEN X.[Total Rows] > 100000 --hundred thousand rows
                               THEN CASE
                                        WHEN [Percent Modified] > 10.0
                                        THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN -- 100K Big Table Rule'
                                    END
                               WHEN X.[Total Rows] > 10000 --ten thousand rows
                               THEN CASE
                                        WHEN [Percent Modified] > 20.0
                                        THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN -- 10K Big Table Rule'
                                    END
                           END
                  END, '') AS [Statistics SQL]
    FROM (
            SELECT DISTINCT
                   DB_NAME()   AS [Database],
                   S.name      AS [Schema Name],
                   T.name      AS [Table Name],
                   I.rowmodctr AS [Rows Modified],
                   P.rows      AS [Total Rows],
                   CASE
                       WHEN I.rowmodctr > P.rows
                       THEN 100
                       ELSE CONVERT(decimal(8,2), ((I.rowmodctr * 1.0) / P.rows * 1.0) * 100.0)
                   END AS [Percent Modified]
            FROM sys.partitions P
            INNER JOIN sys.tables  T ON P.object_id = T.object_id
            INNER JOIN sys.schemas S ON T.schema_id = S.schema_id
            INNER JOIN sysindexes  I ON P.object_id = I.id
            WHERE P.index_id IN (0, 1)
              AND I.rowmodctr > 0
         ) X
    WHERE [Rows Modified] > 1000
    ORDER BY [Rows Modified] DESC

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and an INSERT INTO... statement for that table to give the volunteers here representative data. With your description of the problem, we can provide a tested, verifiable solution to your question! Asking the question the right way gets you a tested answer the fastest way possible!

  • I have found that you need to examine the 'noise' in each table and update accordingly. Now, you have to go to 'huge' before it matters, but as little as 0.5% in a couple of fact tables I've dealt with has been enough to throw off the entire system.

    Here's what I've found was the easiest (conceptually). Do your standard checking for volume differences. I'm more used to older systems, but the stuff above looks tasty. Then I used to keep an override table. If the table wasn't in the overrides, we used to (on weekends) restat anything at +10%. I had one table that got restatted at 0.000001 or something equally ridiculous, and that was nightly. A rough sketch of the override idea follows below.

    It all depends on your system and what you need; that's why it's so hard to nail down a single number.
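    The sketch (my own illustration of the override-table description above, untested; the table name, column names, 10% default, and the FULLSCAN choice are assumptions, not something from the original post):

    -- Hypothetical override table: per-table thresholds that win over the default 10% rule
    CREATE TABLE dbo.StatsUpdateOverride
    (
        SchemaName       sysname      NOT NULL,
        TableName        sysname      NOT NULL,
        ThresholdPercent decimal(9,6) NOT NULL,  -- e.g. 0.000001 for that one special fact table
        CONSTRAINT PK_StatsUpdateOverride PRIMARY KEY (SchemaName, TableName)
    );

    -- Build UPDATE STATISTICS commands: use the override threshold when present, otherwise 10%
    SELECT 'UPDATE STATISTICS ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name) + ' WITH FULLSCAN' AS [Statistics SQL]
    FROM sys.tables AS t
    INNER JOIN sys.schemas AS s ON s.schema_id = t.schema_id
    INNER JOIN sysindexes  AS i ON i.id = t.object_id AND i.indid IN (0, 1)
    LEFT JOIN dbo.StatsUpdateOverride AS o
           ON o.SchemaName = s.name AND o.TableName = t.name
    WHERE i.rowcnt > 0
      AND (100.0 * i.rowmodctr / i.rowcnt) >= COALESCE(o.ThresholdPercent, 10.0);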


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Fantastic script above! Thanks. Also, if you're using SQL 2008 R2 SP1 upwards, check out trace flag 2371 to see if it's of any use.
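    A minimal way to try it (test on a non-production instance first; to survive restarts the flag is normally set as a -T2371 startup parameter instead):

    -- Enable trace flag 2371 globally: it makes the auto-update threshold dynamic,
    -- dropping below the fixed 20% as tables get larger. SQL 2008 R2 SP1 or later.
    DBCC TRACEON (2371, -1);

    -- Check which trace flags are currently active
    DBCC TRACESTATUS (-1);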

    Cheers

  • SQLKnowItAll (4/8/2013)


    It is pretty clear that with auto_update_statistics enabled, there is no reason whatsoever to recompile after the statistics update.

    There's never any need to recompile after a stats update, no matter what the auto-update settings are. Updating stats invalidates all plans, so all queries will recompile afterwards anyway.

    However, I cannot find any information about how to tell when we should update the statistics. I would like to believe that we shouldn't have to at all, but I have to convince myself of that as well. Instead of just saying "well, statistics are created and updated manually, so we don't need to update them daily." I need some references.

    The auto update (which updates after 20% of the table has changed) is not enough, unless you're working on a tiny DB with minimal changes. Your two options are pretty much:

    - Analyse and see what tables are prone to stale stats. This will require a lot of familiarity with the queries, checking their performance, watching for degradation, putting in manual stats updates for those tables sufficient that the queries running do not suffer from degraded performance due to stale stats.

    - Scheduled stats updates of all tables on a regular basis.

    The second is far easier and if you have the available maintenance time there's little reason not to do that. It's the times where you don't have adequate maintenance windows where you have to put in huge amounts of work to identify the queries that need manual updates.
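    If you take the scheduled route, the simplest sketch is just a job step per database along these lines (sp_updatestats skips statistics with no modifications, so it is usually much cheaper than a blanket FULLSCAN of everything; the table name in the second option is only a placeholder):

    -- Option A: update only the statistics that have seen modifications, using default sampling
    EXEC sp_updatestats;

    -- Option B: force a full-scan update on a specific table known to go stale quickly
    UPDATE STATISTICS dbo.YourBigFactTable WITH FULLSCAN;  -- placeholder table name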

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (4/9/2013)


    SQLKnowItAll (4/8/2013)


    It is pretty clear that with auto_update_statistics enabled, there is no reason whatsoever to recompile after the statistics update.

    There's never any need to recompile after a stats update, no matter what the auto-update settings are. Updating stats invalidates all plans, so all queries will recompile afterwards anyway.

    However, I cannot find any information about how to tell when we should update the statistics. I would like to believe that we shouldn't have to at all, but I have to convince myself of that as well. Instead of just saying "well, statistics are created and updated manually, so we don't need to update them daily." I need some references.

    The auto update (which updates after 20% of the table has changed) is not enough, unless you're working on a tiny DB with minimal changes. Your two options are pretty much:

    - Analyse and see what tables are prone to stale stats. This will require a lot of familiarity with the queries, checking their performance, watching for degradation, putting in manual stats updates for those tables sufficient that the queries running do not suffer from degraded performance due to stale stats.

    - Scheduled stats updates of all tables on a regular basis.

    The second is far easier and if you have the available maintenance time there's little reason not to do that. It's the times where you don't have adequate maintenance windows where you have to put in huge amounts of work to identify the queries that need manual updates.

    Thanks for all of the replies! I think doing it regularly works for us. We currently do it daily, but I am trying to streamline it. The stored proc created years ago updates the statistics and then runs sp_recompile, which I think is unnecessary and, since it takes a schema lock, has caused deadlocks with some other processes. I am tasked with analyzing the old stuff and updating it. A sample of the original is below, run in debug mode to show the dynamic script that is actually executed:

    USE [DBA]

    DECLARE @mytable_id INT
    DECLARE @mytable VARCHAR(100)
    DECLARE @owner VARCHAR(128)
    DECLARE @SQL VARCHAR(256)

    SELECT @mytable_id = MIN(object_id)
    FROM sys.tables WITH(NOLOCK)
    WHERE is_ms_shipped = 0

    WHILE @mytable_id IS NOT NULL
    BEGIN
        SELECT @owner = SCHEMA_NAME(schema_id), @mytable = name
        FROM sys.tables
        WHERE object_id = @mytable_id

        SELECT @SQL = 'UPDATE STATISTICS ' + QUOTENAME(@owner) + '.' + QUOTENAME(@mytable) + ''
        EXEC (@SQL)

        SELECT @SQL = 'EXEC sp_recompile ''' + QUOTENAME(@owner) + '.' + QUOTENAME(@mytable) + ''''
        EXEC (@SQL)

        SELECT @mytable_id = MIN(object_id)
        FROM sys.tables WITH(NOLOCK)
        WHERE object_id > @mytable_id
        AND is_ms_shipped = 0
    END

    My thought was to remove the whole sp_recompile step from this bad boy, as I believe it is not needed: the statistics update will invalidate most of the plans anyway and force a recompile. A trimmed version is sketched below. Thoughts? (Also, I want to stop running this on Mondays, since indexes are rebuilt on Sunday and little to no data has changed.)
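    For illustration, the trimmed loop I have in mind (the original above with the sp_recompile step removed, not yet tested) would be:

    USE [DBA]

    DECLARE @mytable_id INT
    DECLARE @mytable VARCHAR(100)
    DECLARE @owner VARCHAR(128)
    DECLARE @SQL VARCHAR(256)

    SELECT @mytable_id = MIN(object_id)
    FROM sys.tables WITH(NOLOCK)
    WHERE is_ms_shipped = 0

    WHILE @mytable_id IS NOT NULL
    BEGIN
        SELECT @owner = SCHEMA_NAME(schema_id), @mytable = name
        FROM sys.tables
        WHERE object_id = @mytable_id

        -- Update statistics only; the resulting plan invalidation makes sp_recompile redundant
        SELECT @SQL = 'UPDATE STATISTICS ' + QUOTENAME(@owner) + '.' + QUOTENAME(@mytable)
        EXEC (@SQL)

        SELECT @mytable_id = MIN(object_id)
        FROM sys.tables WITH(NOLOCK)
        WHERE object_id > @mytable_id
        AND is_ms_shipped = 0
    END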

    Jared
    CE - Microsoft

  • I use a combination of the date the statistics were last updated and the amount of data modified.
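    A minimal sketch of that combination (assumes SQL 2008 R2 SP2 / 2012 SP1 or later for sys.dm_db_stats_properties; the 7-day age and 1000-row thresholds are just example numbers):

    -- Statistics that are both older than a week and have seen a meaningful number of modifications
    SELECT  OBJECT_NAME(s.object_id) AS [Table Name],
            s.name                   AS [Stats Name],
            sp.last_updated,
            sp.modification_counter
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
      AND sp.last_updated < DATEADD(DAY, -7, GETDATE())
      AND sp.modification_counter > 1000
    ORDER BY sp.modification_counter DESC;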

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉
