House Keeping - Statistics

  • Hi all,

    I wondered if anyone had a good formula for updating statistics for an overnight job we have, the job currently looks at index fragmentation and based on the page count and fragmentation then decides to optimise or rebuild.

    Currently it then run's sp_updatestats for all the databases, I've gone and written something new which scans for statistics which have not been updated in a certain time based on a parameter it then records the details as it updates each one. The question is what sample rate to use, using 100% on a table with millions of records causing a lot of I/O and takes an age, and likewise is there much difference between 20% and 100% on a table with 50 rows? The sample rate should really be decided based on the table size in the same way auto stats update is based on the row count.

    Has anyone done anything similar? If so how did you determine the sample rate? or is their a more generic formula I can use?

    Any help would be appreciated.

    Thanks,

    Nic

  • My suggestion would be to use Ola Hallengrens scripts. You can find it here, https://ola.hallengren.com

    Leonard

  • Hi Leonard,

    Thanks for this whilst this is an excellent script for people without anything or looking to add to existing maintenance.

    From what I can see the @SampleStatistics parameter is set to a value by the executor of the procedure, it's not calculated however that got me thinking what if I just say SAMPLE but don't specify a value, interestingly I can't seem to find a definitive answer to the question "If I do a sample and don't specify a value, what value does it use?" Some answers say 20% some say SQL figures it out, which if true could be what I'm after for now, do you know?

    Thanks,

    Nic

  • According to this, sample size changes with table size: http://sqlperformance.com/2013/04/t-sql-queries/update-statistics-duration

    It looks like you might need to experiment and then tweak your setting over time to keep performance at a high level.

  • Hi,

    Thank you both for your response, I think I'm going to go with a straight forward SAMPLE by default and let SQL decide based on the table size, but allow the option to set a sample rate for the whole database or even a specific statistic, that way I can fine tune it as time goes by.

    Thanks again,

    Nic

  • NicHopper (1/22/2016)


    Thank you both for your response, I think I'm going to go with a straight forward SAMPLE by default and let SQL decide based on the table size, but allow the option to set a sample rate for the whole database or even a specific statistic, that way I can fine tune it as time goes by.

    If you ever find that default SAMPLE generates bad statistics for some of the columns then don't play with different SAMPLE rates, just go straight to FULLSCAN. It doesn't worth the effort, and in many cases FULLSCAN is faster than SAMPLE with higher percentage (20%+), and there are multiple reasons for that:

    1. If you update stats with FULLSCAN on a column that is part of any non-clustered index (which is usually the case) then server can scan small and narrow index, instead of sample-scan the table.

    2. If the stats column is the 1st one in a non-clustered index then server can even avoid doing expensive sort operation whereas SAMPLE will always require sorting and consume memory and tempdb for that.

    3. If the stats column is the 1st one in a clustered index then even if you have to read the whole table, so there is no much difference with SAMPLE in this case, but you still get some improvements by avoiding expensive sort operation.


    Alex Suprun

  • NicHopper (1/22/2016)


    Hi,

    Thank you both for your response, I think I'm going to go with a straight forward SAMPLE by default and let SQL decide based on the table size, but allow the option to set a sample rate for the whole database or even a specific statistic, that way I can fine tune it as time goes by.

    Thanks again,

    Nic

    I recommend FULL SCAN for any index that contains any "ever increasing" column that is used as the first key especially if it's being used in JOINs.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you all for the comments, using them I've built a robust customisable housekeeping script for all our statistics with suitable auditing included.

    Thanks again,

    Nic

  • Are you keeping track of how long it takes to update each one or just letting it run and tracking the overall time for the database? I'm tracking the time for a database. I never added a sample size calculation based on the row count, so you've given me something to think about.

  • Yes, it records the start and end time for each statistic and also the sample rate it used and the method, rows or percent.

    It also allows for specifying scan details for a particular database,schema, table or statistic. So I can tweak it as needed, for example if a particular statistic or table need something different to the others.

    I'd be happy to share it if you would like it.

    I also have the same thing for indexes.

    Thanks,

    Nic

  • I have them, but thank you for the offer. I'm tracking the time for each database, but not for each UPDATE STATISTICS statement. I figured that isn't as important and I didn't want to spend the space to store it. The same scenario applies for index maintenance. The details haven't been necessary yet.

    I think I have some work to do on both of them. The sample size for a statistics update, for example, is something I don't have in there (great idea, BTW) and I think might benefit the run.

  • If you'd like to share, it would be great as an article, describe how/why you built the script.

  • I've taken so much from this site and the SQLcommunity that it be my pleasure to try and put something back by contributing.

    I'll get something typed up and submitted.

    Thanks,

    Nic

  • NicHopper (2/1/2016)


    I've taken so much from this site and the SQLcommunity that it be my pleasure to try and put something back by contributing.

    I'll get something typed up and submitted.

    Thanks,

    Nic

    Very cool. Thanks, Nic.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi,

    I've not forgotten about this, I've just been busy, I'll do my best to get an article together with scripts in the next week or so.

    Sorry for the delay.

    Nic

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply