House Keeping - Statistics

NicHopper

SSCrazy Eights

Points: 9090
January 21, 2016 at 8:54 am

#323776

Hi all,
I wondered if anyone had a good formula for updating statistics in an overnight job we have. The job currently looks at index fragmentation and, based on the page count and fragmentation, decides whether to optimise or rebuild.
Currently it then runs sp_updatestats for all the databases. I've written something new which scans for statistics that have not been updated within a certain time (set by a parameter) and records the details as it updates each one. The question is what sample rate to use: using 100% on a table with millions of records causes a lot of I/O and takes an age, and likewise, is there much difference between 20% and 100% on a table with 50 rows? The sample rate should really be decided based on the table size, in the same way the auto stats update is based on the row count.
Has anyone done anything similar? If so, how did you determine the sample rate? Or is there a more generic formula I can use?
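
As a rough illustration of picking the sample rate from the table size, here is a minimal Python sketch that maps a row count to a WITH clause for UPDATE STATISTICS. The function name `sample_clause`, the row-count thresholds and the percentages are all assumptions made up for the example; they are not taken from SQL Server or from any script mentioned in this thread, and would need tuning against real workloads.

```python
def sample_clause(row_count: int) -> str:
    """Return a WITH clause for UPDATE STATISTICS chosen from the table size.

    The thresholds and percentages below are illustrative assumptions only.
    """
    if row_count < 100_000:
        # Small tables: a full scan is cheap, so sampling saves very little.
        return "WITH FULLSCAN"
    if row_count < 10_000_000:
        return "WITH SAMPLE 25 PERCENT"
    if row_count < 100_000_000:
        return "WITH SAMPLE 10 PERCENT"
    # Very large tables: keep the I/O cost down with a small sample.
    return "WITH SAMPLE 2 PERCENT"


for rows in (50, 2_000_000, 500_000_000):
    print(rows, sample_clause(rows))
```

The shape mirrors the point made in the question: the larger the table, the smaller the percentage, and below some size the rate hardly matters.
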
Any help would be appreciated.
Thanks,
Nic
Check out my blog http://www.sqlservercentral.com/articles/Best+Practices/61537/

Leonard Rutkowski SSCrazy Points: 2668 · Answer 1

My suggestion would be to use Ola Hallengren's scripts. You can find them here: https://ola.hallengren.com

Leonard

NicHopper SSCrazy Eights Points: 9090 · Answer 2

Hi Leonard,

Thanks for this. It is an excellent script for people without anything in place or looking to add to existing maintenance.

From what I can see, the @StatisticsSample parameter is set to a value by the executor of the procedure; it's not calculated. However, that got me thinking: what if I just say SAMPLE but don't specify a value? Interestingly, I can't seem to find a definitive answer to the question "If I do a sample and don't specify a value, what value does it use?" Some answers say 20%, some say SQL figures it out, which if true could be what I'm after for now. Do you know?

Thanks,

Nic

Steve Jones - SSC Editor SSC Guru Points: 734542 · Answer 3

According to this, sample size changes with table size: http://sqlperformance.com/2013/04/t-sql-queries/update-statistics-duration

It looks like you might need to experiment and then tweak your setting over time to keep performance at a high level.

NicHopper SSCrazy Eights Points: 9090 · Answer 4

Hi,

Thank you both for your responses. I think I'm going to go with a straightforward SAMPLE by default and let SQL decide based on the table size, but allow the option to set a sample rate for the whole database or even a specific statistic. That way I can fine-tune it as time goes by.
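
A minimal sketch of that plan, written in Python as a statement builder: default sampling unless an explicit rate is supplied. The function name `build_update_stats` and the example object names are hypothetical. It also assumes, as I understand the documented syntax, that SAMPLE must be followed by a number, so "let SQL decide" is expressed by leaving the sampling option out altogether rather than by a bare SAMPLE keyword.

```python
from typing import Optional


def build_update_stats(table: str, stat: str,
                       sample_percent: Optional[int] = None) -> str:
    """Build an UPDATE STATISTICS statement (hypothetical helper).

    table and stat are assumed to be already-quoted identifiers.
    sample_percent=None means no sampling option is emitted, which leaves
    the sample size to SQL Server.
    """
    stmt = f"UPDATE STATISTICS {table} {stat}"
    if sample_percent is None:
        return stmt + ";"
    if not 1 <= sample_percent <= 100:
        raise ValueError("sample_percent must be between 1 and 100")
    if sample_percent == 100:
        # Reading every row is what FULLSCAN asks for.
        return stmt + " WITH FULLSCAN;"
    return stmt + f" WITH SAMPLE {sample_percent} PERCENT;"


print(build_update_stats("[dbo].[Orders]", "[IX_Orders_Date]"))
print(build_update_stats("[dbo].[Orders]", "[IX_Orders_Date]", 20))
```

An override for a whole database or a single statistic then only has to supply `sample_percent`; everything else falls through to the default.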

Thanks again,

Nic

Alexander Suprun SSCertifiable Points: 6233 · Answer 5

NicHopper (1/22/2016)
Thank you both for your responses. I think I'm going to go with a straightforward SAMPLE by default and let SQL decide based on the table size, but allow the option to set a sample rate for the whole database or even a specific statistic. That way I can fine-tune it as time goes by.

If you ever find that the default SAMPLE generates bad statistics for some of the columns, then don't play with different SAMPLE rates; just go straight to FULLSCAN. It isn't worth the effort, and in many cases FULLSCAN is faster than SAMPLE with a higher percentage (20%+). There are multiple reasons for that:

1. If you update stats with FULLSCAN on a column that is part of any non-clustered index (which is usually the case), then the server can scan a small, narrow index instead of sample-scanning the table.

2. If the stats column is the first one in a non-clustered index, then the server can even avoid an expensive sort operation, whereas SAMPLE will always require sorting and consume memory and tempdb for that.

3. If the stats column is the first one in a clustered index, then you have to read the whole table, so there is not much difference from SAMPLE in this case, but you still gain some improvement by avoiding the expensive sort operation.

Alex Suprun

Jeff Moden SSC Guru Points: 1003888 · Answer 6

NicHopper (1/22/2016)
Hi,
Thank you both for your responses. I think I'm going to go with a straightforward SAMPLE by default and let SQL decide based on the table size, but allow the option to set a sample rate for the whole database or even a specific statistic. That way I can fine-tune it as time goes by.
Thanks again,
Nic

I recommend FULLSCAN for any index that contains any "ever increasing" column that is used as the first key, especially if it's being used in JOINs.
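
The two FULLSCAN suggestions in this thread (go straight to FULLSCAN when default sampling gives poor statistics, and use FULLSCAN when the leading key is ever increasing) could be folded into one simple rule. The Python sketch below is only an illustration: the `StatInfo` class and both of its flags are hypothetical, and how you would detect an ever-increasing key or poor statistics is left outside the example.

```python
from dataclasses import dataclass


@dataclass
class StatInfo:
    """Hypothetical description of one statistic to maintain."""
    table: str                                 # already-quoted table name
    name: str                                  # already-quoted statistic name
    leading_key_ever_increasing: bool = False  # e.g. identity or date key
    default_sample_was_poor: bool = False      # flagged after a bad plan


def update_statement(stat: StatInfo) -> str:
    """Use FULLSCAN for flagged statistics, default sampling otherwise."""
    stmt = f"UPDATE STATISTICS {stat.table} {stat.name}"
    if stat.leading_key_ever_increasing or stat.default_sample_was_poor:
        # No intermediate sample rates: escalate directly to FULLSCAN.
        return stmt + " WITH FULLSCAN;"
    return stmt + ";"


print(update_statement(StatInfo("[dbo].[Orders]", "[PK_Orders]",
                                leading_key_ever_increasing=True)))
print(update_statement(StatInfo("[dbo].[Customers]", "[IX_Customers_Name]")))
```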

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

NicHopper SSCrazy Eights Points: 9090 · Answer 7

Thank you all for the comments. Using them, I've built a robust, customisable housekeeping script for all our statistics, with suitable auditing included.

Thanks again,

Nic

Ed Wagner SSC Guru Points: 287024 · Answer 8

Are you keeping track of how long it takes to update each one or just letting it run and tracking the overall time for the database? I'm tracking the time for a database. I never added a sample size calculation based on the row count, so you've given me something to think about.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

NicHopper SSCrazy Eights Points: 9090 · Answer 9

Yes, it records the start and end time for each statistic, and also the sample rate it used and the method (rows or percent).

It also allows for specifying scan details for a particular database, schema, table or statistic, so I can tweak it as needed, for example if a particular statistic or table needs something different from the others.
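
One way such per-level overrides and the audit record might fit together is sketched below in Python. This is not the script described in the post: the names `resolve_scan` and `StatsAudit`, the tuple-keyed override table, and the "most specific level wins" rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional, Tuple

# (database, schema, table, statistic); None means "any" at that level.
Key = Tuple[str, Optional[str], Optional[str], Optional[str]]


def resolve_scan(overrides: Dict[Key, str], db: str, schema: str,
                 table: str, stat: str) -> str:
    """Return the scan option from the most specific matching override."""
    candidates = (
        (db, schema, table, stat),   # one statistic
        (db, schema, table, None),   # whole table
        (db, schema, None, None),    # whole schema
        (db, None, None, None),      # whole database
    )
    for key in candidates:
        if key in overrides:
            return overrides[key]
    return ""  # no override: leave the sample size to SQL Server


@dataclass
class StatsAudit:
    """One audit row per statistic updated (hypothetical layout)."""
    database: str
    schema: str
    table: str
    statistic: str
    scan_option: str   # e.g. "WITH SAMPLE 20 PERCENT", or "" for default
    started: datetime
    finished: datetime

    @property
    def seconds(self) -> float:
        return (self.finished - self.started).total_seconds()


overrides = {
    ("Sales", None, None, None): "WITH SAMPLE 20 PERCENT",
    ("Sales", "dbo", "Orders", None): "WITH FULLSCAN",
}
print(resolve_scan(overrides, "Sales", "dbo", "Orders", "IX_Orders_Date"))
print(resolve_scan(overrides, "Sales", "dbo", "Customers", "PK_Customers"))
```

Storing the resolved scan option alongside the start and end times makes it possible to compare duration against sample rate later, which is the kind of tuning discussed earlier in the thread.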

I'd be happy to share it if you would like it.

I also have the same thing for indexes.

Thanks,

Nic

Ed Wagner SSC Guru Points: 287024 · Answer 10

I have them, but thank you for the offer. I'm tracking the time for each database, but not for each UPDATE STATISTICS statement. I figured that isn't as important and I didn't want to spend the space to store it. The same scenario applies for index maintenance. The details haven't been necessary yet.

I think I have some work to do on both of them. The sample size for a statistics update, for example, is something I don't have in there (great idea, BTW) and I think might benefit the run.

Steve Jones - SSC Editor SSC Guru Points: 734542 · Answer 11

If you'd like to share, it would be great as an article describing how and why you built the script.

NicHopper SSCrazy Eights Points: 9090 · Answer 12

I've taken so much from this site and the SQL community that it would be my pleasure to try and put something back by contributing.

I'll get something typed up and submitted.

Thanks,

Nic

Jeff Moden SSC Guru Points: 1003888 · Answer 13

NicHopper (2/1/2016)
I've taken so much from this site and the SQL community that it would be my pleasure to try and put something back by contributing.
I'll get something typed up and submitted.
Thanks,
Nic

Very cool. Thanks, Nic.

--Jeff Moden

NicHopper SSCrazy Eights Points: 9090 · Answer 14

Hi,

I've not forgotten about this; I've just been busy. I'll do my best to get an article together with scripts in the next week or so.

Sorry for the delay.

Nic
