When to update statistics manually, if at all?

  • So, I'm using my Google-fu to try to find some articles about why we should not be updating statistics and recompiling on a daily basis. It is pretty clear that with auto_update_statistics enabled, there is no reason whatsoever to recompile after the statistics update. However, I cannot find any information about how to tell when we should update the statistics. I would like to believe that we shouldn't have to at all, but I have to convince myself of that as well. Instead of just saying "well, statistics are created and updated automatically, so we don't need to update them daily," I need some references. Google is only showing me how to do it and what it does... not how to determine when I should do it. Any help appreciated.

    Jared
    CE - Microsoft

  • SQLKnowItAll (4/8/2013)


    So, I'm using my Google-fu to try to find some articles about why we should not be updating statistics and recompiling on a daily basis. It is pretty clear that with auto_update_statistics enabled, there is no reason whatsoever to recompile after the statistics update. However, I cannot find any information about how to tell when we should update the statistics. I would like to believe that we shouldn't have to at all, but I have to convince myself of that as well. Instead of just saying "well, statistics are created and updated automatically, so we don't need to update them daily," I need some references. Google is only showing me how to do it and what it does... not how to determine when I should do it. Any help appreciated.

    As a rule of thumb, a 20% change in data volume or data distribution on a particular table is a good threshold for updating performance statistics. All execution plans referencing such a table should be invalidated automatically (at least Oracle does it this way) at the time fresh statistics are gathered, so the next execution would trigger a compilation.
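    For anyone who wants to eyeball that threshold themselves, here is a minimal sketch (my own illustration, not from the post above; it assumes SQL Server 2008 R2 SP2 / 2012 SP1 or later for sys.dm_db_stats_properties) that lists statistics whose modification counter has passed roughly 20% of the table's rows:

    -- Statistics whose modification counter exceeds ~20% of the rows counted at the last update
    SELECT  OBJECT_SCHEMA_NAME(s.object_id) AS [Schema Name],
            OBJECT_NAME(s.object_id)        AS [Table Name],
            s.name                          AS [Stats Name],
            sp.last_updated,
            sp.rows,
            sp.modification_counter,
            CONVERT(decimal(8,2), 100.0 * sp.modification_counter / NULLIF(sp.rows, 0)) AS [Percent Modified]
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
      AND sp.modification_counter > 0.2 * sp.rows  -- the 20% rule of thumb
    ORDER BY [Percent Modified] DESC;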

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test, non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
  • For me, it depends on the size of the table:

    for huge tables, it takes a lot less than 20% of the rows to make the statistics stale enough that they need to be refreshed manually.

    I have a variation of the script below running throughout the day, and don't seem to have stats problems anymore on our busier tables.

    I put this together with some arbitrary limits based on the size of the tables, so tables with 10^8 rows have different rules than tables with 10^3 rows.

    I'm open to suggestions and improvements, of course!

    --The Analysis: my crappy assumptions:
    --UPDATE STATISTICS dbo.GMPVDET
    --tables under 1000 rows, I'll leave them at 20%
    --tables with more rows than that, I'll use an arbitrary sliding scale formula.
    --formula to be modified based on analysis
    SELECT X.*,
           ISNULL(CASE
                      WHEN X.[Total Rows] <= 1000
                      THEN CASE
                               WHEN [Percent Modified] >= 20.0
                               THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN --20% Small Table Rule'
                           END
                      WHEN [Percent Modified] = 100.00
                      THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN --100% No real Stats Rule'
                      --WHEN X.[Rows Modified] > 1000
                      --THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN --1000 Rows Modified Rule'
                      ELSE CASE
                               WHEN X.[Total Rows] > 1000000000 --billion rows
                               THEN CASE
                                        WHEN [Percent Modified] > 0.1
                                        THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN -- 1B Big Table Rule'
                                    END
                               WHEN X.[Total Rows] > 100000000 --hundred million rows
                               THEN CASE
                                        WHEN [Percent Modified] > 1.0
                                        THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN -- 100M Big Table Rule'
                                    END
                               WHEN X.[Total Rows] > 10000000 --ten million rows
                               THEN CASE
                                        WHEN [Percent Modified] > 2.0
                                        THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN -- 10M Big Table Rule'
                                    END
                               WHEN X.[Total Rows] > 1000000 --million rows
                               THEN CASE
                                        WHEN [Percent Modified] > 5.0
                                        THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN -- 1M Big Table Rule'
                                    END
                               WHEN X.[Total Rows] > 100000 --hundred thousand rows
                               THEN CASE
                                        WHEN [Percent Modified] > 10.0
                                        THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN -- 100K Big Table Rule'
                                    END
                               WHEN X.[Total Rows] > 10000 --ten thousand rows
                               THEN CASE
                                        WHEN [Percent Modified] > 20.0
                                        THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN -- 10K Big Table Rule'
                                    END
                           END
                  END, '') AS [Statistics SQL]
    FROM (
            SELECT DISTINCT
                   DB_NAME()   AS [Database],
                   S.name      AS [Schema Name],
                   T.name      AS [Table Name],
                   I.rowmodctr AS [Rows Modified],
                   P.rows      AS [Total Rows],
                   CASE
                       WHEN I.rowmodctr > P.rows
                       THEN 100
                       ELSE CONVERT(decimal(8,2), ((I.rowmodctr * 1.0) / P.rows * 1.0) * 100.0)
                   END AS [Percent Modified]
            FROM sys.partitions P
            INNER JOIN sys.tables  T ON P.object_id = T.object_id
            INNER JOIN sys.schemas S ON T.schema_id = S.schema_id
            INNER JOIN sysindexes  I ON P.object_id = I.id
            WHERE P.index_id IN (0, 1)
              AND I.rowmodctr > 0
         ) X
    WHERE [Rows Modified] > 1000
    ORDER BY [Rows Modified] DESC

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and an INSERT INTO... statement for that table to give the volunteers here representative data. With your description of the problem, we can provide a tested, verifiable solution to your question! Asking the question the right way gets you a tested answer the fastest way possible!

  • I have found that you need to examine the 'noise' in each table and update accordingly. Now, you have to go to 'huge' before it matters, but as little as 0.5% in a couple of fact tables I've dealt with has been enough to throw off the entire system.

    Here's what I've found was the easiest (conceptually). Do your standard checking for volume differences. I'm more used to older systems, but the stuff above looks tasty. Then I used to keep an override table. If the table wasn't in the overrides, we used to (on weekends) restat anything at +10%. I had one table that got restatted at 0.000001 or something equally ridiculous, and that was nightly. A rough sketch of the override idea follows below.

    It all depends on your system and what you need; that's why it's so hard to nail down a single number.
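    The sketch (my own illustration of the override-table description above, untested; the table name, column names, 10% default, and the FULLSCAN choice are assumptions, not something from the original post):

    -- Hypothetical override table: per-table thresholds that win over the default 10% rule
    CREATE TABLE dbo.StatsUpdateOverride
    (
        SchemaName       sysname      NOT NULL,
        TableName        sysname      NOT NULL,
        ThresholdPercent decimal(9,6) NOT NULL,  -- e.g. 0.000001 for that one special fact table
        CONSTRAINT PK_StatsUpdateOverride PRIMARY KEY (SchemaName, TableName)
    );

    -- Build UPDATE STATISTICS commands: use the override threshold when present, otherwise 10%
    SELECT 'UPDATE STATISTICS ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name) + ' WITH FULLSCAN' AS [Statistics SQL]
    FROM sys.tables AS t
    INNER JOIN sys.schemas AS s ON s.schema_id = t.schema_id
    INNER JOIN sysindexes  AS i ON i.id = t.object_id AND i.indid IN (0, 1)
    LEFT JOIN dbo.StatsUpdateOverride AS o
           ON o.SchemaName = s.name AND o.TableName = t.name
    WHERE i.rowcnt > 0
      AND (100.0 * i.rowmodctr / i.rowcnt) >= COALESCE(o.ThresholdPercent, 10.0);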


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Fantastic script above! Thanks. Also, if you're using SQL 2008 R2 SP1 upwards, check out trace flag 2371 to see if it's of any use.
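    A minimal way to try it (test on a non-production instance first; to survive restarts the flag is normally set as a -T2371 startup parameter instead):

    -- Enable trace flag 2371 globally: it makes the auto-update threshold dynamic,
    -- dropping below the fixed 20% as tables get larger. SQL 2008 R2 SP1 or later.
    DBCC TRACEON (2371, -1);

    -- Check which trace flags are currently active
    DBCC TRACESTATUS (-1);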

    Cheers

  • SQLKnowItAll (4/8/2013)


    It is pretty clear that with auto_update_statistics enabled, there is no reason whatsoever to recompile after the statistics update.

    There's never any need to recompile after a stats update, no matter what the auto-update settings are. Updating stats invalidates all plans, so all queries will recompile afterwards anyway.

    However, I cannot find any information about how to tell when we should update the statistics. I would like to believe that we shouldn't have to at all, but I have to convince myself of that as well. Instead of just saying "well, statistics are created and updated manually, so we don't need to update them daily." I need some references.

    The auto update (which updates after 20% of the table has changed) is not enough, unless you're working on a tiny DB with minimal changes. Your two options are pretty much:

    - Analyse and see what tables are prone to stale stats. This will require a lot of familiarity with the queries, checking their performance, watching for degradation, putting in manual stats updates for those tables sufficient that the queries running do not suffer from degraded performance due to stale stats.

    - Scheduled stats updates of all tables on a regular basis.

    The second is far easier and if you have the available maintenance time there's little reason not to do that. It's the times where you don't have adequate maintenance windows where you have to put in huge amounts of work to identify the queries that need manual updates.
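    If you take the scheduled route, the simplest sketch is just a job step per database along these lines (sp_updatestats skips statistics with no modifications, so it is usually much cheaper than a blanket FULLSCAN of everything; the table name in the second option is only a placeholder):

    -- Option A: update only the statistics that have seen modifications, using default sampling
    EXEC sp_updatestats;

    -- Option B: force a full-scan update on a specific table known to go stale quickly
    UPDATE STATISTICS dbo.YourBigFactTable WITH FULLSCAN;  -- placeholder table name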

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (4/9/2013)


    SQLKnowItAll (4/8/2013)


    It is pretty clear that with auto_update_statistics enabled, there is no reason whatsoever to recompile after the statistics update.

    There's never any need to recompile after a stats update, no matter what the auto-update settings are. Updating stats invalidates all plans, so all queries will recompile afterwards anyway.

    However, I cannot find any information about how to tell when we should update the statistics. I would like to believe that we shouldn't have to at all, but I have to convince myself of that as well. Instead of just saying "well, statistics are created and updated manually, so we don't need to update them daily." I need some references.

    The auto update (which updates after 20% of the table has changed) is not enough, unless you're working on a tiny DB with minimal changes. Your two options are pretty much:

    - Analyse and see what tables are prone to stale stats. This will require a lot of familiarity with the queries, checking their performance, watching for degradation, putting in manual stats updates for those tables sufficient that the queries running do not suffer from degraded performance due to stale stats.

    - Scheduled stats updates of all tables on a regular basis.

    The second is far easier and if you have the available maintenance time there's little reason not to do that. It's the times where you don't have adequate maintenance windows where you have to put in huge amounts of work to identify the queries that need manual updates.

    Thanks for all of the replies! I think doing it regularly works for us. We currently do it daily, but I am trying to streamline it. The stored proc created years ago updates the statistics and then runs sp_recompile, which I think is unnecessary and, since it takes a schema lock, has caused deadlocks with some other processes. I am tasked with analyzing the old stuff and updating it. A sample of the original is below, run in debug mode to show the dynamic script that is actually executed:

    USE [DBA]

    DECLARE @mytable_id INT
    DECLARE @mytable VARCHAR(100)
    DECLARE @owner VARCHAR(128)
    DECLARE @SQL VARCHAR(256)

    SELECT @mytable_id = MIN(object_id)
    FROM sys.tables WITH(NOLOCK)
    WHERE is_ms_shipped = 0

    WHILE @mytable_id IS NOT NULL
    BEGIN
        SELECT @owner = SCHEMA_NAME(schema_id), @mytable = name
        FROM sys.tables
        WHERE object_id = @mytable_id

        SELECT @SQL = 'UPDATE STATISTICS ' + QUOTENAME(@owner) + '.' + QUOTENAME(@mytable) + ''
        EXEC (@SQL)

        SELECT @SQL = 'EXEC sp_recompile ''' + QUOTENAME(@owner) + '.' + QUOTENAME(@mytable) + ''''
        EXEC (@SQL)

        SELECT @mytable_id = MIN(object_id)
        FROM sys.tables WITH(NOLOCK)
        WHERE object_id > @mytable_id
        AND is_ms_shipped = 0
    END

    My thought was to remove the whole sp_recompile step from this bad boy, as I believe it is not needed: the statistics update will invalidate most of the plans anyway and force a recompile. A trimmed version is sketched below. Thoughts? (Also, I want to stop running this on Mondays, since indexes are rebuilt on Sunday and little to no data has changed.)
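    For illustration, the trimmed loop I have in mind (the original above with the sp_recompile step removed, not yet tested) would be:

    USE [DBA]

    DECLARE @mytable_id INT
    DECLARE @mytable VARCHAR(100)
    DECLARE @owner VARCHAR(128)
    DECLARE @SQL VARCHAR(256)

    SELECT @mytable_id = MIN(object_id)
    FROM sys.tables WITH(NOLOCK)
    WHERE is_ms_shipped = 0

    WHILE @mytable_id IS NOT NULL
    BEGIN
        SELECT @owner = SCHEMA_NAME(schema_id), @mytable = name
        FROM sys.tables
        WHERE object_id = @mytable_id

        -- Update statistics only; the resulting plan invalidation makes sp_recompile redundant
        SELECT @SQL = 'UPDATE STATISTICS ' + QUOTENAME(@owner) + '.' + QUOTENAME(@mytable)
        EXEC (@SQL)

        SELECT @mytable_id = MIN(object_id)
        FROM sys.tables WITH(NOLOCK)
        WHERE object_id > @mytable_id
        AND is_ms_shipped = 0
    END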

    Jared
    CE - Microsoft

  • I use a combination of the date the statistics were last updated and the amount of data modified.
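    A minimal sketch of that combination (assumes SQL 2008 R2 SP2 / 2012 SP1 or later for sys.dm_db_stats_properties; the 7-day age and 1000-row thresholds are just example numbers):

    -- Statistics that are both older than a week and have seen a meaningful number of modifications
    SELECT  OBJECT_NAME(s.object_id) AS [Table Name],
            s.name                   AS [Stats Name],
            sp.last_updated,
            sp.modification_counter
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
      AND sp.last_updated < DATEADD(DAY, -7, GETDATE())
      AND sp.modification_counter > 1000
    ORDER BY sp.modification_counter DESC;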

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉
