Update Statistics

  • Hi Experts,
    I'm trying to figure out whether introducing an additional manual update job might help solve some performance issues. To put it another way, how do I establish that the stats for a particular table have become stale? We have auto-update stats enabled, plus a manual update-stats job every 24 hours, but I've noticed that some tables don't get their stats updated within that 24-hour window, neither by the auto-update nor by the manual job. The same tables then have stats updated after, say, 36 hours, by either the auto-update or the manual update, whichever comes first (I understand this is because the 20% data-change threshold has been reached by then and the optimizer is asked to prepare a plan for the related query). Can anybody help me determine whether deploying one more manual job would help keep the stats updated (or less stale), or whether it could have a negative impact? My intention is to shorten 'that 36 hrs' above to, say, 5 hours. The biggest table has 140 million records, with 1.3 million records added every day. I feel this is an issue, as there are so many changed records without a stats update even though the 20% threshold hasn't been reached. Any comments/suggestions highly appreciated. Thank you.
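
    A minimal sketch for checking this, assuming SQL Server 2008 R2 and a hypothetical table name dbo.InvoiceDetail: STATS_DATE shows when each statistic was last updated, and the legacy rowmodctr counter (approximate on 2005 and later) shows how many rows have changed since.

    -- Per statistic: last update time and rows modified since then.
    SELECT  name AS stats_name,
            STATS_DATE(id, indid) AS last_updated,
            rowcnt,
            rowmodctr AS rows_modified_since_update
    FROM    sys.sysindexes
    WHERE   id = OBJECT_ID('dbo.InvoiceDetail');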

  • Look at the slow query. Is there a large difference between estimated and actual row count at the leaf level?
    Update the stats. Is the query plan better (better estimates) and faster?

    If yes to both, it's probably a stale-stats problem; update that table's stats more often.
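
    A minimal sketch of that comparison, run from SSMS (the query shown is a hypothetical stand-in for the slow query):

    -- Capture the actual execution plan, which carries both estimated
    -- and actual row counts per operator; compare the two in the plan.
    SET STATISTICS XML ON;

    SELECT COUNT(*)                      -- hypothetical slow query
    FROM   dbo.InvoiceDetail
    WHERE  InvoiceDate >= '20170801';

    SET STATISTICS XML OFF;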

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster - Thursday, August 31, 2017 2:15 AM

    Look at the slow query. Is there a large difference between estimated and actual row count at the leaf level?
    Update the stats. Is the query plan better (better estimates) and faster?

    If yes to both, it's probably a stale-stats problem; update that table's stats more often.

    Thanks Gail. Wish I could. The query plans that I get from the blocking sessions have no 'Actual Rows'. This is SQL Server 2008 R2 Standard Edition. The other way I looked at it is manually calculating from the histogram, which shows a difference of 100% on average in a step. Is that enough to suggest that the stats are stale?

    Thanks

  • Arsh - Thursday, August 31, 2017 4:20 AM

    The query plans that I get from the blocking sessions have no 'Actual Rows'.

    No, they won't, because those are pulled from cache and hence have only estimated plans.
    If you want actual plans, you need to run the queries from SSMS or use Extended Events to capture them (which I don't recommend, because the event has a rather large overhead).
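
    (For reference, a sketch of how those cached, estimated-only plans are typically pulled from the plan cache; actual row counts are never stored there:)

    -- Top CPU consumers with their cached (estimated) plans.
    SELECT TOP (20)
           qs.total_worker_time,
           st.text       AS query_text,
           qp.query_plan
    FROM   sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle)    AS st
    CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
    ORDER BY qs.total_worker_time DESC;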

    The other way I looked at it is manually calculating from the histogram, which shows a difference of 100% on average in a step. Is that enough to suggest that the stats are stale?

    Err, no. That suggests nothing of the sort.
    Stale stats are stats where the data in the table has changed such that the stats are no longer an accurate representation of the data. Hence you cannot look at the stats alone to determine if they're stale.

    The histogram often has large differences between steps as the generation method looks for interesting values in the data. That's normal behaviour.
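
    (For anyone who still wants to eyeball the histogram, a sketch with hypothetical object and statistic names:)

    -- List the histogram steps for one statistic on the table.
    DBCC SHOW_STATISTICS ('dbo.InvoiceDetail', 'IX_InvoiceDetail_InvoiceDate')
        WITH HISTOGRAM;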

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster - Thursday, August 31, 2017 4:30 AM

    Arsh - Thursday, August 31, 2017 4:20 AM

    The query plans that I get from the blocking sessions have no 'Actual Rows'.

    No, they won't, because those are pulled from cache and hence have only estimated plans.
    If you want actual plans, you need to run the queries from SSMS or use Extended Events to capture them (which I don't recommend, because the event has a rather large overhead).

    The other way I looked at it is manually calculating from the histogram, which shows a difference of 100% on average in a step. Is that enough to suggest that the stats are stale?

    Err, no. That suggests nothing of the sort.
    Stale stats are stats where the data in the table has changed such that the stats are no longer an accurate representation of the data. Hence you cannot look at the stats alone to determine if they're stale.

    The histogram often has large differences between steps as the generation method looks for interesting values in the data. That's normal behaviour.

    Oh, I see. I'll go by the comparison between the actual and estimated readings from the plan in SSMS, then. Thank you for the help.

  • Arsh - Thursday, August 31, 2017 2:00 AM

    Hi Experts,
    I'm trying to figure out whether introducing an additional manual update job might help solve some performance issues. To put it another way, how do I establish that the stats for a particular table have become stale? We have auto-update stats enabled, plus a manual update-stats job every 24 hours, but I've noticed that some tables don't get their stats updated within that 24-hour window, neither by the auto-update nor by the manual job. The same tables then have stats updated after, say, 36 hours, by either the auto-update or the manual update, whichever comes first (I understand this is because the 20% data-change threshold has been reached by then and the optimizer is asked to prepare a plan for the related query). Can anybody help me determine whether deploying one more manual job would help keep the stats updated (or less stale), or whether it could have a negative impact? My intention is to shorten 'that 36 hrs' above to, say, 5 hours. The biggest table has 140 million records, with 1.3 million records added every day. I feel this is an issue, as there are so many changed records without a stats update even though the 20% threshold hasn't been reached. Any comments/suggestions highly appreciated. Thank you.

    What is the purpose of that large table?  Is it an "audit table" or something like an "invoice detail" table by any chance?  In other words, are the older rows entirely static?  Are the newest rows updated frequently or just inserted with little or no other modifications?

    And, speaking of blocking, and contrary to all that we've been taught, blocking CAN actually be the result of index rebuilds with the wrong FILL FACTOR.  I found that out the hard way and simply stopped rebuilding indexes, and the blocking (which was being caused by wanton page splits after the defrags) went away.  I don't know if I've said it on this thread before, but I've not done any index maintenance on my production server since 17 Jan 2016 (more than a year and a half ago).  Performance got markedly better in the first 3 months and hasn't suffered because of it since.  Yes, I do rebuild stats, but good code is the key above all else.  You say you know which code is doing the blocking.

    You can quickly prove that stats either are or are not the problem.  Rebuild them all on the table and then see if the blocking still occurs after waiting for the code to recompile and recache.  I'm thinking that the blocking will still occur.
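
    A sketch of that experiment, with a hypothetical table name; sp_recompile simply ensures that dependent cached plans don't linger:

    -- Rebuild all statistics on the table with a full scan, then mark
    -- plans referencing the table for recompilation.
    UPDATE STATISTICS dbo.InvoiceDetail WITH FULLSCAN;
    EXEC sp_recompile N'dbo.InvoiceDetail';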

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Thursday, August 31, 2017 7:43 AM

    Arsh - Thursday, August 31, 2017 2:00 AM

    Hi Experts,
    I'm trying to figure out whether introducing an additional manual update job might help solve some performance issues. To put it another way, how do I establish that the stats for a particular table have become stale? We have auto-update stats enabled, plus a manual update-stats job every 24 hours, but I've noticed that some tables don't get their stats updated within that 24-hour window, neither by the auto-update nor by the manual job. The same tables then have stats updated after, say, 36 hours, by either the auto-update or the manual update, whichever comes first (I understand this is because the 20% data-change threshold has been reached by then and the optimizer is asked to prepare a plan for the related query). Can anybody help me determine whether deploying one more manual job would help keep the stats updated (or less stale), or whether it could have a negative impact? My intention is to shorten 'that 36 hrs' above to, say, 5 hours. The biggest table has 140 million records, with 1.3 million records added every day. I feel this is an issue, as there are so many changed records without a stats update even though the 20% threshold hasn't been reached. Any comments/suggestions highly appreciated. Thank you.

    What is the purpose of that large table?  Is it an "audit table" or something like an "invoice detail" table by any chance?  In other words, are the older rows entirely static?  Are the newest rows updated frequently or just inserted with little or no other modifications?

    And, speaking of blocking, and contrary to all that we've been taught, blocking CAN actually be the result of index rebuilds with the wrong FILL FACTOR.  I found that out the hard way and simply stopped rebuilding indexes, and the blocking (which was being caused by wanton page splits after the defrags) went away.  I don't know if I've said it on this thread before, but I've not done any index maintenance on my production server since 17 Jan 2016 (more than a year and a half ago).  Performance got markedly better in the first 3 months and hasn't suffered because of it since.  Yes, I do rebuild stats, but good code is the key above all else.  You say you know which code is doing the blocking.

    You can quickly prove that stats either are or are not the problem.  Rebuild them all on the table and then see if the blocking still occurs after waiting for the code to recompile and recache.  I'm thinking that the blocking will still occur.

    Thanks Jeff. It's an invoice-detail-like table, with about 70% insert operations. I am starting to dig deep into it. I have an agent job (sp_updatestats) that runs once every 24 hours, but it leaves out about 30% of the tables (the list changes depending on which tables the auto-update has handled between two manual runs, supposedly?), or is it because the 20% threshold hasn't been reached? Will using 'UPDATE STATISTICS' with the table name force SQL Server to update the stats? Thank you.
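
    (One likely explanation for the skipped tables, going by the documented behaviour of sp_updatestats: it only updates statistics that have had at least one row modified since their last update, so tables untouched between runs are left out regardless of the 20% threshold.)

    -- sp_updatestats works per database and skips unmodified statistics;
    -- @resample reuses each statistic's previous sample rate.
    EXEC sys.sp_updatestats @resample = 'resample';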

  • For the experiment, I'd use UPDATE STATISTICS schemaname.tablename WITH FULLSCAN; during a relatively quiet time.  The reason for the WITH FULLSCAN is that auto-updates of statistics can be done with as little as a 1-2% sample rate.
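
    A sketch of that, plus a quick check that the timestamps actually moved (table name hypothetical):

    -- Full-scan update of all statistics on the table...
    UPDATE STATISTICS dbo.InvoiceDetail WITH FULLSCAN;

    -- ...then confirm the last-updated dates.
    SELECT name, STATS_DATE(object_id, stats_id) AS last_updated
    FROM   sys.stats
    WHERE  object_id = OBJECT_ID('dbo.InvoiceDetail');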

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff. Isn't it that auto-update stats uses 20% as the threshold?

  • Threshold for triggering auto update != sampling rate.
    The threshold is 20%. Jeff's talking about the sample rate.
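
    To see what sample rate the last update actually used, compare 'Rows' with 'Rows Sampled' in the stats header. A sketch, with hypothetical names:

    -- STAT_HEADER returns Rows, Rows Sampled and Updated, among others.
    DBCC SHOW_STATISTICS ('dbo.InvoiceDetail', 'IX_InvoiceDetail_InvoiceDate')
        WITH STAT_HEADER;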

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster - Thursday, August 31, 2017 1:19 PM

    Threshold for triggering auto update != sampling rate.
    The threshold is 20%. Jeff's talking about the sample rate.

    Thanks Gail. So the higher the sampling rate, the better the stats, I suppose?

  • Arsh - Friday, September 1, 2017 5:40 AM

    GilaMonster - Thursday, August 31, 2017 1:19 PM

    Threshold for triggering auto update != sampling rate.
    The threshold is 20%. Jeff's talking about the sample rate.

    Thanks Gail. So the higher the sampling rate, the better the stats, I suppose?

    Like everything else, "It Depends".  In the case of statistics, it depends on the distribution of data.

    As a reminder, the purpose of the "FULL SCAN" experiment I suggested is to prove whether or not statistics on the large table are a part of the blocking problem.  My hunch is that they're not part of the problem, but I could be wrong.  "One good experiment is worth a thousand expert opinions". 😉

    Have you done the experiment, yet?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Arsh - Friday, September 1, 2017 5:40 AM

    GilaMonster - Thursday, August 31, 2017 1:19 PM

    Threshold for triggering auto update != sampling rate.
    The threshold is 20%. Jeff's talking about the sample rate.

    Thanks Gail. So the higher the sampling rate, the better the stats, I suppose?

    Not necessarily. If the data is evenly distributed, then even a 1% sample is probably sufficient.
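
    (If a full scan proves too expensive on 140 million rows, the rate can also be set explicitly; the percentage here is purely illustrative:)

    -- Explicit sample rate instead of FULLSCAN or the default sample.
    UPDATE STATISTICS dbo.InvoiceDetail WITH SAMPLE 10 PERCENT;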

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Jeff Moden - Friday, September 1, 2017 6:55 AM

    Arsh - Friday, September 1, 2017 5:40 AM

    GilaMonster - Thursday, August 31, 2017 1:19 PM

    Threshold for triggering auto update != sampling rate.
    The threshold is 20%. Jeff's talking about the sample rate.

    Thanks Gail. So the higher the sampling rate, the better the stats, I suppose?

    Like everything else, "It Depends".  In the case of statistics, it depends on the distribution of data.

    As a reminder, the purpose of the "FULL SCAN" experiment I suggested is to prove whether or not statistics on the large table are a part of the blocking problem.  My hunch is that they're not part of the problem, but I could be wrong.  "One good experiment is worth a thousand expert opinions". 😉

    Have you done the experiment, yet?

    Hi Jeff. Thanks. Actually, there's no end-to-end test environment in this place (an anti-ITIL environment 🙂). I did manage to set up one test DB server with a production restore just over a month ago for another exercise, though. Nonetheless, I'll run an update stats with full scan on this table and see if I can simulate part of it by running some queries against the database directly. Doing this on production looks very difficult, as the technical project lead has no knowledge of databases yet feels good opinionating on advanced database topics... and a DB dev lead feels that statistics are actually not important... sic. Hence I put it as a take-it-or-leave-it statement after some words about statistics. So it might take time until the same people ask me to help in the statistics area. Thank you for the valuable help. Really appreciate it.

  • GilaMonster - Friday, September 1, 2017 1:52 PM

    Arsh - Friday, September 1, 2017 5:40 AM

    GilaMonster - Thursday, August 31, 2017 1:19 PM

    Threshold for triggering auto update != sampling rate.
    The threshold is 20%. Jeff's talking about the sample rate.

    Thanks Gail. So the higher the sampling rate, the better the stats, I suppose?

    Not necessarily. If the data is evenly distributed, then even a 1% sample is probably sufficient.

    Hi Gail, the sampled rows are only 0.35% of a total of close to 140 million rows. The density vector shows 'All Density' as 0.0002. Average Length is 8. I'm looking at a column that holds a transaction amount. Thank you.
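
    (Reading those numbers the usual way: 'All Density' is 1 / number-of-distinct-values, so 0.0002 implies roughly 1 / 0.0002 = 5,000 distinct values in that column, and a 0.35% sample of ~140 million rows is still about 490,000 rows sampled.)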
