How can we enhance statistics information collection

Question

How can we enhance statistics information collection

Johan Bijnens

SSC Guru

Points: 135662
More actions
March 18, 2024 at 10:44 am

#4397388
Currently we are collecting database statistics information using
```
select * FROM sys.dm_db_index_physical_stats( 14, NULL, NULL, NULL, 'SAMPLED')
```
For some larger databases this statement takes more than 23 minutes to complete.
Is there anything we can do to speed up this process and collect the same data?

Johan

Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data and code to get the best help
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Who am I ? Sometimes this is me but most of the time this is me

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Mr. Brian Gale One Orange Chip Points: 25014 More actions · Answer 1

Not essentially a "speed it up" thing, but could you refresh a test instance and run that on the test instance? I am wondering if MAYBE the slowness is due to the system being in use or possibly statistics being updated while you are running it so you are getting blocked?

I mean, I'm just guessing here, but one of the first things I do when tuning ANY query is to replicate the problem on a test system (if possible) and then review execution plan and statistics IO/Time, but in your case, I don't think the execution plan will help. Statistics IO and Time MAY help to determine if the bottleneck is disk or CPU, but tuning that query is not something you can really do apart from changing parameters...

The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

Johan Bijnens SSC Guru Points: 135662 More actions · Answer 2

We collect these statistics on a weekly basis.

This way we persist runtime history for all monitored instances.

This comes in handy when provisioning for growth or planning migrations.

Devs and application teams have no clue about their data systems space consumption.

They have a rough idea about tables being important and in the best case "this table can have many rows", but that's about it.

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data and code to get the best help

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Who am I ? Sometimes this is me but most of the time this is me

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 3

Johan Bijnens wrote:

Currently we are collecting database statistics information using
select * FROM sys.dm_db_index_physical_stats( 14, NULL, NULL, NULL, 'SAMPLED')
For some larger databases this statement takes more than 23 minutes to complete.
Is there anything we can do to speed up this process and collect the same data?

The only time I've seen people think they need this to be absolutely up to date is when they're doing index maintenance. On large databases, the status of things just doesn't change that often. I have it setup where that command runs on Saturday evenings for all of the databases and stores the output in a table in the database with the name of "dbo.IndexDNAInfo". I also have enhanced things like including index names, output from "usage" and a bunch more for a kind of "one stop shop" when it comes to rowstore indexes, heaps, tables, file groups, and files in general including sizes in MB, etc, etc.

Of course, I strongly recommend NOT doing all the crazy supposed "Best Practice" stuff for index maintenance anyway but getting the data even a few days ahead of time isn't going to make any major changes. It's better to spend time doing statistics updates and DBCC CHECKDB, etc.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Johan Bijnens SSC Guru Points: 135662 More actions · Answer 4

Jeff Moden wrote:

Johan Bijnens wrote:
Currently we are collecting database statistics information using
select * FROM sys.dm_db_index_physical_stats( 14, NULL, NULL, NULL, 'SAMPLED')
For some larger databases this statement takes more than 23 minutes to complete.
Is there anything we can do to speed up this process and collect the same data?
The only time I've seen people think they need this to be absolutely up to date is when they're doing index maintenance. On large databases, the status of things just doesn't change that often. I have it setup where that command runs on Saturday evenings for all of the databases and stores the output in a table in the database with the name of "dbo.IndexDNAInfo". I also have enhanced things like including index names, output from "usage" and a bunch more for a kind of "one stop shop" when it comes to rowstore indexes, heaps, tables, file groups, and files in general including sizes in MB, etc, etc.
Of course, I strongly recommend NOT doing all the crazy supposed "Best Practice" stuff for index maintenance anyway but getting the data even a few days ahead of time isn't going to make any major changes. It's better to spend time doing statistics updates and DBCC CHECKDB, etc.

100% with you, Jeff

These collected statistics are not used for maintenance purposes, but to support application teams and ofcourse DBA that want to know data systems need for space etc over time.

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data and code to get the best help

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Who am I ? Sometimes this is me but most of the time this is me

Mr. Brian Gale One Orange Chip Points: 25014 More actions · Answer 5

Johan Bijnens wrote:

We collect these statistics on a weekly basis. This way we persist runtime history for all monitored instances.
This comes in handy when provisioning for growth or planning migrations.
Devs and application teams have no clue about their data systems space consumption. They have a rough idea about tables being important and in the best case "this table can have many rows", but that's about it.

Since you are collecting these on a weekly basis, any chance you can automate it and schedule it to run after business hours (weekend perhaps)? You may run a 24/7 shop, in which case this may not be possible, but if you have some downtime, running a 23 minute process during downtime and automated isn't likely to impact much, right? And even if you do run a 24/7 shop, scheduling this out so it runs automatically, even if it takes 23 minutes to complete, you then have the data in the database and can consume or analyze it as needed. Mind you, you may already be doing this.

The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

Johan Bijnens SSC Guru Points: 135662 More actions · Answer 6

It is indeed scheduled in the weekend (using SQLAgent and Powershell).

Still 23 minutes is a long time to collect these statistics for this single database.

It's a good thing not all our database are that large (few table, many rows, bit to much indexes)

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data and code to get the best help

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Who am I ? Sometimes this is me but most of the time this is me

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 7

Johan Bijnens wrote:

Jeff Moden wrote:
Johan Bijnens wrote:
Currently we are collecting database statistics information using
select * FROM sys.dm_db_index_physical_stats( 14, NULL, NULL, NULL, 'SAMPLED')
For some larger databases this statement takes more than 23 minutes to complete.
Is there anything we can do to speed up this process and collect the same data?
The only time I've seen people think they need this to be absolutely up to date is when they're doing index maintenance. On large databases, the status of things just doesn't change that often. I have it setup where that command runs on Saturday evenings for all of the databases and stores the output in a table in the database with the name of "dbo.IndexDNAInfo". I also have enhanced things like including index names, output from "usage" and a bunch more for a kind of "one stop shop" when it comes to rowstore indexes, heaps, tables, file groups, and files in general including sizes in MB, etc, etc.
Of course, I strongly recommend NOT doing all the crazy supposed "Best Practice" stuff for index maintenance anyway but getting the data even a few days ahead of time isn't going to make any major changes. It's better to spend time doing statistics updates and DBCC CHECKDB, etc.
100% with you, Jeff These collected statistics are not used for maintenance purposes, but to support application teams and ofcourse DBA that want to know data systems need for space etc over time.

I don't know of an easy way to make it go faster, Johan. You're already doing the "SAMPLED" thing. You could do it by object_id in parallel a couple of times. You could do one or two of the larger tables on one or more separate threads and all the rest on a separate thread. If you have it "balanced" correctly, you might be able to get them to be done at the same time.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jonathan AC Roberts SSCoach Points: 18408 More actions · Answer 8

Johan Bijnens wrote:

We collect these statistics on a weekly basis. This way we persist runtime history for all monitored instances.
This comes in handy when provisioning for growth or planning migrations.
Devs and application teams have no clue about their data systems space consumption. They have a rough idea about tables being important and in the best case "this table can have many rows", but that's about it.

It's likely that tables with a large number of rows are causing the extended runtime of the statistics job. Fortunately, gathering statistics is relatively unobtrusive and can be executed concurrently with other database operations.