Blog Post

When were the statistics last updated and how many rows were sampled?

,

TL;DR; Final query is at the bottom.

Every now and again (particularly when someone is having performance problems) I’ll get the question “When were the statistics last updated?” On top of that, I recently found out that our UPDATE STATISTICS job uses the RESAMPLE option that tells the command to update the statistics based on it’s most recent sample rate. Which of course led me to wonder What was the previous sample rate?

In the end, I did a little research and found this post on the PERSIST_SAMPLE_PERCENT option with a great query that is almost exactly what I needed. So I’m going to use it as a starting point. I recommend reading through the post as it’s quite good and has some great information about the option PERSIST_SAMPLE_PERCENT. Here’s the original query:

SELECT ss.stats_id, ss.name, filter_definition, last_updated, rows, 
    rows_sampled, steps, unfiltered_rows, modification_counter, persisted_sample_percent,
    (rows_sampled * 100)/rows AS sample_percent
FROM sys.stats ss
INNER JOIN sys.stats_columns sc 
    ON ss.stats_id = sc.stats_id AND ss.object_id = sc.object_id
INNER JOIN sys.all_columns ac 
    ON ac.column_id = sc.column_id AND ac.object_id = sc.object_id
CROSS APPLY sys.dm_db_stats_properties(ss.object_id, ss.stats_id) shr
WHERE ss.[object_id] = OBJECT_ID('[Sales].[SalesOrderHeaderBulk]') 
    AND ac.name = 'OrderDate';

The first change I want to make is to get rid of the existing WHERE clause. I want to run this on all statistics, not just one. This also means I can get rid of the reference to sys.all_columns and sys.stats_columns. Next, I’m going to make all of the column references two-part names to be sure I’m not missing a reference to one of the tables I removed and because I think it’s a good best practice. Then, since I’m not pulling for a single statistic anymore I’d best pull the table name into the query (by referencing sys.objects), and since I don’t care about system objects I’m going to restrict this to user tables only (using is_ms_shipped). Last but not least I’m going to clean up the column order a little bit (given that my point here is when was it last updated and how many rows have been/will be sampled) and put an order by.

SELECT o.name AS object_name, ss.stats_id, ss.name as stat_name, 
ss.filter_definition, shr.last_updated, 
shr.persisted_sample_percent, 
(shr.rows_sampled * 100)/shr.rows AS sample_percent,
shr.rows, shr.rows_sampled, 
    shr.steps, shr.unfiltered_rows, shr.modification_counter 
FROM sys.stats ss
INNER JOIN sys.objects o 
    ON o.object_id = ss.object_id
CROSS APPLY sys.dm_db_stats_properties(ss.object_id, ss.stats_id) shr
WHERE o.is_ms_shipped = 0
ORDER BY o.name, ss.stats_id;

You can get the definitions for the columns from last_updated on from the BOL entry for sys.dm_db_stats_properties and I’ve added this query to my github repository.

Original post (opens in new tab)
View comments in original post (opens in new tab)

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating