Setting the Sample Rate for Auto Stats updates

SQL Server functionality can move on pretty quickly sometimes, and it’s not always about the big features; the many little enhancements the SQL team implements also make the product better and easier to use.

I’d barely published my post about Automatic Sample Sizes for Statistics Updates when I saw that Microsoft had enhanced the functionality in the latest cumulative update for SQL Server.

From SQL Server 2016 SP1 CU4 you can manually update a statistics object with a given sample rate, and mark that rate to be reused for any subsequent automatic updates of the same object.

You can’t change the default sample rate globally, but this functionality is actually more useful. It means that if you find that you have a particular statistics object where the automatic sampling rate is too low, and that has a negative effect on query performance, then you can choose a sampling rate specific for that object.

Let’s take a quick look at doing this in practice.

Here’s some reused SQL from the previous post to create a table and throw in a million rows:

--Create table for Testing
CREATE TABLE dbo.Test(
   Id INT IDENTITY(1,1) CONSTRAINT PK_Test PRIMARY KEY CLUSTERED, 
   TextValue VARCHAR(20) NULL
);
--Create index on TextValue
CREATE INDEX IX_Test_TextValue ON dbo.Test(TextValue);
--Insert a bunch of rows
INSERT INTO dbo.Test(TextValue) 
SELECT TOP 1000000 'blah'
FROM sys.objects a, sys.objects b, sys.objects c, sys.objects d; 

I haven’t updated the stats yet, as we can see if we view them:

--View the statistics
DBCC SHOW_STATISTICS('dbo.Test', IX_Test_TextValue) WITH STAT_HEADER;

[Screenshot SetStatsSampleRate1: DBCC SHOW_STATISTICS stats header, not yet populated]

The stats don’t contain any information yet, but there is still one thing to notice. See that final column, Persisted Sample Percent? That’s new.

The stats will only get updated by the auto stats mechanism when I run a query against the table and the optimiser is interested in them but sees they’re stale. From zero to a million rows should be stale enough. Let’s run a quick query:

--Random Query
SELECT * 
FROM dbo.Test
WHERE TextValue = 'not blah';

And view the stats again:

[Screenshot SetStatsSampleRate2: stats header after the automatic update]

We can see the stats got updated and the table was big enough that SQL decided to sample rather than scan the whole index. About 42% of the rows got sampled. Persisted Sample Percent is set to zero, i.e. it is not set.
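
If you’d rather query than read DBCC output, you can get the same numbers from the sys.dm_db_stats_properties DMF. This is just a sketch; the persisted_sample_percent column is only returned on builds that include this feature:

--Check the sampling details via the stats properties DMF
--persisted_sample_percent is only returned on builds that support the feature
SELECT s.name AS stats_name,
   sp.rows,
   sp.rows_sampled,
   CAST(100.0 * sp.rows_sampled / NULLIF(sp.rows, 0) AS DECIMAL(5,2)) AS effective_sample_percent,
   sp.persisted_sample_percent
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.Test');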

Let’s say that I want to make sure that this statistics object always updates with a full scan.

What I can now do is manually update the statistics and specify that the sample percentage should be persisted:

--Update the stats and persist the sample rate
UPDATE STATISTICS dbo.Test IX_Test_TextValue 
WITH FULLSCAN, PERSIST_SAMPLE_PERCENT = ON;

Let’s check that did what we wanted by viewing the stats:

[Screenshot SetStatsSampleRate3: stats header after the manual FULLSCAN update]

Great, we can see all the rows were sampled and the Persisted Sample Percent is now set to 100.
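
As an aside, a full scan isn’t compulsory. If the table is big enough that a full scan hurts, the same option should accept a specific sampling rate instead; the 50 percent below is just an example figure, not a recommendation:

--Persist a specific sample rate rather than a full scan
UPDATE STATISTICS dbo.Test IX_Test_TextValue 
WITH SAMPLE 50 PERCENT, PERSIST_SAMPLE_PERCENT = ON;

Subsequent automatic updates should then sample roughly 50% of the rows rather than falling back to the default rate.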

Now let’s see what happens when another auto update kicks in. I insert another million rows with the same SQL from above and then run the select query again.
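
For anyone following along, that’s just the earlier insert and select repeated:

--Insert another million rows
INSERT INTO dbo.Test(TextValue) 
SELECT TOP 1000000 'blah'
FROM sys.objects a, sys.objects b, sys.objects c, sys.objects d; 

--Run the query again to trigger an auto stats update
SELECT * 
FROM dbo.Test
WHERE TextValue = 'not blah';

The optimiser notices the statistics are now stale, so it updates them and recompiles the plan for the query. We can see this when we check the stats yet again: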

[Screenshot SetStatsSampleRate4: stats header after the second automatic update]

SQL has done exactly what we wanted: it performed the auto stats update in the background and used the 100% persisted sample rate.

Thanks Microsoft!
