SQLServerCentral Article

Custom Statistics Distributions in SQL Server 2022

,

Statistics are important for the query processing engine in SQL Server to run efficiently. The type of execution plan that is created often depends on the optimizer knowing the distribution of data in a column. This is why so many of us have maintenance jobs that update statistics, even with auto-update statistics turned on.

Until now, we've been limited to a single histogram for all columns, with the same shape and size, or at least, the same maximum 200 steps.  We've also been limited to a single column when there are multiple key columns. This post will show you how this changes in SQL Server 2022.

We can now customize our histograms and statistics. Using the hidden trace flag, 2024, we can now add additional parameters to the CREATE STATISTICS command. This will allow us to specify the number of steps, to help the optimizer build better plans. If you have less than 200 unique values, or don't want to consider some, you can even create a custom histogram that forces the optimized to reconsider different indexes in building its execution plan.

Enabling Custom Statistics

Like any other trace flag, we enable this with DBCC TRACEON, as follows:

dbcc traceon (2024);

We can disable this with the following code. However, if you do this, then all the custom statistics you've built will be rebuilt in the normal way, with up to 200 steps.

dbcc traceoff (2024);

You can also add this flag to the startup parameters for your instance, by adding this to the parameter list in Configuration Manager.

-T2024

Once you've enabled the trace flag, then follow the instructions below to customize statistics.

Create Custom Statistics

There isn't new DDL. Instead, CREATE STATISTICS has a new WITH option called CUSTOM. This is not documented yet, but with some experimentation, I found that there are some new keywords and structures that don't cause syntax errors.

They can also improve performance dramatically, or hurt it. As with anything, please be careful and test your code and changes.

New Parameters

The structure of the new CUSTOM option is shown here. I'm not great at the syntax diagrams, but the [] are optional items. Essentially you can choose the number of steps in your histogram, up to 2028 or you can populate the histogram with a JSON document.

CREATE STATISTICS statsname
ON {table} (column)
WITH CUSTOM 
 ( [STEPS = [0..2048]]
   [{ data_value : count, ...n  }]
 )

If you choose the steps, then SQL Server will create up to that many steps in the histogram, providing you have that many unique values.

If you want to create your own histogram, then you enter the various values and counts as a JSON document. For example, if I want to skew my data on genders to being M, I can do this:

CREATE STATISTICS customer_gender_stats
  ON dbo.Customer (gender)
WITH CUSTOM (
  { 'M':5000, 'F': 0, 'N': 0}
)

This might force the optimize to ignore indexes that depend on gender first and instance choose something else.

This is something that can be time consuming, and certainly you might want to revisit this, perhaps altering your custom histogram as your data changes, but if you find you need certain indexes, like in a COTS program, and can't change them, maybe you can influence the optimizer behavior with this technique.

Performance Analysis

One of the biggest challenges with adding in custom statistics is in measuring the impact. In order to properly evaluate how this works, you likely need a series of loops that will try different combinations of things and then run your query. As an example, you might write code like this to test a number of possibilities.

-- while loop
set statistics io on
set statistics time on
declare @i int = 1
while @i < 500
 begin
    create statistics newstatstest on dbo.customer (city) with custom (steps=@i)
    -- run query
    exec complexCustomerproc('Denver')
    drop statistics newstatstest
    select @i = @i + 1
 end

With this type of structure, you can gather results from a number of executions and compare them. If you have Query Store enabled, then this would be easy to compare if this actually worked. There is no custom option to add to CREATE STATISTICS as today is Apr 1. It's an April Fools joke that I wrote because I suspect a lot of people might wish they could customize statistics.

Summary

It's not often that April 1 falls on a Monday, Wednesday, or Friday, so with our change to three newsletters a week, I don't always get to write jokes. This year I saw the calendar and started jotting some ideas down. This one has been on my mind for awhile, and I thought it might make a fun joke.

Happy April Fools and hope you enjoyed this.

 

Rate

3 (2)

You rated this post out of 5. Change rating

Share

Share

Rate

3 (2)

You rated this post out of 5. Change rating