Statistics on Ascending Columns

It comes as rather a shock to find out that one of the commonest circumstances in an OLTP database, an ascending primary key with most querying on the latest records, can throw the judgement of the Query Optimiser to the extent that perfomance nose-dives. Fabiano once again puts on snorkel and goggles to explore the murky depths of execution plans to find out why.

First of all, a warning:

Much of what you will see here is ‘undocumented’. That means we can discuss it, but you can’t trust that this will not change in a service pack or a new product release. Also you will not be able to call Microsoft support for help or advice on the topic, or if you get an error when using the following techniques. Use with precaution and with extra attention.

Introduction

A very common problem relating to distribution statistics is associated with what we call “ascending value columns” in a table. This generally happens when a large table has ascending values and the most recent rows are the ones most commonly being accessed. When data values in a column ascend, most new insertions are beyond the range covered by the distribution statistics. This can lead to poorly performing plans since they inaccurately predict from the statistics that filters selecting recent data would exclude the entire relation.

As we already know, a column which belongs to a statistic has to get to a threshold quantity of modifications after the previous update in order to trigger the auto-updated statistics, and for certain cases, this threshold is too high. In other words, a column table requires too many modifications before the distribution statistics are rebuilt.

This delay in auto-updating a statistic can be a problem for queries that are querying the newest data in the table. That is very likely to happen when we are using date or identity columns. As you know, the use of an outdated statistic can be very bad for performance because SQL Server will not be able to estimate precisely the number of rows a table will return, and is likely to use a poor execution plan.

To simulate this problem I’ll start by creating a table called Pedidos (which means Orders in Portuguese) and I’ll make the identity column a primary key.

Here is what the data looks like:

1348-image001.png

As you can see the orders are added sequentially and that’s what usually happens with this sort of table. The date_order (column data_pedido) is the date of the order.

Now let’s suppose a query that is searching for the orders of the day will look like the following:

1348-image002.png

1348-image003.png

In the beginning of the day SQL Server can estimate that 0 (even it shows 1 on the plan, it is estimating 0) rows will be returned. But what will happen in the end of the day when thousands of orders were inserted?

It’s likely that an execution plan using the Key Lookup will not be a good option since the lookup operation will demand a lot of pages to be read.

Let’s simulate the problem by inserting 5001 new orders in the table. Because this is below the autoupdate threshold it will not trigger the auto-updated statistics.

Now the table has lots of new orders, and each order was inserted in ascending order. That means the column Data_Pedido is increasing.

Let’s run the same query and check how many pages SQL Server has to read to execute this plan.

1348-image004.png

1348-image005.png

Note: Notice that I’m using the hint RECOMPILE to avoid query plan reuse. Also the auto create and auto update statistics are enabled on the tempdb database.

We can see here that the estimation of how many rows will be returned is very wrong. The 5001 insertions were not enough to trigger the auto-updated statistics, and the query optimizer still thinks that the table has no orders greater than GetDate().

What if we force a scan on the base table?

1348-image006.png

1348-image015.png

Note: The hint FORCESCAN is new on SQL Server 2008 R2 SP1. You can read more this hint http://technet.microsoft.com/en-us/library/ms187373.aspx.

As we can see, a scan on the clustered index requires much fewer page-reads, 200 pages on clustered index versus 10017 on the non-clustered one, plus the lookup on the clustered index.

The estimate is still wrong because the statistics are outdated and therefore don’t represent the reality of what is in the table.

Branding

SQL Server can detect when the leading column of a statistics object is ascending and can mark or “brand” it as ascending. A statistics object that belongs to an ascending column is branded as “ascending” after three updates on the statistics. It’s necessary to update it with ascending column values so that when the third update occurs, SQL Server brands the statistics object as ascending.

It’s possible to check the statistics’ brand using the trace flag 2388, when turned on it changes the result of the DBCC SHOW_STATISTICS, and then we can see a column called “Leading column type” with the brand of the column.

For instance:

1348-image008.png

As we can see, the column now is “Unknown”. Let’s insert 10 rows with ascending orders and update the statistics to see what will happen.

1348-image010.png

As I said before, the statistics have to be updated three times to be branded as ascending, so let’s do it.

1348-image012.png

Trace Flags 2389 and 2390

By default, the query optimizer keeps the information about the branding of statistics, but doesn’t make use of it. The optimizer won’t choose a different plan based on whether the column is ascending or not. To change this, you need to use the trace flags 2389 or 2390.

When the trace flag 2389 is enabled, the statistics are ascending and you have a covered index on the ascending leading key of the statistic, then the query optimizer will query the table to compute the highest value of the column. This value is then used in the estimation of how many rows will be returned for the predicate.

The trace flag 2390 works similarly to flag 2389, the main difference being that, with this flag set, the query optimizer doesn’t care if the column was branded as ascending or not. In other words, even if the column is marked as “Unknown” it will query the table to find the highest value.

To see this in practice we’ll use an undocumented query hint called QUERYTRACEON. With this query hint we can enable a trace flag in a statement scope.

Here we have a command that is using the trace flags 2389 and 2390:

1348-image014.png

1348-image015.png

SQL Server, we can see, now has a new estimation and it was enough to avoid the bad plan that was using the key lookup.

Note: Internally QUERYTRACEON will run DBCC TRACEON command and you’ll need a sysadmin privilege to run this hint. A good alternative is to use a stored procedure and run it as an administrator user.

Note: There is another brand for a statistics object called “Stationary”. The query optimizer will not trigger the query on the table to compute the value if the brand is stationary. In other words, the data in the leading column(s) is not ascending.

Conclusion

Remember that this is just an alternative way to fix a problem that is seen with large tables with ascending columns. The best option is to update your statistics periodically. In this case, it’s just a matter of creating a job to update the statistics more frequently for the tables with ascending columns up to several times a day, there is no fixed number of updates per day here, you’ll need to figure out which is the appropriate number of updates necessary for your scenario. You also have query hints, plan guides and other alternatives to fix this problem.

Here are some questions I asked myself when I first read about these trace flags.

Are trace flags always good?

Probably not, you have to find the scenario where to test them, and test, test, test.

Will SQL Server always query the table to find the highest value on in the column?

Not always. It depends on the predicate and how you are querying the table.

Can I use the trace flag 2389 alone?

Yes, but it will work only for columns branded as ascending.

Can I use the trace flag 2390 alone?

You can, but it doesn’t make sense to do so because it will stop working when the column turns out to be ascending.

As with everything, you need to test it before you use it and it always depends on the scenario. It’s also worth mentioning that this method is not supported; so don’t call Microsoft if you have any doubts about this, or me either, come to think of it.

That’s all folks, see you.