SQL Clone
SQLServerCentral is supported by Redgate
Log in  ::  Register  ::  Not logged in

NULL Values Impact the Performance of MAX() and MIN()

We know that we need to take special attention about NULL values while writing queries because they may lead us into writing incorrect query logics. NULLs may also affect the performance of aggregation function MAX() and MIN(). I have submitted this issue to Microsoft Connect. There are number of ways to get around it. I think the improvement can also be done at query engine level. Hope this can be fixed in next or future version of SQL Server. But for now, we need to know it and know how to get around it.

First, let’s produce the issue.

use tempdb
if object_id('test') is not null
	drop table test
create table test(id int, data char(7000) not null default(replicate('a', 7000)))
create clustered index id on test(id)
insert into test(id) values(null)
go 100
insert into test(id) values(1)
select * from test
-- return 101 rows

Now I create test table in a none unique index with 101 rows. Each row will take a data page in SQL Server. Now let’s execute

set statistics io on
select MAX(id) from test
--(1 row(s) affected)
--Table 'test'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Results from the query indicate that 2 logical reads take place for the execution. Query Engine performs 2 logical reads. That’s great. There is no performance issue at all. Now let’s do the same for MIN

select MIN(id) from test
--(1 row(s) affected)
--Table 'test'. Scan count 1, logical reads 103, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Notice the last result from previous query, it takes 103 logical reads! This is very significant – My table gets fully scanned. Think about this, if my table has billions of rows, scanning such table isn’t a fun at all. Full scan is what we should eliminate in our query in this case at least! Comparing the query plan of those two, they are exactly the same – index scan + stream aggregate. But why the performance is so different?

This happens when there are massive amount of NULLs in the index. Let’s assume our index is in ascending order. While calculating MAX value, SQL Server gets the root page of the index, navigates the B-Tree, finds the maximum value at the right side, and returns the first value from that end (the last value of the index). So it only needs to access 2 pages, root and leaf, in this case, to get the expected value. Although the query plan says table scan, SQL Server will actually not scan entire table since there is no need for that.

While calculating MIN value, SQL Server use the same algorithm, access the root page, navigates the B-Tree, and finds the minimum values at the left side of the tree. Before returning the minimum value, SQL Server has to verify the value to be return is not NULL. To do so, SQL Server scans the pages at leaf level through doubly linked chain. In this case, the first not null values is at the end of the link, so SQL Server scans entire index to reach the first not null value – entire table is scanned.

Obviously, the query processor can’t use NULL as an implicit predicate to utilize the B-Tree efficiently. To get around this issue, there are 2 kind of approaches.

  1. Change existing algorithm
  2. Use existing algorithm but add additional index

Change existing algorithm
This is sample, just add an WHERE clause to force the query engine to seek the index. For instance:

select MIN(id) from test where id is not null

Use existing algorithm but add additional index
This approach applies to the environment that when you are maintaining a third party software where code change will violate service agreements.

  • Change/Add index with filter where id is not null
  • create an index view where id is not null. The indexes of the view can be utilized even your query is not referencing to the index view.

This is brought you by John Huang, http://www.sqlnotes.info


John Huang’s Blog

John Huang, Microsoft Certified Master in SQL Server, SQL Server MVP, is an independent SQL Server consultant in Vancouver BC, Canada. He started using SQL Server for his projects in year 1994. He has architected and implemented many SQL and BI applications serving different industrial areas. He loves talking about SQL Server and discussing SQL Server technologies with others. He blogs at http://www.sqlnotes.info.


Leave a comment on the original post [www.sqlnotes.info, opens in a new window]

Loading comments...