Approximate COUNT DISTINCT

2019-01-15

We have all written queries that use COUNT DISTINCT to get the number of unique non-NULL values in a table. On larger tables with millions of rows, this can be a noticeable performance hit, and often there is no way around it. To help mitigate this overhead, SQL Server 2019 introduces the new APPROX_COUNT_DISTINCT function, which approximates the distinct count. Microsoft documents its result as being within 2% of the actual answer with 97% probability, and it returns in a fraction of the time.
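Microsoft's documentation describes APPROX_COUNT_DISTINCT as being based on HyperLogLog. To show why an approximate count can be so much cheaper, here is a minimal, textbook-style HyperLogLog sketch in Python. It is not SQL Server's implementation. The function names, the blake2b hash, and the choice of 2^12 registers (a standard error of roughly 1.6%) are all assumptions made for illustration. The sketch does not remember every distinct value. Instead, it keeps a fixed array of small counters, so memory stays flat no matter how many rows are scanned.

```python
import hashlib
import math
from typing import Hashable, Iterable, Optional


def _hash64(value: Hashable) -> int:
    """Deterministic 64-bit hash of the value's string form (assumed hash choice)."""
    digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def approx_count_distinct(values: Iterable[Optional[Hashable]], p: int = 12) -> int:
    """Toy HyperLogLog estimate of the number of distinct non-None values."""
    m = 1 << p                      # number of registers (4096 when p = 12)
    registers = [0] * m
    rest_bits = 64 - p
    for v in values:
        if v is None:               # skip NULLs, like COUNT(DISTINCT ...)
            continue
        h = _hash64(v)
        idx = h >> rest_bits        # top p bits choose a register
        rest = h & ((1 << rest_bits) - 1)
        # Position of the leftmost 1-bit in the remaining bits (1-based).
        rank = rest_bits - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)

    # Raw estimate: alpha * m * (harmonic mean of 2**register).
    alpha = 0.7213 / (1 + 1.079 / m)
    estimate = alpha * m * m / sum(2.0 ** -r for r in registers)

    # Small-range correction (linear counting) while many registers are empty.
    zeros = registers.count(0)
    if estimate <= 2.5 * m and zeros:
        estimate = m * math.log(m / zeros)
    return round(estimate)


# 200,000 rows but only 50,000 distinct order numbers (made-up data).
rows = (f"SO{i % 50000}" for i in range(200_000))
print(approx_count_distinct(rows))  # an estimate near 50000
```

The trade-off mirrors the one in the post. Duplicates cost nothing extra because they land in the same register with the same rank. The price is that the answer is only an estimate.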

Let’s see this in action.

In this example, I am using the AdventureworksDW2016CTP3 sample database, which you can download here.

SET STATISTICS IO ON
SET STATISTICS TIME ON  -- TIME is what reports the "SQL Server Execution Times" below
SELECT COUNT(DISTINCT([SalesOrderNumber])) as DISTINCTCOUNT
FROM [dbo].[FactResellerSalesXL_PageCompressed]

SQL Server Execution Times:  CPU time = 3828 ms,  elapsed time = 14281 ms.

SELECT APPROX_COUNT_DISTINCT([SalesOrderNumber]) as APPROX_DISTINCTCOUNT
FROM [dbo].[FactResellerSalesXL_PageCompressed]

SQL Server Execution Times: CPU time = 7390 ms,  elapsed time = 4071 ms.

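You can see the elapsed time is significantly lower: 4,071 ms versus 14,281 ms, roughly 3.5 times faster. That's a great improvement from this new function! CPU time did go up (7,390 ms versus 3,828 ms). Because it exceeds the elapsed time, the approximate query must have run on multiple threads in parallel.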
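To compare runs on your own data, a small helper can pull the CPU and elapsed times out of the STATISTICS TIME messages. This is a hypothetical Python helper (`parse_times` is not part of any SQL Server tool). It assumes the message format shown in the output lines above and reuses only the two timings from this post.

```python
import re

# Matches messages such as:
#   "SQL Server Execution Times:  CPU time = 3828 ms,  elapsed time = 14281 ms."
_TIMES = re.compile(r"CPU time = (\d+) ms,\s*elapsed time = (\d+) ms")


def parse_times(message: str) -> tuple[int, int]:
    """Return (cpu_ms, elapsed_ms) from one STATISTICS TIME message."""
    match = _TIMES.search(message)
    if match is None:
        raise ValueError(f"no execution times found in: {message!r}")
    return int(match.group(1)), int(match.group(2))


exact = parse_times("SQL Server Execution Times:  CPU time = 3828 ms,  elapsed time = 14281 ms.")
approx = parse_times("SQL Server Execution Times: CPU time = 7390 ms,  elapsed time = 4071 ms.")

print(f"elapsed speedup: {exact[1] / approx[1]:.2f}x")  # elapsed speedup: 3.51x
print(f"CPU ratio: {approx[0] / exact[0]:.2f}x")        # CPU ratio: 1.93x
```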

The first time I did this, I did it wrong: a silly typo with a major difference in the result. So take a moment and learn from my mistake.

Note that I use COUNT(DISTINCT(SalesOrderNumber)), not DISTINCT COUNT(SalesOrderNumber). This makes all the difference. If you get it wrong, the numbers will be way off, as you can see from the result set below. You'll also find that the APPROX_COUNT_DISTINCT query returns much more slowly than the DISTINCT COUNT query, which is not what you would expect.

Remember: COUNT(DISTINCT expression) evaluates the expression for each row in a group and returns the number of unique, non-NULL values, which is exactly what APPROX_COUNT_DISTINCT approximates. DISTINCT COUNT(expression) just returns COUNT(expression), the number of non-NULL values including duplicates. The DISTINCT applies to the single-row result of the aggregate, so there is nothing distinct about the count.
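The difference is easy to reproduce without SQL Server. The sketch below uses Python's built-in sqlite3 module as a stand-in, with a made-up six-row table containing duplicates and a NULL. The `sales` table and the `compare_counts` helper are hypothetical names. SQLite has no APPROX_COUNT_DISTINCT, so this only demonstrates the COUNT(DISTINCT ...) versus DISTINCT COUNT(...) typo.

```python
import sqlite3


def compare_counts(values):
    """Run the correct query and the mistyped one against an in-memory table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (SalesOrderNumber TEXT)")
        conn.executemany("INSERT INTO sales VALUES (?)", [(v,) for v in values])
        # Correct: DISTINCT inside COUNT -> number of unique non-NULL values.
        right = conn.execute(
            "SELECT COUNT(DISTINCT(SalesOrderNumber)) FROM sales").fetchone()[0]
        # Typo: DISTINCT applies to the one-row result, so it removes nothing.
        wrong = conn.execute(
            "SELECT DISTINCT COUNT(SalesOrderNumber) FROM sales").fetchone()[0]
        # For reference: every row, NULLs included.
        total_rows = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
        return right, wrong, total_rows
    finally:
        conn.close()


print(compare_counts(["SO1", "SO1", "SO2", "SO3", "SO3", None]))  # (3, 5, 6)
```

The mistyped query returns 5, not 3: it counts every non-NULL value, duplicates included. It also differs from the full row count of 6 because COUNT(expression) skips the NULL.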

Always fun tinkering with something new!

