SQL Clone
SQLServerCentral is supported by Redgate
Log in  ::  Register  ::  Not logged in

Gotcha! SQL Aggregate Functions and NULL

By Michael Coles, (first published: 2005/07/05)


Recently, I noticed a lot of questions about how SQL aggregate functions handle NULL values on a discussion board for an article here at SQL Server Central Understanding the difference between IS NULL and =NULL discussion board. This article was written to help answer some of those questions by explaining how SQL aggregate functions handle NULLs, and to describe some other "gotcha's" when dealing with SQL aggregate functions. Each section of the article ends with a summary of "gotcha's" to look out for when using these function.

The ANSI SQL-92 Standard defines five aggregate functions (which are also referred to in the SQL-92 standard as set functions.) SQL NULLs receive special internal handling that the database designer/programmer needs to be aware of when using these functions. This article describes how ANSI SQL-92 aggregate functions handle NULLs, per the ANSI SQL-92 standard.

I reference the ANSI SQL-92 Standard throughout the article, and the sample code requires the Microsoft SQL Server Northwind Sample Database to be run. All samples were run on SQL Server 2000.

Aggregate Functions

ANSI SQL-92 defines five aggregate functions. For all set functions that specify a column name, the following process takes place behind the scenes:

  1. The search expression (the WHERE clause) is applied.
  2. The value expression (scalar functions on the column like COALESCE()) is applied to each row of the results of Step 1.
  3. NULL values are eliminated from our results*.

*NOTE:  COUNT(*) is a special case, which we describe in detail below.

ANSI SQL-92 specifies that a warning message will be returned when NULLs are eliminated by set functions. You can turn on this option by using the SET ANSI_WARNINGS ON command. The SET ANSI_WARNINGS ON option returns the following error message "Warning: Null value is eliminated by an aggregate or other set operation."

The SET ANSI_WARNINGS ON also affects how SQL Server handles division by zero, arithmetic overflow and character/binary data truncation behavior. Before changing your SET ANSI_WARNINGS option, be sure to reference the following MSDN article: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_set-set_6d2r.asp

Gotcha:  SET ANSI_WARNINGS ON affects several aspects of SQL Server behavior.  Be sure to check http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_set-set_6d2r.asp before changing this option.


The COUNT() set function has two forms:

  • COUNT(column) – Returns the count of rows in a table, less NULLs
  • COUNT(*) – Returns the cardinality (total number of rows) of a table

The following code demonstrates the difference between COUNT(column) and COUNT(*):

SELECT COUNT(*) AS TotalRows, COUNT(region) AS NonNULLRows, COUNT(*) - COUNT(region) AS NULLRows
FROM [Northwind].[dbo].[suppliers]

This query returns the following result:

  1. TotalRows is COUNT(*), the cardinality (total number of rows) of the suppliers table
  2. NonNULLRows is COUNT(region), the total number of rows of the suppliers table in which region is not NULL
  3. NULLRows is COUNT(*) – COUNT(region), the difference between the cardinality of the suppliers table and the number of rows in which region is not NULL; basically this returns the number of rows in which region *is* NULL.

The fact that COUNT(*) and COUNT(column) treat NULLs differently can be used to your advantage in many situations such as calculating the NULLRows column above.

Notice that we can get the NULLRows count from above with the following query as well:

FROM [Northwind].[dbo].[suppliers]

This will not work if you specify the region column in your count, however. Here we'll take a look at why it is important to understand the order in which SQL performs the steps in a SELECT statement that contains an aggregate function. This query returns zero as the NULLRows count:

FROM [Northwind].[dbo].[suppliers]

Why does this query return zero? Remember, by definition COUNT(column) applies the value expression, the search expression and then eliminates NULLs before giving you a final count! Here's a play-by-play of what happens in our example:

  1. The WHERE clause eliminates every row in which region is not NULL.
  2. The value expression in this example is simply a column name - region.
  3. The COUNT(region) function eliminates every row in which region is NULL.
  4. Since we've explicitly eliminated all non-NULL columns with the WHERE clause and implicitly eliminated all NULL columns with the COUNT(region) statement, we end up with a zero rows.
Gotcha:  COUNT(column) eliminates NULL values; COUNT(*) returns the cardinality (total number of rows) for an entire table.  The order in which SQL performs the steps in aggregate calculations can have implications for your results. 


SUM(), like the other set functions, applies the search expression, then the value expression, eliminates NULLs and finally returns a sum of the remaining rows. It is important to note that SUM() excludes NULLs, as NULL scalar arithmetic normally returns NULL. For instance, let's do a scalar addition of all the values in the reportsto column of the employees table:

SELECT 2 + NULL + 2 + 2 + 2 + 5 + 5 + 2 + 5

This SELECT statement will return NULL, since NULL plus anything always returns NULL. If we use the SUM() function on the table, however, we can rest assured that NULLs are eliminated from our result:

SELECT SUM(reportsto) AS RowSum
FROM [Northwind].[dbo].[employees]

The result of this query:

Gotcha:  NULLs are eliminated by the SUM() function; in scalar arithmetic, however, NULL added to anything equals NULL.

AVG() NULL-handling

The AVG() function returns the average of the column, excluding NULLs. There are a couple of "gotcha's" concerning the AVG() function. The first "gotcha" is that NULLs are excluded, as with other set functions, so your answer may not be what you expect. The second "gotcha" is that, by definition, AVG() returns a number of the same scale as the column it is performed on. Again, this could throw your answer off a bit.

We'll tackle the NULL issue first. AVG() specifically excludes NULLs when it calculates averages. The following query divides the sum of the reportsto column by the count of non-NULL rows. In our example, one row is excluded from the calculation:

SELECT AVG(reportsto) AS ColAvg
FROM [Northwind].[dbo].[employees]

Our result is:


This is the same result as if we performed this query:

SELECT SUM(reportsto)/COUNT(reportsto)
FROM [Northwind].[dbo].[employees]

So how does excluding NULLs affect our average? We can use COALESCE() function to see.

SELECT AVG(COALESCE(reportsto, 0)) AS ColAvg
FROM [Northwind].[dbo].[employees]

The COALESCE(reportsto, 0) statement forces AVG() to count NULL rows as zeroes; forcing every single row to be counted. This gives us the following result:


We can get the same result using the SUM() and COUNT() set functions:

SELECT SUM(reportsto)/COUNT(*) AS ColAvg
FROM [Northwind].[dbo].[employees]

Here we are performing a SUM() on the rows that are not NULL, then dividing by the total number of rows, effectively treating the NULL rows as zeroes in our average. In this instance we're dealing with smaller numbers and only nine rows. With larger numbers and more rows the difference is more pronounced.

So which method should you use? That depends on the result you're looking for. Consider an example of calculating the average payroll for a department. Suppose you have the following table:

NameWeekly Pay
Joe M.400.00
Lisa D.500.00

Our average is $450.00 per week. Now let's say that we hire a new guy, but his weekly pay has not been put into the system yet. Now we have:

NameWeekly Pay
Joe M.400.00
Lisa D.500.00

If we include Tom's missing salary in our average (treat it like 0.00), our average goes down to $300.00 per week. If we leave Tom out of it for now – the default for AVG() – we get $450.00 per week again.

So which method is correct? Ignore NULLs or treat them as zeroes in your averages? That depends entirely on your objectives and corporate policies. The main thing is to be consistent in your calculations. If you ignore NULLs in calculating averages, use that same methodology every time; otherwise your results will not be comparable from year to year, quarter to quarter or even day to day.

Gotcha:  AVG() eliminates NULLs when calculating your average. 

AVG() Scale

The second issue is scale. In our examples, the averages we received were calculated and returned with the same scale (digits after the decimal point) as the column; in this case the column is an INT data type, so no decimals were returned. We'll apply the CAST operator to see what would have been returned if we were using a floating-point data type:

FROM [Northwind].[dbo].[employees]

So far, not much has changed. We received 3 previously and 3.0 this time. So let's look at the equivalent average using SUM() and COUNT():

SELECT CAST(SUM(reportsto) AS FLOAT)/COUNT(reportsto) AS ColAvg
FROM [Northwind].[dbo].[employees]

Notice that we are forcing floating point precision now by casting the reportsto column as a FLOAT.  We can get this same result by moving the CAST inside the AVG() function, like this:

SELECT (AVG(CAST(reportsto AS FLOAT))) AS ColAvg
FROM [Northwind].[dbo].[employees]

So how did we end up with a different value the second time? Since AVG() calculates using the same scale as the column data type, our INT column average is calculated with no decimal places. In the first example, we CAST the result of the AVG() function to a FLOAT type, but the AVG() was calculated as an INT first.

In the second example, we CAST the SUM() of the reportsto column to a floating point type, and then divided; so we didn't lose any digits after the decimal point in the calculation. As shown, we can get the same result by using CAST on the column inside the AVG() function.  This is particularly important to note in financial and scientific calculations, where the scale is particularly important to our calculations.

If your scale requirements are greater than the scale of your column, you might need to use the SUM(column)/COUNT(column) method of calculating your average and CAST it to a floating or fixed-point type, or CAST inside of the AVG() function.

Gotcha:  AVG() also uses the precision (total number of digits) and scale (number of digits after the decimal point) of the column you are performing the AVG() on.

MIN(), MAX()

The MIN() and MAX() functions return the minimum and maximum values of a column, respectively. Like the other set functions, these also eliminate NULLs before returning a result.

For numeric data types, MIN() and MAX() simply return the minimum or maximum number in the column. For character data types, the values are compared using standard SQL form: the shorter of the values is right-padded with 'pad characters' (usually spaces). This means that character columns with trailing spaces are mixed in with character columns without trailing spaces. As an example:

IF 'hi' = 'hi '
  PRINT 'Equal'
  PRINT 'Not Equal'

Depending on your collation settings, string comparisons might be case-sensitive or case-insensitive.

Gotcha:  NULLs are eliminated by MIN() and MAX() functions.  For character data types, shorter values are right-padded with 'pad characters' (often spaces) before comparing them.


SQL set functions are very useful in calculating basic statistical values. By knowing how SQL set functions handle NULL values you can ensure that you always get accurate, high-quality results.

Further Reading

Total article views: 86927 | Views in the last 30 days: 60
Related Articles

Adding a column header on count function

Adding a column header on count function


Count Function

Count Function with condition


The COUNT Function in T-SQL

The COUNT function is among the most used functions in the T-SQL codes. Even though COUNT is easy to...


The COUNT function in T-SQL

The COUNT function is among the most used functions in the T-SQL codes. Even though COUNT is easy to...


The COUNT function in T-SQL

The COUNT function is among the most used functions in the T-SQL codes. Even though COUNT is easy to...

advanced querying