December 17, 2004 at 7:31 pm
Hi there,
I am using the SUM function to total values in a single column.
The SUM function ignores NULL values, and continues totalling the column. However I need to return a NULL value as the SUM total if the column contains any row NULL values.
Is there a way to do this?
Thanks,
Mark
December 18, 2004 at 10:40 pm
Is this what you mean?
1. Using a table like this:
---------------------------
CREATE TABLE [dbo].[ab_sum_nulls] (
[sum_iid] [int] IDENTITY (1, 1) NOT NULL ,
[item_grp] [char] (10) ,
[item_value] [int] NULL
) ON [PRIMARY]
GO
2. With data like this:
SELECT item_grp, item_value
FROM ab_sum_nulls
------------------
item_grp item_value
A NULL
A 10
A 12
B 0
B 10
B 12
3. The regular SQL sum looks like this:
SELECT item_grp, SUM(item_value) AS raw_sum,
SUM(CASE WHEN item_value IS NULL THEN 1
ELSE 0 END) AS null_ct
FROM ab_sum_nulls
GROUP BY item_grp
--------------------
item_grp raw_sum null_ct
A 22 1
B 22 0
4. Then this gives you what you want:
SELECT item_grp,
CASE WHEN null_ct = 0 THEN raw_sum
ELSE NULL END AS item_sum
FROM (
SELECT item_grp, SUM(item_value) AS raw_sum,
SUM(CASE WHEN item_value IS NULL THEN 1
ELSE 0 END) AS null_ct
FROM ab_sum_nulls
GROUP BY item_grp
) AS sum_qry
--------------
item_grp item_sum
A NULL
B 22
Bob Monahon
December 19, 2004 at 12:51 am
Hi Bob,
Great script! That is exactly what I needed.
I am actually completing the sum as part of a cross table or pivot selection. Using:
SELECT SUM(CASE item_grp WHEN 'A' THEN item_value ELSE 0 END) AS 'Total A',
SUM(CASE item_grp WHEN 'B' THEN item_value ELSE 0 END) AS 'Total B'
FROM ab_sum_nulls
I get:
Total A Total B
----------- -----------
22 22
(1 row(s) affected)
I have combined this with your script:
SELECT SUM(CASE WHEN null_ct = 0 AND item_grp = 'A' THEN raw_sum ELSE NULL END) AS 'Total A',
SUM(CASE WHEN null_ct = 0 AND item_grp = 'B' THEN raw_sum ELSE NULL END) AS 'Total B'
FROM (
SELECT item_grp, SUM(item_value) AS raw_sum, SUM(CASE WHEN item_value IS NULL THEN 1 ELSE 0 END) AS null_ct
FROM ab_sum_nulls
GROUP BY item_grp
  AS sum_qry
to get:
Total A Total B
----------- -----------
NULL 22
(1 row(s) affected)
Do you think there would be a large performance hit running this query on a table of about 1 million rows?
Thanks again for you help.
Mark
December 19, 2004 at 2:18 am
Here is another way, without using a subquery:
SELECT item_grp, CASE WHEN COUNT(item_value)<COUNT(*) THEN NULL ELSE SUM(item_value) END AS raw_sum FROM ab_sum_nulls GROUP BY item_grp
And here is another one, without using a CASE:
SELECT item_grp, NULLIF(1-SIGN(COUNT(*)-COUNT(item_value)),0) *SUM(item_value) AS raw_sum FROM ab_sum_nulls GROUP BY item_grp
(I have to agree, the last query is very cryptic... I wrote it just for fun, trying to get the shortest query that does the job)
All these queries cause a "Warning: Null value is eliminated by an aggregate or other SET operation." If this annoys you, you can use the following query
SELECT item_grp, CASE WHEN SUM(CASE WHEN item_value IS NULL THEN 1 ELSE 0 END)=0 THEN NULL ELSE SUM(ISNULL(item_value,0)) END AS raw_sum FROM ab_sum_nulls GROUP BY item_grp
Regarding the performance, all these queries should have a very similar execution plan: all of them require one table scan. If the table in question has 1 million rows... this would take some time, of course (1 to 3 seconds, maybe).
Razvan
December 19, 2004 at 11:04 am
Hello Mark,
Re: Do you think there would be a large performance hit running this query on a table of about 1 million rows?
Probably not. I ran the query with SHOWPLAN_ALL, and it appears to run a single pass through the data (your million rows), with possibly a second pass through an intermediate result set (the summary by [item_grp]).
I liked Razvan's suggestions; so I compared queries. Here's yours:
SET SHOWPLAN_ALL ON
GO
SELECT SUM(CASE WHEN null_ct = 0 AND item_grp = 'A'
THEN raw_sum ELSE NULL END) AS 'Total A',
SUM(CASE WHEN null_ct = 0 AND item_grp = 'B'
THEN raw_sum ELSE NULL END) AS 'Total B'
FROM (
SELECT item_grp, SUM(item_value) AS raw_sum
, SUM(CASE WHEN item_value IS NULL
THEN 1 ELSE 0 END) AS null_ct
FROM ab_sum_nulls
GROUP BY item_grp
) AS sum_qry
GO
And here's a version based on Razvan's ideas to eliminate the "SUM(CASE" in the inner query:
SET SHOWPLAN_ALL ON
GO
SELECT SUM(CASE WHEN item_ct = row_ct AND item_grp = 'A'
THEN raw_sum ELSE NULL END) AS 'Total A',
SUM(CASE WHEN item_ct = row_ct AND item_grp = 'B'
THEN raw_sum ELSE NULL END) AS 'Total B'
FROM (
SELECT item_grp, SUM(item_value) AS raw_sum
-- Next line causes Warning msg:
-- Null value is eliminated by an aggregate...
, count(item_value) AS item_ct
, count(*) as row_ct
FROM ab_sum_nulls
GROUP BY item_grp
) AS sum_qry
GO
The plans were almost identical, and estimaged CPU and IO were exactly the same. It appears that when you write "COUNT(item_value)", SQL Server actually executes something like this: SUM(CASE WHEN item_value IS NULL THEN 0 ELSE 1). Huh!
The query you're using actually looks to perform a bit better, because the avg. row size is smaller on some of the showplan steps; so for millions of rows you might see a benefit.
Bob Monahon
December 19, 2004 at 10:55 pm
Thanks for you help guys - brilliant!
Mark
Viewing 6 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply