Gotcha! SQL Aggregate Functions and NULL

Question

Gotcha! SQL Aggregate Functions and NULL

Viewing 15 posts - 16 through 30 (of 32 total)

You must be logged in to reply to this topic. Login to reply

Yelena Varshal SSC-Dedicated Points: 34288 More actions · Answer 1

Hello Mike,

this is a very good article, everything correctly describes issues I have to deal with almost every day when working with reports.

It would be a good idea if you would like to write a part 2 dealing with joins and nulls. The example may be when you have customers and orders and you would like to return all customers (or a count of customers) who did not place the orders. Sometimes these queries could be complex, especially when the specific order is required.

Regards,Yelena Varsha

Mike C SSC-Insane Points: 23224 More actions · Answer 2

Hi Steve and Yelena,

Thanks for the feedback! I'll look into writing an article on JOINs in the future. I'm actually working on a couple of other articles now, but I'll draft up a couple of ideas for later. My feeling is that you can't over-explain JOINs. Especially now that SQL 2005 kicks it up a notch with CTE's.

Thanks again

Keith W. Hare SSC Enthusiast Points: 105 More actions · Answer 3

To expand a bit on the count(*) versus count(1) discussion, what does count(1) mean?

The fact that count(1) returns the same value as count(*) is an artifact of the parser and the query optimizer. In fact, count(0), count(2), and count(20) also return the same value on both of the SQL implementations I've tested this morning.

I suspect that the suggestion that count(1) is faster than count(*) came from a poorly constructed performance test on some version of some SQL implementation.

Cordially,

Keith W. Hare

Jesper-244176 SSCertifiable Points: 7142 More actions · Answer 4

The point I wanted to make was that "select count(x) from table" gives you the number of non-null values returned by the statement "select x from table". This applies when x is a column and also when x is something else (except *). So count(*) isn't so special after all, you may think of it as nothing more than count(1) (or count(0) if you prefer). I certainly don't claim it is faster...

I acknowledge that count(*) is part of the ansi syntax, as Mike also points out. I don't want to throw away count(*), only point out that SQL (on SQL Server 2000) is equally powerful without it.

Mike C SSC-Insane Points: 23224 More actions · Answer 5

The point I'm making is that COUNT(1) is not defined as standard syntax, COUNT(*) is. COUNT(1) [or COUNT(100), etc.] doesn't have a standard behavior defined, so that its behavior could change or even be removed in future versions of SQL Server... That being the case, I don't see COUNT(constant) as being powerful... I think it's probably more along the lines of a proprietary undocumented feature that should probably be avoided in favor of well-documented/standard features and syntax. I think "more powerful" implies that COUNT(constant) achieves something above and beyond what can be done with the standard ANSI COUNT(*) syntax... I don't see that as the case though...

Jesper-244176 SSCertifiable Points: 7142 More actions · Answer 6

As I tried to say in my previous post, I think I understand your point.... I only wanted to say that on SQL Server 2000 you don't lose anything (except perhaps performance) by not using count(*) (the quote "more powerful" must be from somewhere else ). I think expressions such as count(1) or count('') or even count(case when column = 1 then null else 1 end) are pretty straightforward (but not necessarily very useful) generalizations of the count(column) syntax, and I would be very surprised if this behaviour changes in future versions of SQL Server (no guarantees, however ). Personally I stick with count(*) in queries but sometimes think of it as count(1). This could very well be because I have never read the ansi standard and have never worked with other database systems...

Mike C SSC-Insane Points: 23224 More actions · Answer 7

Oops you're right, you said SQL was "just as powerful" without COUNT(*), not "more powerful". Sorry about that. Because COUNT(constant) doesn't have a guaranteed behavior, it's possible that ANSI could define a specific behavior for it in the future that differs from the MS implementation; or they could define that it should throw an exception, or any other number of strange things that could break code down the road.

On another topic, one of the items I should have mentioned in the article is COUNT(DISTINCT column), which is also defined by ANSI. This variation returns the count of DISTINCT rows for a particular column. Like COUNT(column), this one eliminates NULLs before returning the result. I think this variation tends to get less use than it's cousins COUNT(*) and COUNT(column), but it can be useful on occasion.

Thanks again.

Brett Morriss Valued Member Points: 61 More actions · Answer 8

I ran into a snag on the SUM() section, especially the statement "we can rest assured that NULLs are eliminated from our result"

I tried to use SUM() as suggested but it returns NULLS.

I then tried to use a CASE statement like this (but it failed to eliminate NULLS as well):

SELECT SUM(CASE WHEN someField IS NULL THEN 0) AS SumOfSomeField FROM someTable

To get it to work I needed to change the CASE logic like this:

SELECT SUM(CASE WHEN someField IS NOT NULL THEN someField ELSE 0) AS SumOfSomeField FROM someTable

I can't understand why T-SQL behaves this way. But then again it is a Microsoft product so I'm not surprised

Mike C SSC-Insane Points: 23224 More actions · Answer 9

The only time (that I know of) that SQL returns NULLs with SUM() is if there are no rows to SUM. This is the case if: 1) There are no rows in your table, and 2) If all the rows you are SUMming are NULL [in which case they are eliminated, and it's the same thing as SUMming no rows].

The SUM of no rows is not a known value. AFAIK that's standard ANSI procedure. The only other thing I can think of without seeing your table and your test data, is that you might have set some strange non-ANSI NULL feature.)

BTW, how does this work out for you:

SELECT SUM(COALESCE(somefield, 0)) FROM sometable

priyaranjan Mohanty SSC Enthusiast Points: 190 More actions · Answer 10

Hi Mike ,

happen to stumble across these articles of yours..they are really informative & well explained.

thanks for making the technology a better tool to work with...

Priyaranjan Mohanty

Mike C SSC-Insane Points: 23224 More actions · Answer 11

Mike C

SSC-Insane

Points: 23224

September 26, 2006 at 7:10 pm

#662610

Thank you Priyaranjan, I appreciate the feedback!

mayurs SSC Enthusiast Points: 186 More actions · Answer 12

greate one , very useful for aggregate function,

thanks

Jim Russell-390299 SSCrazy Eights Points: 9333 More actions · Answer 13

Very good! Thank you.

I was waiting for someone to address the fact that nulls are grouped together by a group by clause -- a special case exception to the "null is never equal null" rule? (Not equal, just the same as...)

Mike C SSC-Insane Points: 23224 More actions · Answer 14

Thanks Jim. Not only do NULLs group together, but they are considered "not distinct" when performing set-based operations like UNION, EXCEPT, INTERSECT, and when retrieving DISTINCT rows: http://www.sqlservercentral.com/articles/Advanced+Querying/2829/%5B/url%5D.

rs80 SSCrazy Points: 2312 More actions · Answer 15

Mike,

I've started reading your article and am not clear why the query: SELECT COUNT(region) FROM Suppliers WHERE Region IS NOT NULL returns 0

1. WHERE clause eliminates every row in which region is not NULL.

Therefore, I only have 9 rows where Region IS Not Null

3. The COUNT(region) function eliminates every row in which region is NULL.

I would still expect 9 rows still to be returned back after the select query is parsed.

If you could please elaborate on why 0 rows are returned back from the query.