Gotcha! SQL Aggregate Functions and NULL

  • Hello Mike,

    this is a very good article, everything correctly describes issues I have to deal with almost every day when working with reports.

    It would be a good idea if you would like to write a part 2 dealing with joins and nulls. The example may be when you have customers and orders and you would like to return all customers (or a count of customers) who did not place the orders. Sometimes these queries could be complex, especially when the specific order is required.

    Regards,Yelena Varsha

  • Hi Steve and Yelena,

    Thanks for the feedback!  I'll look into writing an article on JOINs in the future.  I'm actually working on a couple of other articles now, but I'll draft up a couple of ideas for later.  My feeling is that you can't over-explain JOINs.  Especially now that SQL 2005 kicks it up a notch with CTE's.

    Thanks again

  • To expand a bit on the count(*) versus count(1) discussion, what does count(1) mean? 

    The fact that count(1) returns the same value as count(*) is an artifact of the parser and the query optimizer.  In fact, count(0), count(2), and count(20) also return the same value on both of the SQL implementations I've tested this morning.

    I suspect that the suggestion that count(1) is faster than count(*) came from a poorly constructed performance test on some version of some SQL implementation.


    Cordially,

    Keith W. Hare

  • The point I wanted to make was that "select count(x) from table" gives you the number of non-null values returned by the statement "select x from table". This applies when x is a column and also when x is something else (except *). So count(*) isn't so special after all, you may think of it as nothing more than count(1) (or count(0) if you prefer). I certainly don't claim it is faster...

    I acknowledge that count(*) is part of the ansi syntax, as Mike also points out. I don't want to throw away count(*), only point out that SQL (on SQL Server 2000) is equally powerful without it.

     

  • The point I'm making is that COUNT(1) is not defined as standard syntax, COUNT(*) is.  COUNT(1) [or COUNT(100), etc.] doesn't have a standard behavior defined, so that its behavior could change or even be removed in future versions of SQL Server...  That being the case, I don't see COUNT(constant) as being powerful...  I think it's probably more along the lines of a proprietary undocumented feature that should probably be avoided in favor of well-documented/standard features and syntax.  I think "more powerful" implies that COUNT(constant) achieves something above and beyond what can be done with the standard ANSI COUNT(*) syntax...  I don't see that as the case though...

  • As I tried to say in my previous post, I think I understand your point.... I only wanted to say that on SQL Server 2000 you don't lose anything (except perhaps performance) by not using count(*) (the quote "more powerful" must be from somewhere else ). I think expressions such as count(1) or count('') or even count(case when column = 1 then null else 1 end) are pretty straightforward (but not necessarily very useful) generalizations of the count(column) syntax, and I would be very surprised if this behaviour changes in future versions of SQL Server (no guarantees, however ). Personally I stick with count(*) in queries but sometimes think of it as count(1). This could very well be because I have never read the ansi standard and have never worked with other database systems...

     

  • Oops you're right, you said SQL was "just as powerful" without COUNT(*), not "more powerful".  Sorry about that.  Because COUNT(constant) doesn't have a guaranteed behavior, it's possible that ANSI could define a specific behavior for it in the future that differs from the MS implementation; or they could define that it should throw an exception, or any other number of strange things that could break code down the road.

    On another topic, one of the items I should have mentioned in the article is COUNT(DISTINCT column), which is also defined by ANSI.  This variation returns the count of DISTINCT rows for a particular column.  Like COUNT(column), this one eliminates NULLs before returning the result.  I think this variation tends to get less use than it's cousins COUNT(*) and COUNT(column), but it can be useful on occasion.

    Thanks again.

  • I ran into a snag on the SUM() section, especially the statement "we can rest assured that NULLs are eliminated from our result"

    I tried to use SUM() as suggested but it returns NULLS.

    I then tried to use a CASE statement like this (but it failed to eliminate NULLS as well):

    SELECT SUM(CASE WHEN someField IS NULL THEN 0) AS SumOfSomeField FROM someTable

    To get it to work I needed to change the CASE logic like this:

    SELECT SUM(CASE WHEN someField IS NOT NULL THEN someField ELSE 0) AS SumOfSomeField FROM someTable

    I can't understand why T-SQL behaves this way. But then again it is a Microsoft product so I'm not surprised

  • The only time (that I know of) that SQL returns NULLs with SUM() is if there are no rows to SUM.  This is the case if: 1) There are no rows in your table, and 2) If all the rows you are SUMming are NULL [in which case they are eliminated, and it's the same thing as SUMming no rows].

    The SUM of no rows is not a known value.  AFAIK that's standard ANSI procedure.  The only other thing I can think of without seeing your table and your test data, is that you might have set some strange non-ANSI NULL feature.) 

    BTW, how does this work out for you:

    SELECT SUM(COALESCE(somefield, 0)) FROM sometable

  • Hi Mike ,

    happen to stumble across these articles of yours..they are really informative & well explained.

    thanks for making the technology a better tool to work with...

    Priyaranjan Mohanty

  • Thank you Priyaranjan, I appreciate the feedback!

  • greate one , very useful for aggregate function,

    thanks

  • Very good! Thank you.

    I was waiting for someone to address the fact that nulls are grouped together by a group by clause -- a special case exception to the "null is never equal null" rule? (Not equal, just the same as...)

  • Thanks Jim. Not only do NULLs group together, but they are considered "not distinct" when performing set-based operations like UNION, EXCEPT, INTERSECT, and when retrieving DISTINCT rows: http://www.sqlservercentral.com/articles/Advanced+Querying/2829/[/url].

  • Mike,

    I've started reading your article and am not clear why the query: SELECT COUNT(region) FROM Suppliers WHERE Region IS NOT NULL returns 0

    1. WHERE clause eliminates every row in which region is not NULL.

    Therefore, I only have 9 rows where Region IS Not Null

    3. The COUNT(region) function eliminates every row in which region is NULL.

    I would still expect 9 rows still to be returned back after the select query is parsed.

    If you could please elaborate on why 0 rows are returned back from the query.

Viewing 15 posts - 16 through 30 (of 31 total)

You must be logged in to reply to this topic. Login to reply