Counting Rows

  • Comments posted to this topic are about the item Counting Rows

    Thanks,
    Shiva N
    Database Consultant

  • A nice easy question to start the week.

    And it shows that T-SQL doesn't always get aggregates on empty sets completely wrong.

    Tom


  • TomThomson (8/30/2015)


    A nice easy question to start the week.

    And it shows that T-SQL doesn't always get aggregates on empty sets completely wrong.

    "Not always"???

    Can you give me an example of where it does go wrong?


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • I found this question to be very simple and I was looking for the hidden trick. 😛

  • If I change count(1) to SUM(id), it returns NULL. Why?

    --------------------------------------
    ;-)“Everything has beauty, but not everyone sees it.” ― Confucius

  • chgn01 (8/31/2015)


    If I change count(1) to SUM(id), it returns NULL. Why?

    count(1) returns the number of rows in the table, which is zero.

    sum(id) returns the arithmetic total of the known values in the id column. Since there are no known values, the result is unknown (NULL).

    Don Simpson



    I'm not sure about Heisenberg.
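The behavior described above is easy to verify. A minimal sketch using Python's sqlite3 module (SQLite follows the same ANSI rule for aggregates over an empty set; T-SQL produces the same results, plus a warning that NULLs were eliminated by the aggregate):

```python
import sqlite3

# In-memory database with an empty table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")

# COUNT counts rows, so an empty table yields 0.
# SUM aggregates known values; with none present, the result is NULL (None).
row = conn.execute("SELECT COUNT(1), SUM(id) FROM t").fetchone()
print(row)  # (0, None)
```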

  • An easy one, thanks.

  • Really simple - thanks, Shiva!

  • I was expecting some trick behind passing the variable value from @b-2 to @a.

    Easy one. Thanks, Shiva.

    Thanks,
    Naveen.
    Every thought is a cause and every condition an effect

  • Hugo Kornelis (8/31/2015)


    TomThomson (8/30/2015)


    A nice easy question to start the week.

    And it shows that T-SQL doesn't always get aggregates on empty sets completely wrong.

    "Not always"???

    Can you give me an example of where it does go wrong?

    Of course. Try SUM on an empty table.

    Please don't bother to argue that the result should be NULL; that would just make you look utterly silly (as silly as the people who put this nonsense in the standard), since it would suggest that you were happy to throw away some rather useful properties of sums, such as (1) adding an integer to a list of non-null integers always adds the value of that integer to the sum of the values in the list, and (2) the sum of a concatenation of lists is the sum of the sums of the individual lists, and so on (including properties that refer to sets as well as to lists, and of course properties referring to bags).

    Tom
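Tom's property (2), that the sum of a concatenation of lists equals the sum of the sums, can be made concrete. A minimal sketch in Python, with None standing in for NULL; the helper names are hypothetical, purely for illustration:

```python
# NULL-style sum: returns None for an empty list, as the SQL standard defines SUM.
def sql_sum(values):
    return sum(values) if values else None

a, b = [1, 2, 3], []  # b is the empty list

# Under the standard's definition the concatenation property breaks:
# sql_sum(a + b) is 6, but sql_sum(a) + sql_sum(b) would be 6 + NULL,
# which propagates to NULL rather than 6.
print(sql_sum(a + b))          # 6
print(sql_sum(a), sql_sum(b))  # 6 None

# With SUM of an empty list defined as 0 (Tom's preference), it holds:
def zero_sum(values):
    return sum(values)  # sum([]) == 0

assert zero_sum(a + b) == zero_sum(a) + zero_sum(b)
```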

  • chgn01 (8/31/2015)


    If I change count(1) to SUM(id), it returns NULL. Why?

    Because the people who decided what to do (1) couldn't understand elementary logic and (2) were mathematically illiterate (maybe (2) is a consequence of (1)). It's far from the biggest mistake ever made in computer language design, but a mistake it certainly is. But we're stuck with it now (until the SQL standard changes it or SQL ceases to be used, which may be a very long time).

    Tom

  • Easy one, thanks.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • TomThomson (8/31/2015)


    Hugo Kornelis (8/31/2015)


    TomThomson (8/30/2015)


    A nice easy question to start the week.

    And it shows that T-SQL doesn't always get aggregates on empty sets completely wrong.

    "Not always"???

    Can you give me an example of where it does go wrong?

    Of course. Try SUM on an empty table.

    Please don't bother to argue that the result should be NULL; that would just make you look utterly silly (as silly as the people who put this nonsense in the standard), since it would suggest that you were happy to throw away some rather useful properties of sums, such as (1) adding an integer to a list of non-null integers always adds the value of that integer to the sum of the values in the list, and (2) the sum of a concatenation of lists is the sum of the sums of the individual lists, and so on (including properties that refer to sets as well as to lists, and of course properties referring to bags).

    First things first - if anyone or anything gets anything wrong, it's definitely not T-SQL. The result of SUM is defined in the ANSI standard for SQL, and faithfully (and hence correctly) implemented by T-SQL. Not wrong at all.

    So the bigger question is: is the definition wrong? For SUM, I'd say that this is debatable.

    Let's start with a few other, simpler ones. Like, for instance, MIN and MAX. When defining how MIN and MAX should be interpreted in relation to missing values (aka NULL), there are two options. Either you treat NULL as a value that is there, yet not known - the logical result would be that MIN and MAX would always return NULL if but a single row in the set is NULL. Or you treat NULL as a missing value (better, because that's how it's defined in the standard anyway!), and disregard it, giving the minimum/maximum of all non-null values. The latter is what the standard defines, and I agree with it - it is more in line with the definition of NULL, and it has more pragmatic value. There is the downside that this is inconsistent with how NULLs behave in "normal" (non-aggregate) expressions. But that is handily circumvented by carefully phrasing the definition of MIN and MAX as "the minimum/maximum of all non-NULL values".

    Obviously, if a list is empty, or if a list has only NULL values, the MIN or MAX is then applied to an empty list and no result can be returned. Missing result --> NULL. As expected.

    The same happens with, for instance, AVG. If I ask you to compute the average of ten values but only give you nine, you cannot answer. But you can give me the average of all values I did give. So AVG is also defined as the average of all non-null values. And for an empty list (or a list with only NULL values), no average can be computed. Again, as expected, NULL.

    So why not be consistent and always return NULL from any aggregate if the list is empty? Why the exception for COUNT? The answer is simple: if I ask you the average weight of all beans in an empty jar, you cannot reply. But if I ask you the number of beans in that same empty jar, you'll reply: zero. NULL would make no sense here.

    And that brings us to SUM. Like I said, debatable. If I ask you the total weight of all beans in an empty jar, you'll say zero, because zero beans have zero weight. That, plus your more arithmetical arguments, makes a very strong case for returning zero as the SUM of an empty list.

    But let's look at the other side. How about consistency? COUNT is different from the other aggregates, because it is the only aggregate function that doesn't look at the data. SUM, like AVG, MIN, and MAX, does look at the data. That would be a point in favor of treating SUM like MAX. Also, within the arithmetical rules of relational databases (i.e. in sets of data that include NULL for missing values), the standard arithmetical rule of AVG = SUM / COUNT would break if SUM( {} ) were zero, whereas that rule holds without exception if SUM( {} ) is defined as NULL.

    Welcome, Rock! Meet Hard Place. Now choose.

    The ANSI standardization committee did just that. Faced with a choice between two wrongs (or, if you prefer positive labeling, two almost-rights), they picked one. You, and many others, disagree with their choice. But guess what? If they had chosen differently, then they would too have been confronted by people convinced that they chose wrong.

    For the record, I am not saying that they were right, nor that they were wrong. Just that they made a tough choice between two non-perfect solutions.

    Bottom line: T-SQL gets it completely right, because it follows the standard. The standard gets it completely right for most aggregates, and takes a debatable position on SUM.

    I totally disagree with your statement that "T-SQL doesn't always get (...) completely wrong".


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/
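The NULL-skipping behavior Hugo describes, and his AVG = SUM / COUNT consistency argument, can both be checked. A minimal sketch via Python's sqlite3 (SQLite implements the same ANSI semantics here; note that COUNT(id) counts only non-NULL values, unlike COUNT(*)):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(10,), (20,), (None,)])

# MIN, MAX, and AVG disregard the NULL row; COUNT(id) counts non-NULL values.
row1 = conn.execute("SELECT MIN(id), MAX(id), AVG(id), COUNT(id) FROM t").fetchone()
print(row1)  # (10, 20, 15.0, 2)

# On an empty set every value-based aggregate is NULL, while COUNT is 0.
# AVG still agrees with SUM / COUNT precisely because SUM is NULL here;
# had SUM( {} ) been 0, the identity would require AVG to be 0/0.
conn.execute("DELETE FROM t")
row2 = conn.execute("SELECT SUM(id), AVG(id), COUNT(id) FROM t").fetchone()
print(row2)  # (None, None, 0)
```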

  • Hugo Kornelis (9/1/2015)


    That would be a point in favor of treating SUM like MAX. Also, within the arithmetical rules of relational databases (i.e. in sets of data that include NULL for missing values), the standard arithmetical rule of AVG = SUM / COUNT would break if SUM( {} ) were zero, whereas that rule holds without exception if SUM( {} ) is defined as NULL.

    That's nonsense. Since division is discovering the unique second factor when given a product and a first factor, it's clear that 0/0 could be absolutely any number; in other words, we know nothing about its value: that's Codd's original definition of NULL. Of course x/0 where x is non-zero is not NULL, because we know there is no such value - that's why an error is caused. It turned out to be convenient in ancient history (i.e. prior to the introduction of NULL) to treat 0/0 as an error too, because in the absence of NULL there was nothing else that the division could return. But that certainly doesn't apply when computing an average in software, in a domain where NULL exists, as opposed to doing division in hardware or firmware, in a domain where there is no such thing as NULL.

    Welcome, Rock! Meet Hard Place. Now choose.

    It seems to me to be more a case of extremely solid rock meets an extremely fragile eggshell; and the SQL world decided that the eggshell would break the rock.

    The ANSI standardization committee did just that. Faced with a choice between two wrongs (or, if you prefer positive labeling, two almost-rights), they picked one. You, and many others, disagree with their choice. But guess what? If they had chosen differently, then they would too have been confronted by people convinced that they chose wrong.

    If they had picked the other choice, they would only have had numerically illiterate people disagreeing with them.

    For the record, I am not saying that they were right, nor that they were wrong. Just that they made a tough choice between two non-perfect solutions.

    I don't believe that the argument that picking zero as the sum of an empty set is "wrong" or even "almost right" stands up to logical analysis.

    Bottom line: T-SQL gets it completely right, because it follows the standard. The standard gets it completely right for most aggregates, and takes a debatable position on SUM.

    I totally disagree with your statement that "T-SQL doesn't always get (...) completely wrong".

    T-SQL gets it right in the sense that it conforms to the SQL standard. However, T-SQL has many things that don't conform to the SQL standard, so why should it conform slavishly to the standard where the standard has something really elementary thoroughly, utterly, disgracefully wrong? In my view, in doing so T-SQL is completely wrong.

    Tom
