August 30, 2015 at 12:34 am
Comments posted to this topic are about the item Counting Rows
Thanks,
Shiva N
Database Consultant
August 30, 2015 at 10:05 am
A nice easy question to start the week.
And it shows that T-SQL doesn't always get aggregates on empty sets completely wrong.
Tom
August 30, 2015 at 11:47 pm
This was removed by the editor as SPAM
August 31, 2015 at 1:13 am
TomThomson (8/30/2015)
A nice easy question to start the week.And it shows that T-SQL doesn't always get aggregates on empty sets completely wrong.
"Not always"???
Can you give me an example of where it does go wrong?
August 31, 2015 at 5:55 am
I found this question to be very simple and I was looking for the hidden trick. 😛
August 31, 2015 at 9:06 am
If change count(1) to SUM(id), it return NULL and why?
--------------------------------------
;-)“Everything has beauty, but not everyone sees it.” ― Confucius
August 31, 2015 at 11:45 am
chgn01 (8/31/2015)
If change count(1) to SUM(id), it return NULL and why?
count(1) returns the number of rows in the table which is zero.
sum(id) returns the arithmetic total of the known values in the id columns. since there are no known values the result is unknown (null).
Don Simpson
August 31, 2015 at 2:36 pm
An easy one, thanks.
August 31, 2015 at 2:57 pm
Really simple - thanks, Shiva!
August 31, 2015 at 5:31 pm
Hugo Kornelis (8/31/2015)
TomThomson (8/30/2015)
A nice easy question to start the week.And it shows that T-SQL doesn't always get aggregates on empty sets completely wrong.
"Not always"???
Can you give me an example of where it does go wrong?
Of course. Try SUM on an empty table.
Please don't bother to argue that the result should be NULL, that would just make you look utterly silly (as silly as the people who produced put this nonsense in teh standard) since it would suggest that you were happy to throw away some rather useful properties of sums, such as (1) adding an integer to a list of non-null integers always adds the value of that integer to the sum of the values in the list, and (2) the sum of a concatenation of lists is the sum of the sums of the individual lists, and so on (including properties that refer to sets as well as to lists, and of course properties referring to bags).
Tom
August 31, 2015 at 5:40 pm
chgn01 (8/31/2015)
If change count(1) to SUM(id), it return NULL and why?
Because the people who decided what to do (1) couldn't understand elementary logic and (2) were mathematically illiterate (maybe (2) is a consequence of (1)). It's far from the biggest mistake ever made in computer language design, but a mistake it certainly is. But we're stuck with it now (until the SQL standard changes it or SQL ceases to be used, which may be a very long time).
Tom
September 1, 2015 at 3:59 am
Easy one, thanks.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
September 1, 2015 at 1:06 pm
TomThomson (8/31/2015)
Hugo Kornelis (8/31/2015)
TomThomson (8/30/2015)
A nice easy question to start the week.And it shows that T-SQL doesn't always get aggregates on empty sets completely wrong.
"Not always"???
Can you give me an example of where it does go wrong?
Of course. Try SUM on an empty table.
Please don't bother to argue that the result should be NULL, that would just make you look utterly silly (as silly as the people who produced put this nonsense in teh standard) since it would suggest that you were happy to throw away some rather useful properties of sums, such as (1) adding an integer to a list of non-null integers always adds the value of that integer to the sum of the values in the list, and (2) the sum of a concatenation of lists is the sum of the sums of the individual lists, and so on (including properties that refer to sets as well as to lists, and of course properties referring to bags).
First things first - if anyone or anything gets anything wrong, it's definitely not T-SQL. The result of SUM is defined in the ANSI standard for SQL, and faithfully (and hence correct) implemented by T-SQL. Not wrong at all.
So the bigger question is: is the definition wrong? For SUM, I'd say that this is debatable.
Let's start with a few other, simpler ones. Like, for instance, MIN and MAX. When defining how MIN and MAX should be interpreted in relation to missing values (aka NULL), there are two options. Either you treat NULL as a value that is there, yet not known - the logical result would be that MIN and MAX would always return NULL if but a single row in the set is NULL. Or you treat NULL as a missing value (better, because that's how it's defined in the standard anyway!), and disregard it, giving the minimum/maximum of all non-null values. The latter is what the standard defines, and I agree with it - it is more in line with the definition of NULL, and it has more pragmatic value. There is the downside that this is inconsistent with how NULLs behave in "normal" (non aggregate) expressions. But that is handily circumvened by carefully phrasing the definition of MIN and MAX as "the minumum/maximum of all non-NULL values".
Obviously, if a list is empty, or if a list has only NULL values, the MIN or MAX is then applied to an empty list and no result can be returned. Missing result --> NULL. As expected.
The same happens with, for instance, AVG. If I ask you to compute the average of ten values but only give you nine, you cannot answer. But you can give me the average of all values I did give. So AVG is also defined as the average of all non-null values. And for an empty list (or a list with only NULL values), no average can be computed. Again, as expected, NULL.
So why not be consistent and always return NULL from any aggregate if the list is empty? Why the exception for COUNT? The answer is simple: if I ask you how the average weight of all beans in an empty jar, you cannot reply. But if I ask you the number of beans in that same empty rar, you'll reply: zero. NULL would make no sense here.
And that brings us to SUM. Like I said, debatable. If I ask you the total weight of all beans in an empty jar, you'll say zero, because zero beans have zero weight. That, plus your more arithmetic arguments, are a very strong case for returning zero as the SUM of an empty list.
But let's look at the other side. How about consistency? COUNT is different from the other aggregates, because it is the only aggregate function that doesn't look at the data. SUM, like AVG, MIN, and MAX, does look at the data. That would be a point in favor of treating SUM like MAX. Also, within the artihmetical rules of relational databases (ie in sets of data that include NULL for missing values), the standard arithmetical rule of AVG = SUM / COUNT would break if SUM( {} ) would be zero, whereas that rule holds without exception if SUM( {} ) is defined as NULL.
Welcome, Rock! Meet Hard Place. Now choose.
The ANSI standardization committee did just that. Faced with a choice between two wrongs (or, if you prefer positive labeling, two almost-rights), they picked one. You, and many others, disagree with their choice. But guess what? If they had chosen differently, then they would too have been confronted by people convinced that they chose wrong.
For the record, I am not sayiing that they were right, neither that they were wrong. Just that they made a tought choice between two non-perfect solutions.
Bottom line: T-SQL gets it completely right, because it follows the standard. The standard gets it completely right for most aggregates, and takes a debatable position on SUM.
I totally disagree with your statement that "T-SQL doesn't always get (...) completely wrong".
September 3, 2015 at 12:36 pm
Hugo Kornelis (9/1/2015)
That would be a point in favor of treating SUM like MAX. Also, within the artihmetical rules of relational databases (ie in sets of data that include NULL for missing values), the standard arithmetical rule of AVG = SUM / COUNT would break if SUM( {} ) would be zero, whereas that rule holds without exception if SUM( {} ) is defined as NULL.
That's nonsense. Since division is discovering the unique second factor when given a product and a first factor, it's clear that 0/0 could be absolytely any number, in other words we know nothing about its value: that's Codd's original definition of NULL. Of course x/0 where x is non-zero is not NULL, because we know there is no such value - that's why an error is caused. It turned out to be convenient in ancient history (ie prior to the introduction of NULL) to treat 0/0 as an error too, because in the absence of NULL there was nothing else that the division could return, but that certainly doesn't apply when computing an average by software in a domain where NULL exists as opposed to when doing division by hardware or firmware in a domain where there is no such thing as NULL.
Welcome, Rock! Meet Hard Place. Now choose.
It seems to me to be more a case of extremely solid rock meets an extremely fragile eggshell; and the SQL world decided that the eggshell would break the rock.
The ANSI standardization committee did just that. Faced with a choice between two wrongs (or, if you prefer positive labeling, two almost-rights), they picked one. You, and many others, disagree with their choice. But guess what? If they had chosen differently, then they would too have been confronted by people convinced that they chose wrong.
If they had picked the other choice, they would only have had numerically illiterate people disagreeing with them.
For the record, I am not sayiing that they were right, neither that they were wrong. Just that they made a tought choice between two non-perfect solutions.
I don't believe that the argument that picking zero as the sum of an empty set is "wrong" or even "almost right" stands up to logical analysis.
Bottom line: T-SQL gets it completely right, because it follows the standard. The standard gets it completely right for most aggregates, and takes a debatable position on SUM.
I totally disagree with your statement that "T-SQL doesn't always get (...) completely wrong".
T-SQL gets it right in the sense that it conforms to the SQL standard. However, T-SQL has many things that don't conform to the SQL standard, so why should it conform slavishly to the standard where the standard has somthing really elementary thoroughly, utterly, digracefully wrong? In my view in doing so T-SQl is completely wong.
Tom
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply