RE: The Logic, Mathematics, and Utility of NULLs

SSC Guru

Points: 104773

September 10, 2010 at 8:22 pm

#1219598

mtillman-921105 (9/8/2010)
Logic divides arguments into subjects and predicates. A column is usually the subject in a table, and the value for each row the predicate. So, you have a Car table, one of the columns is MPG. For a row, a number 27 in the MPG column would describe the gas mileage of that particular make and model of car for example. The value in the rows become the predicates.
In logic, if one of the values is missing, we simply do not make the claim: "This car gets x MPG" is not very useful, is it?
So in logic, the whole sentence is avoided, which actually makes more sense. In programming on the other hand, it is as if we assume that a value must be there. :ermm:

I'm not sure about that use of "predicate" but that's just terminology so who cares.

Not making the claim - I agree with that. What we see in SQL is something like "where mpg > 26" - and if the database has NULL for the mpg column for a particular row it is not making any claim (other that "I don't have any value for the mpg for this row") about mpg for that row. It's not a question we can avoid though - someone is going to make that query - so all the database can say is " I don't have a value so I can't tell you whether it's greater than 26 or not": in other words, the database's answer for "is it greater than 26" isn't "Yes" and it isn't "No", it's "I haven't a clue". So it has to work in (at least) a three-valued logic, with the three values "True", "False", and "I Don't know".

"It is as if we can assume the value must be there" is a place where lots of people go wrong: we can't make that assumption, it's just nonsense. (I think I'm agreeing with you here.)

The rest of this comment is a [n ex-] mathematical logician's take on Codd's work on having two sorts of NULL. Don't read it if you want to stick in the real world instead of in speculative mathematical theory.

In databases with more than one type of NULL this can get quite messy: if we look at Codd's NULL-i and NULL-a, we find one version where NULL-i means "known to be inapplicable" and NULL-a means "don't have a value and don't know why"; then there's another version where NULL-a means "we don't have a value but we know it's applicable" and NULL-i means "we don't have a value because it's inapplicable" (so there's no way of covering the case "we don't know whether it's applicable or not"; Codd fans - including me, I have to admit, - tend to forget about this gap in that version, which is the version in RM version 2, and prefer either his other version of his 2-NULL system or his 1-NULL system, which is what SQL attempted to implememt and failed). The reason for the RM-2 version was that Codd wanted to be able to reason about some of the cases where it was known that a value was applicable: for example a proposition like "NULL-a < 0 or NULL >= 0" is guaranteed to be True when NULL-a is a marker for a missing value from the integer domain provided NULL_a is guaranteed to signal an applicable value (and, in Codd's RM-2 system, if I'm remembering correctly, "NULL-i < 0 or NULL-i >= 0" was guaranteed to be "False", while "NULL-i < NULL-i or NULL-i >= NULL-i" was none of "True", "False" or "Unknown" but had a fourth truth value "Inapplicable"). All the stuff about the relational calculus engine needing to detect tautologies could be discarded if there were no version of NULL which offered a guarantee of applicability, as could the fourth truth value in the logic component of the relational calculus; and the RM-2 gap in absent-value coverage would disappear; of course the basic problem in the RM-2 NULLs can also be fixed by introducing a 3-rd NULL (to cover the "we don't even know whether it's applicable" case) and keeping RM-2's 4 truth values, but doing that would leave the question of tautology detection still a valid one. But of course there's no imaginable way that Date's nonsense about needing an unbounded set of NULLs as soon as one goes to two can be justified.

Tom