• Hugo Kornelis (8/22/2013)


    So where does all the confusion with "unknown" come from? (To my surprise, I see that even Tom, a man I normally agree with on most points, is confused by this. Understandable!)

    I'm not at all confused by this, but I have the impression that you are.

    Let's have a database of the truth values of certain propositions at about midday each day for which a test is scheduled for the corresponding proposition. One such proposition may be "XYZ Ordinary Shares have a higher value in the London stock market than ABC Ordinary Shares"; lets call that proposition "Prop1"; we have a table created like this:

    CREATE TABLE Props(

    Propname varchar(16) NOT NULL,

    Testdate date NOT NULL,

    tv varchar(8) NULL check(tv IS NULL or TV in ('True', 'False' ,'Unknown'),

    constraint PK_Props primary key (Testdate, Propname)

    );

    Now if we run the query

    SELECT tv FROM Props WHERE Propname = 'Prop1' and date = 20130821'';

    there are five things it can do, three of which involve returning a value from the domain varchar(0) NOT NULL check(tv IS NULL or TV in ('True', 'False' ,'Unknown'), the fourth returns the additional value in the extended domain varchar(8) NULL check(tv IS NULL or TV in ('True', 'False' ,'Unknown'), and the fifth returns nothing at all.

    1) return 'True'

    2) return 'False'

    3) return 'Unknown' because when that proposition was tested on that date one (or both) one (or both) of the values was not present in the database

    4) return NULL because the record with that primary key has NULL in the tv column, perhaps because the test was not, for some reason, carried out that day although it was scheduled.

    5) return nothing because no row has the primary key specified in the where clause (that proposition was not scheduled to be tested that day).

    In case 4, the database does not know what the truth value is; it's not known. However, it is NOT 'Unknown'.

    Your exposition further on in the message containing the above quotation claims to prove that NULL can't mean unknown because it isn't the truth value "Unknown". That is just a confusion of two very different concepts, an unjustifiable conflation of data with metadata. The truth value unknown is not a possible value of any attribute of an entity unless the domain of that attribute includes the truth values of a non-classical logic which includes unknown; most domains used in databases don't include those values, so it is very unusual for that truth value to be a possible result and even when it is possible NULL does not indicate that truth value. However, it is certainly possible that the real world value of some attribute is not known, and this is indicated by marking its database representation NULL. And if a value in the database is marked NULL, it is quite clear that so far as the database is concerned the value of the real world attribute isn't known - and "it isn't known" is just another way of saying "it is unknown".

    I suspect that your attempt to say that a NULL mark doesn't mean the value is unknown arises from a desire to stop people confusing the truth value unknown with the meaning of the NULL mark - but unless the domain of the value includes that truth value, nothing in the domain can be that truth value; the database representation of an attribute as NULL certainly does mean that so far as the database is concerned we don't know what value the attribute has, because we haven't recorded it. I believe it would be easier to stop people confusing the truth value and the significance of the mark if people stopped claiming that the mark doesn't imply a lack of knowledge (in the database) of the value of the real-world attribute that it marks, and explained instead the difference between the truth value's significance and the mark's significance without making such claims.

    Incidentally, I reckon Date's "proof" that going to 2 sorts of NULL forced you to 3, 4 and so on ad infinitum was complete hogwash, so that appears to be something else we disagree on; but I think Codd was wrong in advocating two different NULLs in some of his papers, so that is something we agree on.

    Tom