• Matt Miller (#4) (8/18/2010)


    True, except that in practice, actually being able to make that distinction is an awfully big step. What do you do WHILE you decipher whether the machine got unplugged, or became self-aware and stopped measuring the data you were looking for? Do you then need 3 NULLs (the original definition AND the two latter ones)?

    Perhaps I deal in messy environments, but I find a lot of settings where the only real distinction needed is between "there is data there" and "there's no data available". Putting placeholders in data when you just plain don't know and can't know WHY, just to make it "look" prettier, is like putting earrings on a pig....

    I agree that the step from 1 to 2 NULLs is a big one, but there are possible reasons for wanting it. I see it as assistance for a particular performance-engineering bodge (representing several distinct entity types in a single relation, whether a base relation or a derived one, to save having to write a lot of code and/or run a lot of overcomplex queries - a bodge we sometimes have to live with unless we can afford to write and run ridiculously large amounts of code); plus getting some extra functionality out of NULLs when they crop up in some of the aggregate functions (ideally NULLi contributes nothing to SUM, AVG, MIN, MAX, VAR, STDEV, ...; while NULLa causes them to return NULLa); plus a bit of support for applications which care whether something is known to be inapplicable or just not available for some unspecified reason.

    But all this can be done by adding an applicability bit column for each column that could take NULLi, without too much pain in most cases (a sketch follows below), so I would rarely choose to use two NULLs instead of just one even in a language that allowed me to do so (with SQL I can't use two NULLs anyway).
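
    A minimal sketch of that applicability-bit workaround in SQL. The table and column names here are hypothetical, and BOOLEAN may need to be an integer or character flag in dialects that lack it:

        -- One nullable column paired with a flag that says whether the
        -- attribute applies to the row at all.
        CREATE TABLE employee (
            emp_id             INTEGER PRIMARY KEY,
            commission         DECIMAL(10,2),       -- NULL = value absent (NULLa role)
            commission_applies BOOLEAN NOT NULL     -- FALSE = inapplicable (NULLi role)
        );

        -- NULLi-style aggregation: inapplicable rows contribute nothing;
        -- a NULL commission on an applicable row still vanishes from SUM
        -- under SQL's usual rules.
        SELECT SUM(commission)
        FROM employee
        WHERE commission_applies;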

    I could see having the option for either definition, but if I had to pick just one, I'd go with the original definition for the reason above.

    In the two-NULL system you have the option of using the one-NULL system: you simply define your schema so that NULLi is nowhere allowed. One of the "fun" things about Codd's two-NULL system is realising that the two NULLs do, in a sense, overlap. NULLa means the value is not here and the database doesn't say why; NULLi means the value is not here and the database says it's not here because the column is inapplicable. So an inapplicable case could be represented by NULLa instead of NULLi - either because whoever entered it didn't know whether it was applicable or not, or because the database designer decided to keep things simple and not represent the reason for absence.

    The designer can choose not to use NULLi at all (because of a preference for the single-NULL system) or on a horses-for-courses basis (adding NULLi only where it's convenient and useful). So a schema designer can work in the two-NULL system and use only NULLa, getting the same result as in Codd's one-NULL system; or use only NULLi (if he's so lucky that he never needs NULLa, which rarely happens to me) and get a different single-NULL system; or use both NULLs where appropriate.

    Of course in SQL the designer is stuck with a botched one-NULL system where NULL sometimes behaves as NULLa and sometimes as NULLi (when MAX or SUM is used on a nonempty set, for example) and is sometimes introduced instead of a correct result ("select sum(T.x) from T where T.x>T.x" should return 0 on any rational system, but returns NULL).
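
    A concrete illustration of that inconsistency, using a hypothetical one-column table T (standard SQL behaviour):

        CREATE TABLE T (x INTEGER);
        INSERT INTO T VALUES (1);
        INSERT INTO T VALUES (2);
        INSERT INTO T VALUES (NULL);

        -- On a nonempty set SUM skips the NULL, treating it like NULLi:
        SELECT SUM(x) FROM T;                      -- 3

        -- In a scalar comparison NULL propagates as "unknown", more like
        -- NULLa: the NULL row satisfies neither condition below.
        SELECT COUNT(*) FROM T WHERE x > 0;        -- 2
        SELECT COUNT(*) FROM T WHERE NOT (x > 0);  -- 0

        -- And over an empty selection SUM returns NULL where 0 would be
        -- the rational answer:
        SELECT SUM(x) FROM T WHERE x > x;          -- NULL, not 0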

    Tom