David Portas (6/30/2011)
1. I find this article problematic because it uses terminology and ideas that are not going to be universally understood in the same way by all readers. In particular it uses the term "NULL" without explaining it. Are we really talking about SQL-style nulls here? Relational database theory doesn't include anything like SQL's kind of null. It certainly is not part of Codd's original relational model or even his later revisions of it. There are several contradictory definitions of nulls and Tom hasn't explained which one he is talking about.
It doesn't really matter which sort of nulls we are talking about, as long as they are not used to violate the principles of the relational model. If you want me to be specific, by NULL you can take me to mean the domain-specific BOTTOM value in each domain as used in the denotational semantics for recursive function calculus - i.e. for all effective computer languages - produced by Dana Scott and Chris Strachey a very long time ago. It really was a very long time ago - I was less than 30 years old when Scott came to Oxford and teamed up with Strachey.
2. The suggestion that domain theory means that relations can support nulls appears to be speculation on Tom's part. At least that's how it seems, because he hasn't referenced any other source for that claim. I don't recall ever seeing that from any other source.
Heath and Codd were talking about a value to denote absence of information as early as 1971, as part of their efforts to devise a join (which today we would call an outer join) that would include all information from each relation involved. That was during their collaboration on defining normal forms, when Codd decided to allow key elements to be dependent on partial keys in 3NF, while Heath advocated not doing so. That, incidentally, is why Date stated that Heath defined BCNF three years before Boyce got involved (the definition is in one of Heath's 1971 papers). You'll see references in Codd's later work to "private communications" from Heath about the full join problem (for example on page 407 of ACM ToDS 4/4, December 1979). The idea that domains have undefined values was certainly understood in the 1930s and probably well before that - I learnt domain theory from a 1930s mathematical logic text (although it was Scott's work in the 70s that made it a fundamental part of the foundations of computer science) and an Oxford-educated mathematician like Codd was certainly familiar with that. His consistent use of the word "domain" in the 1970 paper doesn't suggest that he was trying to avoid the concept.
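As a rough illustration of the kind of join described above, here is a minimal Python sketch. The relation names, the attribute names and the use of None as the "absence of information" marker are all assumptions made up for the example; nothing here is taken from Heath's or Codd's papers.

```python
# Hypothetical sketch: a full (outer) join over relations held as lists of dicts.
# None stands in for the "absence of information" value; that choice is an assumption.
def full_outer_join(left, right, key):
    left_attrs = {a for row in left for a in row}
    right_attrs = {a for row in right for a in row}
    result = []
    matched_right = set()
    for l in left:
        hits = [i for i, r in enumerate(right) if r[key] == l[key]]
        if hits:
            for i in hits:
                matched_right.add(i)
                result.append({**l, **right[i]})
        else:
            # No partner on the right: keep the left tuple, pad the right-only attributes.
            result.append({**l, **{a: None for a in right_attrs - left_attrs}})
    for i, r in enumerate(right):
        if i not in matched_right:
            # No partner on the left: keep the right tuple, pad the left-only attributes.
            result.append({**{a: None for a in left_attrs - right_attrs}, **r})
    return result

# Made-up sample data.
employees = [{"emp": "Ann", "dept": "D1"}, {"emp": "Bob", "dept": "D2"}]
departments = [{"dept": "D1", "site": "Oxford"}, {"dept": "D3", "site": "Leeds"}]
joined = full_outer_join(employees, departments, "dept")
```

Every tuple from both inputs survives in the result; the tuples for Bob and for department D3 carry None in the attributes that the other relation could not supply, which is exactly the information an ordinary join would have thrown away.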
3. Tom says that "every row [sic] has a primary key". I guess this means every relation has a primary key. Keys are usually understood to be a subset of the attributes of a relation or relation variable rather than a property that tuples have.
Drat! I should not have used that statement. It was very sloppy. I should have said that the relation has a primary key (or at least one primary key).
4. He goes on to say: "a relation in relational database theory is required to conform to 1NF". This is true of Codd's original relational model but then Codd's original relational model did not support nulls whereas Tom appears to be discussing some other system which does have nulls. Codd's RM2 on the other hand does NOT require every relation to conform to 1NF. Codd's RM2 specifically allows derived relations which do not have primary keys at all. Again, to give readers a chance of understanding his explanations Tom needs to explain what definitions he is referring to.
I really don't want to get into a long discussion of Nulls here - but I will say that I'm fed up with hearing from Date's followers that NULLs were not around until a decade after Codd's 1970 paper. It's absolutely clear that he and Heath discussed them from 1970 onwards (perhaps after the internal IBM paper, but certainly before the first published paper). I think perhaps you have forgotten that first normal form was introduced by Codd to ensure that the relational calculus could use first order predicate calculus: it would not require higher order predicate calculus. Nulls don't have any influence at all on whether a higher order predicate calculus is needed or not.
I am always amused when someone tells me that derived relations have to obey the rules for normal forms - someone recently suggested that derived relations ought to fit 5NF, and it took me some time to stop laughing. I don't think a special case can be made for 1NF for derived relations.
5. The claim that relations require "primary" keys is a bit ambiguous. What Codd referred to as a "primary key" in his 1970 paper is more usually today called a Candidate Key. A "primary" key is nowadays usually understood to be just one of the candidate keys which is somehow designated to be a "primary" one. Depending on the implementation that designation of a primary key may be part of the logical data model in the database or it may not be. There is nothing in the relational model that absolutely requires a key to be so designated. Candidate keys are fundamental but the concept of having a "primary" key or not having one is unimportant.
I was sticking to the original definition (1970) which allows a relation to have one or more primary keys. ("A relation may possess more than one nonredundant primary key": CACM 13/6, Dec 1970, page 380). Only one of the primary keys is chosen and called The primary key. Of course SQL has a restriction that there can be only one primary key. Modern terminology is as you say usually "candidate keys" rather than "primary keys". But the individual attributes in a candidate key are still "prime" attributes, not "candidate" attributes, so the terminology isn't very pure.
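For anyone who wants to see the terminology in action, the following is a minimal Python sketch that finds the candidate keys (the "nonredundant primary keys" of the 1970 paper) and the prime attributes of a relation from its functional dependencies. The relation, its attribute names and the dependencies are invented for the example, and the brute-force search is just the simplest thing that works, not a recommended algorithm.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Brute force: try attribute sets smallest first, keep the nonredundant ones."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = frozenset(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is redundant
            if closure(s, fds) == attributes:
                keys.append(s)
    return keys

# Hypothetical relation: emp_no and ssn each identify an employee on their own.
attributes = frozenset({"emp_no", "ssn", "name"})
fds = [
    (frozenset({"emp_no"}), frozenset({"ssn", "name"})),
    (frozenset({"ssn"}), frozenset({"emp_no"})),
]
keys = candidate_keys(attributes, fds)
prime = set().union(*keys)  # attributes that appear in at least one candidate key
```

Here there are two candidate keys, {emp_no} and {ssn}, so emp_no and ssn are the prime attributes and name is non-prime. Designating one of the two as "the" primary key adds nothing to what the dependencies already say.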
edit: fix quote brackets. edit again: [quite] didn't work nearly as well as [quote] - stupid computer reads what I type instead of what I mean to type.