• David Portas (7/17/2011)


    I think it is important to be clear on definitions given what you said in your previous article about 1NF permitting nulls. If a reader mistakenly thinks that 2NF and other normal forms apply equally well to a table with nulls as to a table without them then they may want to know what a functional dependency means in a table with nulls.

    Actually I think it is far more important to explain the fundamental underlying concept - that the business rules so constrain values that knowing values for some attributes guarantees that you also know the values for some others - than to mess about with the mathematical abstraction "functional dependency", which has no meaning at all to almost everyone involved in database development. That underlying concept is what matters in the real world of database design. You may be one of those people who think better in terms of mathematical abstractions than in terms of the real world phenomena they model - if so, you are not part of the audience I was writing for. I don't want to tell my audience that they have to learn a lot of pure mathematics to understand normalisation (which is NOT part of relational theory, but part of real world database design). Nor do I want to arrive at a situation where my audience ends up, by the end of the article series, foolishly believing that maximum decomposition is more imprtant than accurate representation (so you'll really hate at least one of the later articles in the series). Nor do I want my audience to believe that normalisation can't be applied to their real-world databases where missing values are inevitable so they have NULLs, because that would deprive them of a valuable tool-kit for improving their schema designs. Perhaps you will understand my position better if I use an analogy: "look, we want to keep the baby - so be careful when you discard the bathwater"?

    While "permit NULL only when it is absolutely necessary (and be extremely sure that it really is absolutely necessary)" is a good rule that every schema designer should follow, "forbid NULL everywhere" is nonsense perfectionism (at least until we have better ideas for NULL elimination - a research topic which anti-NULL purists have ignored ever since Date's bizarre "default values will always work in place of NULL" idea) of the same order as "make all your schemas conform to DKNF" (which is, as Date pointed out here, starting about 20 lines down, provably impossible in many practical cases; in fact we don't even approach understanding where it is possible).

    I will take Codd's word for it when he says "The concepts and rules of functional dependence, multi-valued dependence, and join dependence were all developed without considering missing db-values. All the normal forms based on these dependencies were also developed without considering missing dbvalues."[1]. Whether or not nulls were actually in his mind at the time, he chose to leave them out of what he published. Codd's goes on to say:

    "the normalization concepts do NOT apply, and should NOT be applied, globally to those combinations of attributes and tuples containing [null] marks. Instead, they should be applied [...] to a conceptual version of the database in which tuples containing missing-but-applicable information in the pertinent attributes have been removed"

    I don't mean to imply that I agree wtih Codd's view. I think it's a terrible suggestion. It completely ignores the very real possibility that the kind of redundancy and resulting anomalies which the NFs were designed to eliminate can also occur when some rows include nulls. What bothers me is that you didn't mention either this or any other strategy for dealing with the nulls that you say you want to permit. Your readers are left in the dark about it.

    Then what do you mean to imply? That I should provide about 16000 words of abstruse mathematics to demonstrate the several ways (one of which is the way Codd mentioned and you apparently failed to notice in the quotation you used - you apparently also failed to notice that the informal approach in the article exactly followed that way except for an insistence on banning NULL in all prime attributes, which I think Codd probably intended although he didn't actually mention it that sentence) in which the formal theory can be adapted, instead of sticking to the actual design issues that the formal theory is intended to deal with and model? When I was very young I might have taken that approach - I was a pure mathematician - but since then I have become an applied mathematican/computer scientist/engineer and gained experience by applying and teaching people to apply mathematics in industry; this has taught me that going back to the real world things that are being modelled and talking about them is where I need to be if I want to teach effectively. It has also taught me that I understand the mathematics much better and have far more insight into the abstract theory if I approach it that practical way.

    Anyway, I don't think it's useful to try to hijack this discussion (as you have successfully hijacked others) into a debate on whether NULL is a bad thing. We know it's a bad thing. That it's also a necessary thing is something that is unfortunate, and it's even more unfortunate that some people can't accept that we must live with that necessity.

    Tom