• David Portas (1/22/2010)


    Since it is always possible to decompose further and eliminate nulls they are not formally necessary either - you can always create a truly normalized design without them.

    This is an extremely naive view supported by neither the theory nor the practicalities.

    In order to support that view you would have to insist that every domain had an entity identifier which meant "I'm not an entity identifier but instead an indicator that which entity should be identified when I occur is unknown, although it is certain that there should be such an identifier". That's just a specialised NULL, one of several specialised NULLs that are essential if we want to represent our knowledge of some relations accurately (we can't consider representing the relations themselves accurately unless we assume we have perfect knowledge of them, which will almost always be a really stupid mistake to make, all we can do in practise is represent our knowledge of them).

    Consider a two column table WidgetName with columns "widget_id" and "widget_principal_name_id". Now lets say that we always know whether a widget has or does not have a principal name, and we deal with the case where a widget doesn't have a principla name by not letting that widget's identifier occur in the Widget_id column of the WidgetName table. Also, there are some widgets which we know do have principal names, and we need to represent that knowledge, but we don't know which principal names they have; we can't represnet that by leaving that widget out of the WidgetName table (that would mean it had no principal name), so we represent it by putting the widget_id in the table with the widget_principal_name_id in that row having the value null. Now we can get the principal names of all the widgets for which we know the principal name using an inner join, discover which widgets don't have principle names by projection and set subtraction (a one column projection of the widget table has a one column projection of the widget_name table subtracted from it - in SQL subtraction is done either by selection and projection from a one-sided outer join or by using "not exists") and we can discover the widgets which have an unknown principal name by restriction of the widget name table followed by projection. I'd like to see you explain (a) how you would do this without any sort of NULL and (b) in what sense this two column table is not fully normalised.

    And don't suggest adding a "has_principal_name" attribute to some other table - yes it works, but it is inefficient and introduces exactly the kind of data redundancy and overhead on updates that normalisation is supposed to eliminate; it might be a good trade-off if more than 50% of widgets had unknown principal names, and in that case I would call it a useful denormalisation. Remember that in a fully normalised schema some tables represent our knowledge about relations (that's while we call it a "relational" system) while others represent our knowledge about the entities which populate the domains of the relations, and that deciding to move information about relations into tables representing entities is most certainly denormalisation.

    Tom