• bitbucket-25253 (11/12/2011)


    Without prior knowledge of the QOD author I surmised it was you...

    Now a general question for yourself .... in your experiences, around the world, what would you say are the percentage of DBs in 4NF and / or higher (5NF,6NF).

    Well, I would guess about 50%, or maybe a bit lower, are in 4NF or higher.

    actual counts from experience:

    In 2007: system with 9 databases, 7 in 4NF or higher, 1 in 2NF (denormalised Management Information database), 1 not even in 1NF (arrogant and incompetent people who wouldn't follow instructions).

    In 2002: system with 3 databases, 1 in BCNF and 2 not even in 1NF.

    in 2000: system with 2 DBs, both in 4NF or higher.

    Before 2000: most everything 4NF or higher, except MI stuff.

    Many data warehouse/business intelligence/management information databases are not in 4NF, because they've been deliberately denormalised (which is reasonably safe because the usual concerns about update and insertion anomalies don't apply, the method of getting data into them is very different from other databases) to reduce join complexity of big queries in exchange for an increase of storage requirement through the increased redundancy of the data. Sometimes some parts of such databases aren't even in 3NF, some not even in 2NF.

    At the other end, some databases have never had any normalisation done at all. They are not in 2NF, because nobody involved in the design knew anything about normalisation and thought they just had to code around all the anomalies, often not in 1NF because someone thought nonatomic fields were a good idea and primary keys an uneeded luxury. Unless these systems are very simple and don't get caught up in enhancements to provide extra features they can be a real pain.

    It's fairly common for people to aim for 3NF (whether by guess or by using Bernstein's algorithm) and that usually ends up conforming to EKNF or BCNF (if they use Bernstein's algorithm it's guaranteed to be at least EKNF). Since any 3NF table which doesn't have overlapping candidate keys is guaranteed to be in BCNF, and overlapping candidate keys are rare, it's usually BCNF. Multivalued dependencies are also pretty rare, so probably most things that have been aimed at 3NF actually hit 4NF. I don't have any real feel for how rare join dependencies are, so I don't have a feel for how often 5NF is hit by accident, but I suspect it's quite often.

    It's less common for people to aim for 4NF or for 5NF than for 3NF or EKNF, but that probably doesn't cause the proportion of DBs in 4NF or higher to be much lower, simply because most 3NF schemas are 4NF as noted above.

    Of course some people will tell you that almost all SQL databases are not in any normal form because they have at least one column in at least one table that allows null. If you believe that pernicious claptrap, use a figure of about 1% instead of the about 50% I suggested above.

    Tom