RE: Denormalization Strategies

SSC Guru

Points: 104773

March 22, 2010 at 6:36 am

Gilles Willard (3/15/2010)
Maybe this is due to my lack of experience, or the fact that it's been several years since I last studied database normalization, but:
It seems to me that what Timothy is proposing is simply adding a few FKs to his database design to cut down on the JOINs required to get at some data.
How is that denormalizing?
I went back to my database design textbook (as well as the paper I wrote then on denormalization...)and could not figure out which form (1NF, 2NF or 3NF) was being violated?
Also, and please forgive my fussiness with this but I have teachers in my family going back three generations, Timothy's article, while being interesting, could have used a quick run through spell-checking. This is not just a "purity" thing: it is a little jarring to find typos or spelling mistakes in a technical article while you're concentrating on understanding the author's point, something like stumbling over a rock while on a good walk.

I agree about the rocks!

Maybe your problem with the denormalisation is the way he describes it: he talks about "adding Foreign Keys" when what he actually means is "adding extra columns to support extra Foreign Key relationships" (and he has a barbarian naming convention for the columns which support foreign keys, so that they sound as if they were foreign keys, not columns). When he adds the column called "FK_Medication" to his Pet_MedicationSeries table that introduces a new functional dependency within that table: the value of the column FK_Medication is determined (according to the schema design) completely by the value of the column FK_PetMedication; but FK_PetMedication is not a key, so the table is now not in 3NF.

Of course the proposed denormalisations are probably not the right way to solve the problem described - the first issue appears to arise largely because the date is not recorded in the pet_medication table (one consequence of this is that for medication that is not part of a series there is no means of discovering the date, so neither the query before the denormalisation nor the query after it can deliver the answer it is claimed to deliver). The solution is to put the date into the Pet_Medication table, and remove it from the Pet_MedicationSeries table. The first query now doesn't need to use the Pet_MedicationSeries table at all (and is now able to deliver what it was claimed to deliver) so there's no need to denormalise that table to help this query, and if it performs badly a clustered index on (date, id) [actually just on (date) would work too, but why not make sure it's unique and use it as the primary key?] will probably work wonders.

Denormalising to fix a consequence of a schema design that puts attributes in the wrong tables is not a sensible use of denormalisation - it's just a deeper burying of the underlying design problem.

I think Paul's 3 stars were over-generous.

(edited - a spelling error)

Tom