SQL Clone
SQLServerCentral is supported by Redgate
Log in  ::  Register  ::  Not logged in

Third Normal Form

By Tom Thomson,

Third Normal Form (3NF) is the last of the three usually remembered normal forms defined back in 1971. Like second normal form (2NF), its purpose is to ensure that the database schema definition prevents certain errors from occurring.

The idea of third normal form is that the value of a non-key attribute should not be deducible from the values of other non-key attributes. A non-key attribute is an attribute that isn't in any candidate key. A candidate key is a set of attributes that could be used as the primary key.

Like 2NF, 3NF is about real world properties of the objects being modelled by data in the database. It is an example of a class of real world business rules being encoded in the schema definition so that the database can enforce them directly.

Definition of 3NF

The easiest definition to understand is

A relation R is in 3NF if and only if (i) it is in 2NF and (ii) whenever X is a set of attributes of R and A is a non-key attribute of R that is not in X, the value of A can be determined from the values of X only if X includes a candidate key.

Actually, the first clause of this could refer to 1NF instead of 2NF, because given 1NF it's clear that (ii) implies (i); but it's traditional to say for each normal form that it requires the next lower normal form, to avoid messy proofs.

That was not the original definition, although it defines exactly the same property. The original definition said that the relation must be in 2NF and must not have any non-trivial transitive functional dependencies. Using that terminology would require too much definition for an introductory note like this.

Why go to 3NF?

To see what sort of problems can be prevented by using 3NF, we need to define some business rules and show how they can be violated in a table that doesn't conform to 3NF but not when the schema is changed so that all tables conform to 3NF.

We'll use the same Pest Control Product company as in the 2NF article, but look at part of the sales operation rather than development this time. The Sales Division operates several subdivisions, determined by the type of pest that a product is used for. Three of these are the Insects group, the Weeds group, and the Rodents group. The groups don't share storage depots because some years back some insecticide accidentally got contaminated by weed killer and the resulting compensation costs were rather large when the roses died along with the greenfly. The company has decided that only one group will have a storage depot in any given city, and that each group will have a single telephone ordering system with a single toll-free number (which doesn't call a fixed location: it calls the depot belonging to the group that is nearest to the calling phone's area code). The CFO's Personal Assistant has designed thrown together a set of tables to support this, part of which is the Sales_Depot table, which at some point in time holds the following data (amongst others: neither all columns nor all rows are shown).

Sales_Depot Table
City Group Address Order_phone
Toytown Insects 40 Tree Rd 800765
Bigtown Rodents 3 Main St 800210
Oldtown Weeds The Orchard 800284
Newtown Insects 1 New Way 800765
Anytown Rodents 4 Old Rd 800210
Erehwon Weeds 2 South St 800284


That all looks OK, until one day the phone system is upgraded to use 7 digit numbers instead of 6 digits, and everyone has to change. The company headquarters "carefully" checks that all the new numbers work before updating the database and telling the depots to advise their customers of the new numbers. But "carefully" was't carefully enough. When someone is calling out the new numbers from a list for someone else to type them in, the Rodents group number for Bigtown gets misheard, with a "01" instead of a "10" somewhere in it. The testing is done only using the data for the first city (in alphabetical order) in each group, so for the Rodents group Anytown is tested but Bigtown isn't and the error isn't spotted. Here is how that part of the table ends up, with the incorrect data highlighted in yellow:

Sales_depot Table (new phone numbers)
City Group Address Order_Phone
Toytown Insects 40 Tree Rd 8007650
Bigtown Rodents 3 Main St 8002010
Oldtown Weeds The Orchard 8002840
Newtown Insects 1 New Way 8007650
Anytown Rodents 4 Old Rd 8002100
Erehwon Weeds 2 South St 8002840

Therefore, 8002010 is the number sent out to customers of the Rodents group in Bigtown in a notice advising that the new number is available, and that some time after that, the old number will stop working. Of course all the customers carry on using the old number, until it stops working; and then they have a new number that doesn't work as expected (it is the number of a pet shop specialising in white mice). This is far from good for sales, for customer relations, or for the company's reputation as a careful and competent supplier.

The business rule states that a group has only one order phone, so the order phone number can be deduced from the Group attribute of the row, but here we have two rows with the same group but different order phones - the business rule has been broken. This can only happen because the order phone depends directly on the group according to the business rule, but the group attribute is not part of any candidate key so that means that the table is not in 3rd normal form. The two rows for the Rodents group have different phone numbers, which violates the business rule but does not violate any key constraints. The same order phone information has to be recorded more than once, in however many rows refer to the group that uses that phone. This failure to achieve 3NF means that the datum can end up being different in two places, despite the business rule saying it must not be. This is unwanted and dangerous redundancy, and permits inconsistencies that can have undesirable consequences.

The 3NF Version

To get this part of the schema into 3NF, it is necessary to move the data that is dependent on something other than the key into a separate table where what it is dependent on is the key. Removing that column from the Sales-Depot table leaves this (before the phone numbers update)

Sales_Depot Table
City Group Address
Toytown Insects 40 Tree Rd
Bigtown Rodents 3 Main St
Oldtown Weeds The Orchard
Newtown Insects 1 New Way
Anytown Rodents 4 Old Rd
Erehwon Weeds 2 South St

A new table, the Sales-Group table is needed to hold the Order_phone information; this looks like this

Before phone system upgrade
Group Order_phone
Insects 800765
Rodents 800210
Weeds 800284


After phone system upgrade
Group Order_phone
Insects 8007650
Rodents 8002100
Weeds 8002840

Now when the phone number update takes place, the order phone number for the Rodents sales group is only in one place in the database, so inconsistencies like the one described above can no longer occur. It is either right or wrong, and because it is in only one place it can't be right in one place but wrong in another.

Of course, errors can be avoided by insisting that things are updated only using stored procedures that update this column for every row containing the affected group but that means winning the fight to ban ad-hoc updates, which is often unwinnable. Perhaps they can be avoided by having a trigger that updates all the remaining rows that contain this number whenever one of them is updated, but that requires you to make sure RECURSIVE TRIGGERS is OFF, which may not be acceptable. Of course these approaches only work when there not too many rows. If a large number of rows are affected, there may be a performance issue as well. Besides, it is usually a bad idea to write extra code to fix problems that wouldn't exist if you chose the right table structures.

Another thing that isn't made clear by a small scale example like this one is that going to 3NF often makes a significant reduction in the size of the data, because it reduces data redundancy. With real world data, enforcing 3NF can sometimes deliver extra performance because the IO volume is vastly reduced and, when store is in short supply, it gives even more extra performance because working sets are reduced.

Total article views: 7393 | Views in the last 30 days: 1
Related Articles

Format Phone Number function

Function to format phone numbers entered as text.


Phone number formatting

Phone number formatting


Clean international Phone Numbers

Gotta clean up phone numbers as I load into SQL and it is a mess!


Random Phone numbers

Can anyone help me generate a statement that will update a table with random phone numbers with the ...


Ordering Multivalued/compound Attributes

Have I got the best way to order undefined numbers of attributes in a table?