Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Second Normal Form

By Tom Thomson,

Second normal form (2NF) requires first normal form and one extra condition: if X is a set of prime attributes and A is a non-prime attribute and the value of A is determined by the values of X then X contains the whole of at least one candidate key. An attribute is prime if it is part of any candidate key, and non-prime otherwise. A Candidate Key is a set of columns that could reasonably be used as the Primary Key. The Primary Key is a set of columns which don't permit NULLs and which is guaranteed not to have the same set of values in two different rows, and for which no subset has these properties. This ensures that certain kinds of incorrectness can't creep into the database through erroneous inserts or updates though it cannot, of course, prevent all erroneous inserts and updates.

The extra condition for 2NF is sometimes informally expressed as saying that the value of every non-key column depends on the whole of the key. Saying that each column depends on the whole key means that the real business entities which the database models are such that the non-key attributes depend on all the key attributes, not on just some of them. This is an example of a real-world business rule being modelled in the structure of the database in order to ensure that certain errors can't occur, which is what all of 2nd, 3rd, 4th and 5th normal forms are intended to do.

A table not in 2NF

To see the kind of errors that are prevented by the use of 2NF we have to look at what can go wrong when a table is not in 2NF, which means we have to look at what kind of business rule can be violated if such a table is used. We will state a business rule and show how an update can cause a table not in 2NF to violate this rule, and then show how to change the table structure so that the rule cannot be violated.

We will use a Current_Assignment table, a table which shows for each employee working on a project what proportion of his time he spends on that project. This allows that time to be charged to the project. As charges are made in money terms, we need to know what the employee costs (salary and overhead). In this company salary is determined by the Pay Grade of the employee and his time in that grade (measured in years), and overheads are charged at 100% of salary. It is of course an absolute rule of the business that an employee can have, at any given time, only one Pay Grade and only one time in grade, and that every project that uses an employee in a given pay period sees the same charging rate for that employee.

The DBA at this company (actually the CFO's Personal Assistant, whose qualification for the DBA role is that once, a quarter of a century ago, she wrote a dbase II query) decides to put all the data needed into the Current_Assignment table, as this seems the simplest solution. She scripts the table like this:

CREATE TABLE Current_Assignment (
EmployeeID int NOT Null,
ProjectID int NOT Null,
TimeProportion NOT Null, -- percent of employee's time allocated to project
PayGrade int NOT Null,
YearsInGrade int NOT Null,
CONSTRAINT PK_Current_Assignment Primary Key Clustered
(ProjectID, EmployeeID)
)

Since an employee can be working on several projects, and each project can be using several employees, (EmployeeID ,ProjectID) is the only candidate key; so it has been assigned as the primary key.

Here the PayGrade and TimeInGrade columns are dependent on only part of the key (the EmployeeID); they are not dependent on all of it, because they have to be the same for a given employee whichever project he works on: so the table is not in 2NF. Let's see how a problem can arise as a result.

At some point in time, part of the contents of the table are as follows:

Current_Assignment Table
EmployeeID ProjectID TimeProportion PayGrade YearsInGrade
Bill Hacker New MouseTrap 50 15 3
Buck Bossman New MouseTrap 40 21 1
Penny Drudge New MouseTrap 100 12 2
Bill Hacker Better Flypaper 50 15 3
Buck Bossman Better Flypaper 50 21 1
David Drudge Better Flypaper 100 13 1
Charlie Fixit Better Flypaper 10 12 3
Buck Bossman Roach Eliminator 10 21 1
Charlie Fixit Roach Eliminator 90 12 3
Joe Bottom Roach Eliminator 100 9 12

Employee Buck Bossman (the group manager in charge of the projects and people shown) decides to give employee Charlie Fixit a pay raise, and updates the table so that the row with primary key (Roach Eliminator,Charlie Fixit) has PayGrade 13 and YearsInGrade 0. This happens because Buck associates Charlie with the Roach Eliminator Project, where he spends 90% of his time, and forgets that he also does some work on the Better Flypaper project. So now Charlie has different Pay Grade and Years in Grade in different rows of the table.

When a query is made of the updated table to discover Charlie Fixit's PayGrade and YearsInGrade, there are two rows with two different values, and it is a toss-up which pair of values will be delivered as the result. That's rather unfortunate if this is to be used to look up an actual salary in a table mapping grade and years in grade to money (Charlie will not be pleased if he gets paid at his old, lower, rate this month). When Charlie's time is charged to projects, Better Flypaper and Roach Eliminator will see different charge rates for the same person. The update has introduced a serious error into the database, violating a simple business rule.

Fixing the problem

If the schema is modified so that the table is in 2NF, this error can no longer occur. To effect this, the information which is dependent on only part of the key must be split out into a separate table, where that part key is the whole key. The new table definition scripts are

CREATE TABLE PayGrade_Detail (
EmployeeID int NOT Null Primary Key Clustered,
PayGrade int NOT Null,
TimeInGrade int NOT Null,
)
CREATE TABLE Current_Assignment (
EmployeeID int NOT Null REFERENCES PayGrade_Details (EmployeeID),
ProjectID int NOT Null,
TimeProportion NOT Null, -- percent of employee's time allocated to project
CONSTRAINT PK_Current_Assignment Primary Key Clustered
(ProjectID, EmployeeID)
)

The content corresponding to that shown above will now be

Current_Assignment
EmployeeID ProjectID TimeProportion
Bill Hacker New MouseTrap 50
Buck Bossman New MouseTrap 40
Penny Drudge New MouseTrap 100
Bill Hacker Better Flypaper 50
Buck Bossman Better Flypaper 50
David Drudge Better Flypaper 100
Charlie Fixit Better Flypaper 20
Buck Bossman Roach Eliminator 10
Charlie Fixit Roach Eliminator 80
Joe Bottom Roach Eliminator 100


PayGrade_Details Table
EmployeeID PayGrade YearsInGrade
Bill Hacker 15 3
Buck Bossman 21 1
Penny Drudge 12 2
David Drudge 13 1
Charlie Fixit 12 3
Joe Bottom 9 12


PayGrade and YearsInGrade are now in a table where the only candidate key is (EmployeeID); and both are dependent on that key, not just on some part of it (in this case that's obvious: the key doesn't have multiple parts). So with this pair of tables instead of the original single table, all tables are in 2NF.

As a result, the update to change the PayGrade for employee Charlie now affects only one row, so it can't introduce contradictory information; the information is in one place only, so it can't have different values in different places. Changing to 2NF has eliminated some error-prone redundancy, and it is no longer possible to make the data fail to conform to the business rule that could be broken before.

Of course in this example there may well be an employee table into which the PayGrade information can be moved, instead of creating a new table; and people often think that 2NF is a lot of fuss about nothing when given examples like this, often because "no experienced DBA would make an error like that" and just as often because "you can fix it trivially by writing a single update statement which updates all the rows that should be affected"; the first is just plain wrong: people do make this mistake, and the second is not so easy if the app is to display a view on a screen and update is by overtype. If the view is a not-2NF relational view built by joining the two 2NF tables, an update through the view does indeed solve the problem, but that requires the base tables to be in 2NF.

Multiple candidate keys

The example above is simple partly because there is only one Candidate Key in the table. When checking for violations of 2NF it is essential to look at all Candidate Keys, not just the Primary Key, because the same error-prone redundancy can arise there too. If there is a UNIQUE constraint or index on a set of columns that don't permit NULL values that indicates that there is a Candidate Key other than the Primary Key. Of course the DBA may forget to specify a UNIQUE constraint when some column set is actually unique (and free of NULLs) according to the business rules (this is bad schema design - such constraints should always be specified to let the database prevent what errors it can, and having appropriate constraints can ensure that database operations can't cause this column set to contain duplicates) but that column set is still a Candidate Key because the business rules say it is, even though the constraint wasn't specified in the schema.

Total article views: 7636 | Views in the last 30 days: 9
 
Related Articles
ARTICLE

TSQL Challenge 72 - Calculate the Payroll Hours of Employees

This challenge invites you to solve a payroll challenge which requires to calculate the number of ho...

BLOG

Salaried employee vs contractor

Are you a salaried employee looking to switch to a W-2 or 1099 contractor? (For an explanation of W-...

FORUM

sql query need Help about Absent Employee

Query about absent employee

FORUM

how to update analyses services 2008 project when a database field was changed

how to update analysis services 2008 project when a database field was changed

ARTICLE

Employee Slurping

This editorial was originally published on Oct 21, 2006. It is being re-run as Steve is out at SQL i...

Tags
 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones