SQLServerCentral Article

Implementation of Normalization in SQL Server

,

Introduction

In my last article, Discussion on Normalization and Database Design Concepts, I discussed about the normalization concepts. Here, I will take one table in SQL Server and implement different normal forms. As we move up the higher normal forms, we end up on more number of tables.

Implementation of Normal Forms

Let us start with a table, named StudentProject. I added few records in the table as per the defined structure. I will verify if the table satisfies different normal forms. I will modify the table structure or will split the data in multiple tables to fulfill the requirements for each normal form.

IF OBJECT_ID('StudentProject','U') IS NOT NULL
DROP TABLE dbo.StudentProject
CREATE TABLE dbo.StudentProject
(
StudentId INT,
ProjectId INT,
GuideId INT,
HrsWorked INT,
Sname VARCHAR(50), --Fname, Lname
Saddr VARCHAR(100), -- Saddr1, Saddr2
Pname VARCHAR(100),
DeptId INT, 
Dname VARCHAR(50)
)
INSERT INTO dbo.StudentProject 
(StudentId,ProjectId,GuideId,HrsWorked,Sname,Saddr,Pname,DeptId,Dname)
VALUES
(101,201,301,40,'xxx zzz','house1,house2','project1',401,'dept1'),
(102,201,301,40,'aaa ccc','house3','project1',401,'dept1'),
(103,204,304,40,'ddd','house4,house5','project4',402,'dept2'),
(101,203,303,40,'xxx zzz','house1,house2','project3',401,'dept1')
SELECT StudentId,ProjectId,GuideId,HrsWorked,Sname,Saddr,Pname,DeptId,Dname
FROM dbo.StudentProject

First Normal Form (1NF)

The domain of an attribute must include only atomic (simple, indivisible) values and value of any attribute in a tuple must be a single value from the domain of that attribute.

There is a column, named Sname, in StudentProject that is for storing the student name. I want to save last name and/or first name of the student. I entered the last name and first name values with a space separator. To maintain last name and first name separately, I should create two different columns.

The saddress column contains the student address. A student may have multiple addresses. Here, I entered two addresses for a student with a comma separator. To maintain two addresses, two different columns should be created.

I create a new table named StudentProject1NF where I keep two different columns for first name and last name of the student. Also, there are two different columns for address 1 and address 2. That way, each column is having single indivisible value and the table StudentProject1NF is in first normal form.

I will transfer data from StudentProject to StudentProject1NF and will drop the StudentProject table later.

--1NF
--StudentProject table structure is modified to create StudentProject1NF
IF OBJECT_ID('StudentProject1NF','U') IS NOT NULL
DROP TABLE dbo.StudentProject1NF
CREATE TABLE dbo.StudentProject1NF
(
StudentId INT,
ProjectId INT,
GuideId INT,
HrsWorked INT,
Fname VARCHAR(50), 
Lname VARCHAR(50), 
Saddr1 VARCHAR(100), 
Saddr2 VARCHAR(100), 
Pname VARCHAR(100),
DeptId INT, 
Dname VARCHAR(50),
)
--data transfer
INSERT INTO dbo.StudentProject1NF 
(StudentId,ProjectId,GuideId,HrsWorked,Lname,Fname,Saddr1,Saddr2,Pname,DeptId,Dname)
SELECT 
StudentId,
ProjectId,
GuideId,
HrsWorked,
PARSENAME(REPLACE (Sname,' ','.'),1) AS Lname,
PARSENAME(REPLACE (Sname,' ','.'),2) AS Fname,
PARSENAME(REPLACE (Saddr,',','.'),1) AS Saddr1,
PARSENAME(REPLACE (Saddr,',','.'),2) AS Saddr2,
Pname,
DeptId,
Dname
FROM dbo.StudentProject
SELECT StudentId,ProjectId,GuideId,HrsWorked,Lname,Fname,Saddr1,Saddr2,Pname,DeptId,Dname
FROM dbo.StudentProject1NF 
DROP TABLE dbo.StudentProject

Second Normal Form (2NF)

A relation schema or table is in second normal form (2NF) if every non-prime attribute is not partially dependent on any key of the table.

Let us start with StudentProject1NF table. For this table, (StudentId, ProjectId) is the candidate key of the table. Now, we will check on the functional dependencies existing on the table.

  • FD1: StudentId, ProjectId --> HrsWorked
  • FD2: StudentId --> Lname, Fname, Saddr1, Saddr2
  • FD3: ProjectId --> Pname
  • FD4: StudentId, ProjectId --> DeptId
  • FD5: DeptId --> Dname
  • FD6: GuideId --> ProjectId

FD2 and FD3 violate the 2NF condition. In FD2, Lname, Fname, Saddr1, and Saddr2 are dependent on StudentId, i.e. these columns are partially dependent on the candidate key (StudentId, ProjectId) of the table. Also, in FD3, Pname is dependent on ProjectId, i.e. Pname is partially dependent on the candidate key.

To achieve 2NF, I will create three new tables: StudentProject2NF, Student2NF, and Project2NF. I will transfer data from StudentProject1NF to these three new tables appropriately and later will drop the StudentProject1NF table.

--2NF
--StudentProject1NF table is divided into three tables -- StudentProject2NF, Student2NF, Project2NF
--table 1
IF OBJECT_ID('StudentProject2NF','U') IS NOT NULL
DROP TABLE dbo.StudentProject2NF

CREATE TABLE dbo.StudentProject2NF
(
StudentId INT,
ProjectId INT,
GuideId INT,
HrsWorked INT,
DeptId INT, 
Dname VARCHAR(50),
)
--table 2
IF OBJECT_ID('Student2NF','U') IS NOT NULL
DROP TABLE dbo.Student2NF

CREATE TABLE dbo.Student2NF
(
StudentId INT,
Sname VARCHAR(50),
Fname VARCHAR(50), 
Lname VARCHAR(50), 
Saddr1 VARCHAR(100), 
Saddr2 VARCHAR(100), 
)
--table 3
IF OBJECT_ID('Project2NF','U') IS NOT NULL
DROP TABLE dbo.Project2NF
CREATE TABLE dbo.Project2NF
(
ProjectId INT,
Pname VARCHAR(100),
)
--data transfer
INSERT INTO dbo.StudentProject2NF (StudentId,ProjectId,GuideId,HrsWorked,DeptId,Dname)
SELECT StudentId,ProjectId,GuideId,HrsWorked,DeptId,Dname
FROM dbo.StudentProject1NF
INSERT INTO dbo.Student2NF (StudentId,Fname,Lname,Saddr1,Saddr2)
SELECT DISTINCT StudentId,Fname,Lname,Saddr1,Saddr2
FROM dbo.StudentProject1NF
INSERT INTO dbo.Project2NF (ProjectId, Pname)
SELECT DISTINCT ProjectId, Pname
FROM dbo.StudentProject1NF
SELECT StudentId,ProjectId,GuideId,HrsWorked,DeptId,Dname
FROM dbo.StudentProject2NF
SELECT StudentId,Fname,Lname,Saddr1,Saddr2
FROM dbo.Student2NF 
SELECT ProjectId, Pname
FROM dbo.Project2NF 
DROP TABLE dbo.StudentProject1NF

Let us check the functional dependencies in the three new tables.

StudentProject2NF

(StudentId, ProjectId) is the candidate key of the table. The functional dependencies are:

  • FD1: StudentId, ProjectId --> HrsWorked
  • FD2: StudentId, ProjectId --> DeptId
  • FD3: DeptId --> Dname
  • FD4: GuideId --> ProjectId

All these functional dependencies follow 2NF. So, StudentProject2NF table is in 2NF.

Student2NF

StudentId is the candidate key of the table. There is one functional dependency:

  • FD1: StudentId --> Lname, Fname, Saddr1, Saddr2

FD1 follows 2NF. So, the table Student2NF is in 2NF.

Project2NF

ProjectId is the candidate key of the table. Functional dependency:

  • FD1: ProjectId --> Pname

FD1 follows 2NF. So, the table Project2NF is in 2NF.

All the new three tables are in 2NF.

Third Normal Form (3NF)

A relation schema or table is in Third normal form (3NF), when a nontrivial functional dependency X --> A holds in the table, then either X is a super key of the table or A is a prime attribute of the table.

Let us consider the StudentProject2NF table first. (StudentId, ProjectId) is the candidate key. The functional dependencies are:

  • FD1: StudentId, ProjectId --> HrsWorked
  • FD2: StudentId, ProjectId --> DeptId
  • FD3: DeptId --> Dname
  • FD4: GuideId --> ProjectId

FD3 violates 3NF condition. Because, neither DeptId is the super key of the table nor Dname is any prime attribute.

I will create two new tables -- StudentProject3NF, Department3NF and will transfer data from StudentProject2NF to these new tables and will drop StudentProject2NF  later. Let us now consider Student2NF and Project2NF.

Student2NF

StudentId is the candidate key of the table. Functional dependency:

  • FD1: StudentId --> Lname, Fname, Saddr1, Saddr2

FD1 follows 3NF. So, the table Student2NF is in 3NF.

Project2NF

ProjectId is the candidate key of the table. Functional dependency:

  • FD1: ProjectId --> Pname

FD1 follows 3NF. So, the table Project2NF is in 3NF.

No change is required for Student2NF and Project2NF.

--3NF
-- StudentProject2NF table is further divided in two tables -- StudentProject3NF, Department3NF
IF OBJECT_ID('StudentProject3NF','U') IS NOT NULL
DROP TABLE dbo.StudentProject3NF

CREATE TABLE dbo.StudentProject3NF
(
StudentId INT,
ProjectId INT,
GuideId INT,
HrsWorked INT,
DeptId INT
)

IF OBJECT_ID('Department3NF','U') IS NOT NULL
DROP TABLE dbo.Department3NF

CREATE TABLE dbo.Department3NF
(
DeptId INT, 
Dname VARCHAR(50),
)
--data transfer
INSERT INTO dbo.StudentProject3NF (StudentId,ProjectId,GuideId,HrsWorked,DeptId)
SELECT StudentId,ProjectId,GuideId,HrsWorked,DeptId
FROM dbo.StudentProject2NF 
INSERT INTO dbo.Department3NF (DeptId,Dname)
SELECT DISTINCT DeptId,Dname
FROM dbo.StudentProject2NF 
SELECT StudentId,ProjectId,GuideId,HrsWorked,DeptId
FROM dbo.StudentProject3NF 
SELECT DeptId,Dname
FROM dbo.Department3NF 
DROP TABLE dbo.StudentProject2NF

Let us check the functional dependencies in the two new tables.

StudentProject3NF

(StudentId, ProjectId) is the candidate key. The functional dependencies are:

  • FD1: StudentId, ProjectId --> HrsWorked
  • FD2: StudentId, ProjectId --> DeptId
  • FD3: GuideId --> ProjectId

All the functional dependencies follow 3NF. So, StudentProject3NF table is in 3NF.

Department3NF

DeptId is the candidate key. The functional dependency is:

  • FD1: DeptId --> DName

FD1 follows 3NF. So, Department3NF table is in 3NF.

At this stage, we have four tables: StudentProject3NF , Department3NF , Student2NF and Project2NF. All tables are in 3NF.

Boyce Codd Normal Form (BCNF)

A relation schema or table is in Boyce Codd normal form (BCNF), when a nontrivial functional dependency X --> A holds in the table, then X is a super key of the table.

Let us consider the table StudentProject3NF first. (StudentId, ProjectId) is the candidate key. The functional dependencies are:

  • FD1: StudentId, ProjectId --> HrsWorked
  • FD2: StudentId, ProjectId --> DeptId
  • FD3: GuideId --> ProjectId

FD1 and FD2 satisfy BCNF. But, FD3 violates BCNF as GuideId is not a super key. To maintain BCNF, the table StudentProject3NF needs to be decomposed.

I will create two new tables -- StudentProjectBCNF, ProjectGuideBCNF. I will transfer data from StudentProject3NF  to the new tables and later will drop StudentProject3NF.

Let us consider the other three existing tables to check if they are in BCNF or not.

Department3NF

DeptId is the candidate key. The functional dependency is:

  • FD1: DeptId --> DName

FD1 follows BCNF. So, Department3NF table is in BCNF.

Student2NF

StudentId is the candidate key of the table. Functional dependency:

  • FD1: StudentId --> Lname, Fname, Saddr1, Saddr2

FD1 follows BCNF. So, the table Student2NF is in BCNF.

Project2NF

ProjectId is the candidate key of the table. Functional dependency:

  • FD1: ProjectId --> Pname

FD1 follows BCNF. So, the table Project2NF is in BCNF.

No change is required for Department3NF , Student2NF and Project2NF.

--BCNF
--StudentProject3NF table is further divided in two tables --  StudentProjectBCNF, ProjectGuideBCNF
IF OBJECT_ID('StudentProjectBCNF','U') IS NOT NULL
DROP TABLE dbo.StudentProjectBCNF

CREATE TABLE dbo.StudentProjectBCNF
(
StudentId INT,
ProjectId INT,
HrsWorked INT,
DeptId INT
)
IF OBJECT_ID('ProjectGuideBCNF','U') IS NOT NULL
DROP TABLE dbo.ProjectGuideBCNF

CREATE TABLE dbo.ProjectGuideBCNF
(
ProjectId INT,
GuideId INT
)
--data transfer
INSERT INTO dbo.StudentProjectBCNF (StudentId,ProjectId,HrsWorked,DeptId)
SELECT StudentId,ProjectId,HrsWorked,DeptId
FROM dbo.StudentProject3NF 
INSERT INTO dbo.ProjectGuideBCNF (ProjectId,GuideId)
SELECT DISTINCT ProjectId,GuideId
FROM dbo.StudentProject3NF 
SELECT StudentId,ProjectId,HrsWorked,DeptId
FROM dbo.StudentProjectBCNF 
SELECT ProjectId,GuideId
FROM dbo.ProjectGuideBCNF 
DROP TABLE dbo.StudentProject3NF

Let us check the functional dependencies in the two new tables.

StudentProjectBCNF

(StudentId, ProjectId) is the candidate key. The functional dependencies are:

  • FD1: StudentId, ProjectId --> HrsWorked
  • FD2: StudentId, ProjectId --> DeptId

All the functional dependencies follow BCNF. So, StudentProjectBCNF table is in BCNF.

ProjectGuideBCNF

GuideId is the candidate key. The functional dependency is:

  • FD1: GuideId --> ProjectId

FD1 follows BCNF. So, ProjectGuideBCNF table is in BCNF.

At this stage, we have five tables: StudentProjectBCNF , ProjectGuideBCNF, Department3NF , Student2NF and Project2NF. All tables are in BCNF.

Conclusion

In this article, we started with a single table and created five tables to fulfill the criteria for the normal forms up to BCNF. The step-by-step process is shown for checking if a table is in a particular normal form. If not, the table data is decomposed in multiple tables to fulfill the requirement of that normal form. For ease of understanding, I added suffix 1NF, 2NF, 3NF, BCNF with the table names.

Users may use any acceptable naming conventions for their tables. Also, the column names can be set as per the choice of the individual user. When we are creating the new tables, it is important to transfer the data from the existing tables correctly. There should not be any data loss or redundant data. At the end of data transfer from existing table to the new decomposed tables, the old table was dropped to clean the database.

Rate

3 (2)

Share

Share

Rate

3 (2)