Discussion on Normalization and Database Design Concepts

Question

Discussion on Normalization and Database Design Concepts

Sucharita Das

SSC-Addicted

Points: 456
More actions
February 23, 2021 at 11:41 pm

#3839433

Comments posted to this topic are about the item Discussion on Normalization and Database Design Concepts

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

simonmedula SSC Journeyman Points: 96 More actions · Answer 1

Well explained, good article.

Only thing I dont like in here and general naming columns: Sname instead of StudentName, adding full names is easier to read and understand.

This reply was modified 3 years, 3 months ago by simonmedula.

Brymen SSC-Addicted Points: 475 More actions · Answer 2

If table Teaching2 is defined as

CREATE TABLE dbo.Teaching2
(
StudentId INT,
InstructorId INT
)

Then how can the (StudentId,CourseId) combination be the candidate key?

JoeyD Grasshopper Points: 15 More actions · Answer 3

Agreed. And Teaching1 and Teaching2. Give the tables names that describe their purpose.

ScottPletcher SSC Guru Points: 100952 More actions · Answer 4

I strongly object to starting with table structures. Data normalization is a logical process, tables don't exist yet. It's all about the data itself. One should be referencing entities and attributes at that stage, only later, at the physical modeling stage, do they becoming tables and columns (although not necessarily on a 1 to 1 basis). This is not semantics, it's a vital distinction.

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

hmbacon SSC Journeyman Points: 79 More actions · Answer 5

Great article, Sucharita. The explanation of these 4 normal forms is clear and concise. As Scott said, normal forms are a conceptual construct but I believe this analysis is useful in the case where a database has been inherited and there is no conceptual model. I don't think I have seen a database that was not at least in 3rd normal form if good conceptual modeling was performed. Identifying redundant data and update anomalies are good ways to determine if the existing database is sub-optimal. If there is much data redundancy and the team is spending a lot of time with data fixes, the database needs to be redesigned to be more optimal.

But I think normal form analysis on the relational model is more difficult as opposed to going back to the conceptual model. I say this because moving to higher normal forms is not always about splitting tables. If it was always about splitting tables it might work but often the attributes of an entity have been split between multiple tables and attributes from multiple entities have been combined in other tables. To start the conceptual model, you need to write a narrative of what the system needs to do. For instance, "The system needs to track employees in the university, keeping up with their personal information as well as the department they are in. It also needs to track basic information about the departments." Conceptually, I have two entities, employees and departments and these would probably be implemented as two tables so the problem is averted. Now you can look in the existing database and find every column that contains employee or department information and consider pulling all of that into the attributes for each of these entities. Be sure to document where the data comes from because you will needed when you migrate the old data into the new database.

I think there is a common blurring of conceptual and relational modeling. Conceptual modeling is identifying entities and relationships. Relational modeling is taking the entities, attributes and relationships and implementing the tables, columns and constraints for a specific RDBMS. In the relational model surrogate keys should be added and made the primary key constraint and the super key from the conceptual model needs to be implemented as unique key constraint.

In addition to the blurring of applying normal forms (conceptual concept) to the relational model, you make the following statement about super keys.

It specifies a uniqueness constraint that no two records in a table can be identical. One or more columns in the table can be defined as super key if the columns are able to uniquely identify the records.

By your definition, I can add columns with a sequential number and every record is different, even if all the other columns are identical. Studentid is a surrogate key which is not a conceptual object. Studentid can uniquely identify a record but it cannot uniquely identify a student. Surrogate keys are needed but they are not a true super key. The super key in the student table would be the columns that uniquely identify a student. There is nothing to stop me from entering the exact same student multiple times if each is given a new Studentid.

In closing, I would like to present an analogy. When building a house, the relational model is the set of plans to describe all the details of the house. The conceptual model would be the conversations with the home owner to determine what the want in their house including an understanding of what they mean by general concepts such as a living room versus a family room, relative sizes of the rooms and maximum size of the house.