Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Stairway to Database Design Level 9: Normalization Expand / Collapse
Author
Message
Posted Tuesday, August 09, 2011 3:31 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Tuesday, January 15, 2013 11:11 AM
Points: 1,945, Visits: 2,782
Comments posted to this topic are about the item Stairway to Database Design Level 9: Normalization

Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #1156633
Posted Tuesday, September 20, 2011 8:32 PM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Monday, April 14, 2014 10:01 AM
Points: 2,980, Visits: 762
Excellent!!!

Thanks Joe,
Thomas


Thomas LeBlanc, MCITP DBA 2005, 2008 & MCDBA 2000
http://thesmilingdba.blogspot.com/
Post #1178409
Posted Wednesday, September 21, 2011 8:34 AM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Thursday, February 06, 2014 12:59 PM
Points: 801, Visits: 1,962
Great work Joe. My question is "why is this so late in the design process?" It is so hard (expensive and frustrating) to have to refactor something that is in production.

When we decided that we were going to take everything we had learned and build a robust product suite it became obvious that the database design had to be the foundation. During the design meetings we did normalization from the start. Now when we have to add features and capabilities it is not such a big deal. We face an additional set of challenges. Mobile workers. Each carries a subseted replica of the database. Looking at the deployed fleet statistics I was plesently surprised at how small of a footprint it actually takes. Normalization is at the heart of that. It might not have happened had we not done it right up front.


ATB

Charles Kincaid

Post #1178682
Posted Friday, November 11, 2011 2:27 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Wednesday, March 12, 2014 10:02 AM
Points: 1,151, Visits: 1,039
The ignorant will declare an IDENTITY as the key and all other columns are NULL-able. They do not know that since the table's IDENTITY property is not a column nor an attribute of any possible data model, it cannot ever be used as a key. It is the count of the physical insertion attempts. It is a way for a noob to mimic the record numbers he knows from magnetic tapes and other sequential files.


All very well in theory, and in the example Joe quotes, there's a way to change the design to avoid the need to have a table where the non-null requirements of the key fields get in the way of being able to insert data.

But of course, it isn't true that an IDENTITY "cannot ever be used as a key". Joe's personal view seems to be that an IDENTITY should never be used in this way, but putting the statement in that way, in an otherwise educational article, may confuse "the ignorant".

For me, a key is anything that lets me uniquely identify a record in a table. If that can be done with a unique social security number or equipment serial number or email address or some other combination of fields, that's fine. But I've often come across situations where there is no combination of data that is *always* unique and *always* available to the people creating the records, at the time they are creating them, that can be used as a key. In that case, either the users need to make up a unique key, or the database does. The database is better at it, and I don't really much care whether it is the count of the physical insertion attempts or a GUID or anything else.
Post #1203980
Posted Friday, November 11, 2011 5:06 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, March 26, 2014 8:16 AM
Points: 161, Visits: 142
I'm sitting here thinking about this article, and trying to decide how much I liked it.
I think that if I hadn't had any previous knowledge on normalization and database design, I would have found this article a little too academic and difficult to follow.
I would have found the layout a little confusing. From my point of view, the article would have been even better if it had contained some bullet lists with the rules of the Normal forms, as an example.

On the other hand, this is an excellent article someone with a little technical background on the subject. The black holes are filled with in depth knowledge, from someone who knows what he's doing.
Post #1204042
Posted Friday, November 11, 2011 8:14 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: 2 days ago @ 11:13 AM
Points: 257, Visits: 901
I was momentarily lost on the abbreviations FD and MVD until I rewound to their usage. I suggest a simple edit to enclose the abbreviations in parenthesis immediately following the full words.

"aether" i think should be "either" - though 'marriage...does not belong to aether' either.

"Following this rule...is the basis for the other neuritics" <-- I think this was supposed to be "heuristics" but it would be amusing if you meant some form of nerve pain

I have a mortgage, but this Shelton did not look familiar. Google couldn't make sense of it for me either.

"This is why EAV does not work" - we all know EAV is entity-attribute-value, but I a link to another document explaining it might be helpful. Same with CSV list; most readers know comma-separated values - but in this context might think its something else. (If SQL ServerCentral has a rule against such links, I question the value of such a restriction but C'est La Vie)

The Create Table examples are sufficient for those who are already trained to "think in abstractions." I would like to have seen a grid-view of actual data next to those DDL statements. I think many of us also natively think in pictures.


If a discussion on normal form(s) is going to start with relational algebra [imo: it should] then I agree with the intent to dissuade thinking of primary key as the auto-increment number column thoughtlessly added as the first field of any table. After the concept of a primary key is formally established, it might be worth noting that many schema architects will use the identity column as a crutch because it's so easy to implement. Of course that becomes an article in itself.




Post #1204176
Posted Friday, November 11, 2011 6:48 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Tuesday, January 15, 2013 11:11 AM
Points: 1,945, Visits: 2,782
All very well in theory, and in the example Joe quotes, there's a way to change the design to avoid the need to have a table where the non-null requirements of the key fields [sic: columns are not fields] get in the way of being able to insert data.

>> But of course, it isn't true that an IDENTITY "cannot ever be used as a key". Joe's personal view seems to be that an IDENTITY should never be used in this way, but putting the statement in that way, in an otherwise educational article, may confuse "the ignorant". <<

Yes, you can use anything you wish as a PRIMARY KEY in SQL All it needs is uniqueness within just the local table. Just like a record number in a magnetic tape file. Now, repeat the phrase “in SQL”; a key (notice the lower case word without PRIMARY in front of it) is a logical concept required by a correct data model. Heck, a table in SQL does not have to have any key at all!

This is not me; this is pure Dr. Codd and 40 years of the Relational Model. The key is a subset of the attributes.

>> For me, a key is anything that lets me uniquely identify a record [sic: rows are not records] in a table. If that can be done with a unique social security number or equipment serial number or email address or some other combination of fields [sic], that's fine. <<

Fields? Records? Those are terms from file systems, not RDBMS. Fields are defined by the program reading each record. They are ordered, so “READ a, b, c” is not the same as “READ c, b, a” in procedural file system languages. I did an article on the huge differences.

My contribution to keys was adding the requirements that a key have Validation and Verification. Validation means you can look at the data element and tell that it is valid. For example, I can use a regular expression to see if a VIN has the “right shape” in and of itself.. Verification means I havea process to determine if that VIN actually belongs to a real automobile.

>> But I've often come across situations where there is no combination of data that is *always* unique and *always* available to the people creating the records [sic], at the time they are creating them, that can be used as a key. <<

Give me an example. I have never had that problem and I have been at this for a few decades. I have found that 90%+ of the time there is an industry standard identifier,

>> In that case, either the users need to make up a unique key, or the database does. The database is better at it, and I don't really much care whether it is the count of the physical insertion attempts or a GUID or anything else. <<

Unlike you magnetic tape programmers (yes, that is what you are doing in SQL; think about it) I really, really care about keys. I want my validation and verification so that my data does not depend on which piece of hardware I was using at the time. I have had to create encoding schemes to use as keys. For example, an art gallery had an inventory of unique items so we could not use a UPC code. We came up with a code based on the material used and the artist and a sequence.

Your GUIDs, autonumber , ROW_ID and other things created on the local hardware are called locators or physical locators. They are not keys.



Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #1204536
Posted Monday, November 14, 2011 2:53 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Wednesday, March 12, 2014 10:02 AM
Points: 1,151, Visits: 1,039
Rows, columns, fields, records: Microsoft themselves use the terms pretty interchangeably - fire up the View designer in MS Access, or try to delete something from a table, and see what terms are used ( "fields" and "records"); do the same in SQL Server and they are called columns and rows.


>> But I've often come across situations where there is no combination of data that is *always* unique and *always* available to the people creating the records [sic], at the time they are creating them, that can be used as a key. <<

Give me an example. I have never had that problem and I have been at this for a few decades. I have found that 90%+ of the time there is an industry standard identifier,



Here are three (there are plenty more) :

1: Your own example of identifying vehicles. Outside the car repair industry, VIN is not a useful identifier as most people don't know their VIN even if they know where to find it; licence plate may work in some circumstances but there can be complications caused by transferrable number plates.

2: What unique identifier do you use for people - let's say, members of a club ?

3: When you say you created an ID for artworks based on material, artist and a sequence number, were you not just creating a slightly fancier ( and harder to use) form of "count of physical insertion attempts" ?
Post #1204977
Posted Monday, November 14, 2011 3:23 AM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 8:37 AM
Points: 8,287, Visits: 8,738
I quite like most of this article, in fact most of it is excellent, but there are some parts that aren't up to the standard of the rest of it.

The section headed 5NF doesn't actually discuss 5NF at all, and what it does discuss has only the most tenuous connection with 5NF.

Suggesting that the move from 3NF to BCNF is done to enable constraints to be correctly represented is strange, since BCNF is a normal form in which it is not possible to represent all simple constraints. Any normalisation beyond EKNF (that's the form generated by Bernstein's algorithm, it is intermediate between 3NF and BCNF) risks losing important FD-related constraints. The article should probably mention that, so that readers are warned to watch out for the problem and avoid going to BCNF for the cases where it is known to lead to failure to enforce required constraints, instead of going blindly to BCNF for every base relation in their database (although going blindly to BCNF would of course be better that not even getting to 1NF, which seems to be a pretty common approach).

Also, I think you are too harsh on surrogate keys. I agree that using them because one can not be bothered to find a natural key is insanity. I dislike the idea of using them just to save space, too - ideally the dbms should do any necessary space saving behind the scenes. But making it easier to write queries and and simplifying the ddl for expressing referential transparency constraints (by making it possible to write a single attribute key instead of a five attribute key, for example) are desirable ends, and it is perfectly sensible to use surrogate keys for those purposes provided of course that uniqueness is enforced both for the natural key and for the surrogate key in the table defining the surrogate key and that the remaining attributes of the entity or relationship of which the natural key is a key are held in that defining table, not somewhere else as if the surrogate were an entity in its own right.

Mother Celko's heuristics are great - everyone should have a copy firmly imprinted in their thinking about database design. Perhaps the penultimate one could be taken a bit too literally - it implies that nothing is normalised unless it is in Ron Fagin's DKNF, but it has been demostrated that some schema and constraint combinations have no DKNF representation. Also, I think you are not quite harsh enough on NULLs - allowing them to mean different things in different columns, when all they should ever be allowed to mean is "the value that would be here if we had it isn't here".


Tom
Post #1204987
Posted Thursday, November 17, 2011 10:05 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Tuesday, January 15, 2013 11:11 AM
Points: 1,945, Visits: 2,782
Charles Kincaid (9/21/2011)
Great work Joe. My question is "why is this so late in the design process?" It is so hard (expensive and frustrating) to have to refactor something that is in production.

When we decided that we were going to take everything we had learned and build a robust product suite it became obvious that the database design had to be the foundation. During the design meetings we did normalization from the start. Now when we have to add features and capabilities it is not such a big deal. We face an additional set of challenges. Mobile workers. Each carries a subseted replica of the database. Looking at the deployed fleet statistics I was plesently surprised at how small of a footprint it actually takes. Normalization is at the heart of that. It might not have happened had we not done it right up front.


I agree that normalization has to be done from the start. IO jsut did it last because, in the words of the great philosopher Barbie, "Math is hard!"

The original purpose of databases -- not just Relational ones -- was to reduce redundancy. Centralizing data was one technique. Normalization is another one. Then we have non-Normal form redundancies to worry about .


Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #1207694
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse