RE: Design – SQLServerCentral

SSC Guru

Points: 64780

September 23, 2010 at 11:40 pm

I agree with UMG developer about the multiple identifier. I already had this option checked before deciding to check BOL first. I disagree with him about the missing identifier - the OrderRefNo does have both the NEWID() default and the PRIMARY KEY constraint to make this a good identifier.

But what bothers me more than losing a point, is that we are asked to answer according to BOL. When it comes to good database design, BOL is about the last reference I'd use! I must admit that the page about normalization that is referenced here is not as bad as I feared, but there are still errors.

The first bullet point in BOL:

"The fundamental rule of database design theory is that each table should have a unique row identifier, a column or set of columns used to distinguish any single record from every other record in the table. Each table should have an ID column (...)"

This starts okay, but then detoriates. The first sentence says that the identifier can be a single column or a set of columns. The very next sentence mentions only the ID column, as does the rest of this bullet point. It does not explicitly say "identity or newid()", but it's awful close.

As a result, many tables in SQL Server, including the one in the question, lack a "real" key. Neither IDENTITY, nor NEWID() are real keys. These are generators for surrogate keys, but a surroogate should always be backed by a real key. In the case of the table in the question, what will protect the system from duplicate entries because a clerk accidentally hits the "Add row" button twice? (Or even falls asleep with his finger on the button)

Third bullet:

"A table should avoid nullable columns."

No. A table should avoid unnecessary nullable columns. When values may be missing in a column, making the column nullable is actually the best way to handle this. This advice leads to columns that are made not nullable and than use some magic value to represent a missing value. The argument about complexity of handling nulls is nonsense; handling the magic values requires even more complex special handling. The default behaviour of nulls usually matches how people want missing information to be treated.

The idea to put nullable columns in a seperate table only makes sense if all these columns are linked to a single subtype, so that they all are not nullable in the new table. And even then, the use is limited - the downside of splitting these columns to their seperate table is the introduction of outer joins in many queries. And the "special handling" is not avoided, since these outer joins throw the nulls right back in your face.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/