Thanks Mike, a very interesting article.
My first thought on reading this was 'Yes well it's all very nice in theory, but not much use in the real world'. Then I felt ashamed, because it struck me that just because my current information environment doesn't require 6NF, doesn't mean I will never find myself in a situation where this high level of normalisation is appropriate, and I should file this information away somewhere.
I have found that depending on how your indexes are defined, it is always worth comparing performance of queries structured like this:
WHERE @DateToCheck >= StartDate AND (@DateToCheck <= EndDate OR EndDate IS NULL)
to the equivalent queries using COALESCE
WHERE @DateToCheck BETWEEN StartDate AND COALESCE(EndDate, @DateToCheck)
I actually eliminated EndDates from my employee contract records, because in my environment, a contract will always run until superceded by a contract with a later StartDate or until the employee termination date is reached.
Excellent article. I liked it so much I just ordered your not-yet published book. You have demonstrated a rare but important trait in technical writing...the ability to express technical concepts in clear and interesting fashion while remaing concise yet thorough in your presentation.
You have a secure grasp of the concepts and how they may be applied in real world production situations. Furthermore you have demonstrated your willingness to research other's articles and books.
I hope and expect that your book will exhibit the same applied skills.
Bravo... keep up the good work.
This is interesting theory, but what about situations such as work orders which are not complete but have a DateCompleted field. That item is simply not know, and using an arbitrary date such as 1/1/1900 is simply not acceptable. And pulling up incomplete work orders with an elegant WHERE DateCompleted IS NULL is easy and it makes sense.
I appreciate that there are pitfalls with using NULLs which require care, and that sometimes large numbers of NULLS can reflect a poor design, but I remain unconvinced that eliminating NULLS is the hallmark of good design and programming.
Regarding the commission example, to consider something specific, I would prefer to explicitly know that a commission does not apply. Therefore, I might add a "IsCommissionEligible" BIT column. This is, most importantly, very clear to anyone who sees it later. If this bit is set, the value in the commission column is then irrelevant.
If, however, the commission needed to be indexed, especially to quickly find all non-commissioned workers, one might be forced to use a bogus value, such as -1, to indicate no commission. Sometimes real-world performance outweighs theoretical concerns.
I agree, and that's why I recommend learning how to use NULL properly and using it in moderation. Saying "Get rid of all the NULLs!" is easy; actually doing it is another matter.
RM expert Chris Date recommends eliminating NULLs completely, and one of the main tools in his arsenal is the Default. Sounds good in theory - eliminate all NULLs by just plugging in an arbitrarily chosen value. But it has its drawbacks. Consider the following table:
Just about any number you could come up with for a default temperature in this table would be wrong, and produce incorrect results. If we don't know the temperature readings on July 30 - August 1, we just don't know them; and no Default value will ever fix that.
Imagine what the average temperature for this time period would be if you plugged in 0 as a Default. How often does the temperature drop from 97 degrees to 0 degrees, and then jump back up to 102 degrees in July and August? Probably not too often. Of course you could plug in another default, say 100, but it will be strange (to say the least) to watch your temperatures jump from -10 degrees up to 100, and back down to -6 in January. Or you could use another throwback to the "magic number" era (-9999999, etc.), but then you still need to put special handling for that value in your queries; and the magic number for this table might not be the same one used in another table. You have to define your own special "behaviors" for these "magic numbers."
Same thing goes for dates in the database. One of the main reasons for the existence of NULL is to eliminate "magic numbers" and arbitrary placeholders with a single "special mark" that has a well-defined behavior. Whether or not you like, or agree with, the "definition" and "behavior" that ANSI has come up with for NULL is another discussion...
I would recommend staying away from "magic numbers" and arbitrary special markers. You might have noticed the subtle "bug" I introduced in the third query of "Step Two". In that query there are two arbitrarily-defined special markers for employees with "no contact number type", but they appear in the same huge query:
Now the client application has to deal with both of these special markers in a single column, even though they mean exactly the same thing.
Using a "magic date", like "1900-01-01" can cause problems as well. While "1900-01-01" might be considered by some to be reasonable as a substitute for NULL for a date in the future (like prisoner release date), it might not work as well for a date in the past (prisoner birth date); especially if you need to store historical data. You'll suddenly need to use two or more "magic dates" as substitutes for NULL in different columns of your table(s). It can turn into a real mess, really fast.
These are just a couple of the problems you can encounter when trying to use arbitrarily-defined special markers, "magic dates", and "magic numbers" in your database.
Thanks Scott and all,
I tried to keep the example simple for the article, but I might take your suggestion a step further and assign the "Is Commission Eligible" indicator to a given position instead of to an employee. And then assign each employee to a position.
In most cases whether or not an employee gets a commission depends on their position (e.g., "Salesperson" might be eligible, while "Janitor" is probably not), and commission eligibility will not necessarily follow an employee from position to position (e.g., Joe transfers from the Sales team to Janitorial Services). You might even want to keep a separate salary/position history for employees, so you can track their job and pay history - but all of this would require additional tables, constraints, etc. And I really wanted to keep my example as simple as possible for purposes of the article.