Sic Semper NULL

  • Comments posted here are about the content posted at http://www.sqlservercentral.com/columnists/mcoles/2921.asp

  • Mmmm, for me the issue usually occurs when reviewing/maintaining someone else's code and that code has dodgy null usage. When used correctly, nulls are a good thing. Overall, I found this article to be a really good treatment of nulls and when to use/not use them.

    However, there are some situations where I choose to use a value instead of null to simplify a query and to get queries to perform. An example of this is tables that use have start and end dates. It is common for an end date to be unknown right up to the point where the record actually has an end date. So, when you want to get a list of records that were "active" based on the start and end dates, you usually end up with code something like

    WHERE @DateToCheck BETWEEN StartDate And Isnull (EndDate, @DateToCheck)

    (There are many variations on the above - substitute your favourite)

    instead of

    WHERE @DateToCheck BETWEEN StartDate And EndDate

    The second condition tends to give a much better query plan (but only works if EndDate actually has a value. So, I tend to use nulls when the column really is unknown and there is no valid substitute so that I can make use of indexes.

    I am ready for responses that negate my thoughts.

    Cheers

    Stehen

  • Great article on NULL usage.

    I find NULL's in a database an invaluable tool, for many reasons... And easily avoided with not null constraints and a default value, where inappropriate.

    In addition to making things more complex, NULL's also make a number of operations simpler, for example, the avg() function by default nicely excludes null values, causing an average of only known values to be returned.

    Handling NULL's according to the application of the data in question is generally a simple matter with functions such as coallesce and isNull; and without the ability to store a NULL, that is valuable data lost. With a NULL stored you have the information to handle the data precisely how you want to.

  • Thanks Mike, a very interesting article.

    My first thought on reading this was 'Yes well it's all very nice in theory, but not much use in the real world'. Then I felt ashamed, because it struck me that just because my current information environment doesn't require 6NF, doesn't mean I will never find myself in a situation where this high level of normalisation is appropriate, and I should file this information away somewhere.

    I have found that depending on how your indexes are defined, it is always worth comparing performance of queries structured like this:

    WHERE @DateToCheck >= StartDate

    AND (@DateToCheck <= EndDate OR EndDate IS NULL)

    to the equivalent queries using COALESCE

    WHERE @DateToCheck BETWEEN StartDate AND COALESCE(EndDate, @DateToCheck)

    I actually eliminated EndDates from my employee contract records, because in my environment, a contract will always run until superceded by a contract with a later StartDate or until the employee termination date is reached.

    David

    If it ain't broke, don't fix it...

  • Interesting Article,

    At the end of the day we are providing workable and maintainable solutions to real world problems and have to strike a balance between idealism and practicality. If a solution warrants the use of NULL then use it appropriately, if factoring out into another table makes more sense do it that way.

    Mike, I think you nailed it in your conclusion:

    "learn how to use them properly, and minimize their use where practical."

  • Excellent article. I liked it so much I just ordered your not-yet published book. You have demonstrated a rare but important trait in technical writing...the ability to express technical concepts in clear and interesting fashion while remaing concise yet thorough in your presentation.

    You have a secure grasp of the concepts and how they may be applied in real world production situations. Furthermore you have demonstrated your willingness to research other's articles and books.

    I hope and expect that your book will exhibit the same applied skills.

    Bravo... keep up the good work.

  • This is interesting theory, but what about situations such as work orders which are not complete but have a DateCompleted field.  That item is simply not know, and using an arbitrary date such as 1/1/1900 is simply not acceptable.  And pulling up incomplete work orders with an elegant WHERE DateCompleted IS NULL is easy and it makes sense.

    I appreciate that there are pitfalls with using NULLs which require care, and that sometimes large numbers of NULLS can reflect a poor design, but I remain unconvinced that eliminating NULLS is the hallmark of good design and programming. 

  • Excellent article!

    Regarding the commission example, to consider something specific, I would prefer to explicitly know that a commission does not apply.  Therefore, I might add a "IsCommissionEligible" BIT column.  This is, most importantly, very clear to anyone who sees it later.  If this bit is set, the value in the commission column is then irrelevant.

    If, however, the commission needed to be indexed, especially to quickly find all non-commissioned workers, one might be forced to use a bogus value, such as -1, to indicate no commission.  Sometimes real-world performance outweighs theoretical concerns.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • I agree, and that's why I recommend learning how to use NULL properly and using it in moderation. Saying "Get rid of all the NULLs!" is easy; actually doing it is another matter.

    RM expert Chris Date recommends eliminating NULLs completely, and one of the main tools in his arsenal is the Default. Sounds good in theory - eliminate all NULLs by just plugging in an arbitrarily chosen value. But it has its drawbacks. Consider the following table:

    DateTemperature
    July 2895
    July 2997
    July 30NULL
    July 31NULL
    August 1NULL
    August 2102

    Just about any number you could come up with for a default temperature in this table would be wrong, and produce incorrect results. If we don't know the temperature readings on July 30 - August 1, we just don't know them; and no Default value will ever fix that.

    Imagine what the average temperature for this time period would be if you plugged in 0 as a Default. How often does the temperature drop from 97 degrees to 0 degrees, and then jump back up to 102 degrees in July and August? Probably not too often. Of course you could plug in another default, say 100, but it will be strange (to say the least) to watch your temperatures jump from -10 degrees up to 100, and back down to -6 in January. Or you could use another throwback to the "magic number" era (-9999999, etc.), but then you still need to put special handling for that value in your queries; and the magic number for this table might not be the same one used in another table. You have to define your own special "behaviors" for these "magic numbers."

    Same thing goes for dates in the database. One of the main reasons for the existence of NULL is to eliminate "magic numbers" and arbitrary placeholders with a single "special mark" that has a well-defined behavior. Whether or not you like, or agree with, the "definition" and "behavior" that ANSI has come up with for NULL is another discussion...

    I would recommend staying away from "magic numbers" and arbitrary special markers. You might have noticed the subtle "bug" I introduced in the third query of "Step Two". In that query there are two arbitrarily-defined special markers for employees with "no contact number type", but they appear in the same huge query:

    • 'NO CONTACT NUMBER DESCRIPTION'
    • 'NO CONTACT NUMBER TYPE'

    Now the client application has to deal with both of these special markers in a single column, even though they mean exactly the same thing.

    Using a "magic date", like "1900-01-01" can cause problems as well.  While "1900-01-01" might be considered by some to be reasonable as a substitute for NULL for a date in the future (like prisoner release date), it might not work as well for a date in the past (prisoner birth date); especially if you need to store historical data.  You'll suddenly need to use two or more "magic dates" as substitutes for NULL in different columns of your table(s).  It can turn into a real mess, really fast.

    These are just a couple of the problems you can encounter when trying to use arbitrarily-defined special markers, "magic dates", and "magic numbers" in your database.

  • Thanks Scott and all,

    I tried to keep the example simple for the article, but I might take your suggestion a step further and assign the "Is Commission Eligible" indicator to a given position instead of to an employee. And then assign each employee to a position.

    In most cases whether or not an employee gets a commission depends on their position (e.g., "Salesperson" might be eligible, while "Janitor" is probably not), and commission eligibility will not necessarily follow an employee from position to position (e.g., Joe transfers from the Sales team to Janitorial Services). You might even want to keep a separate salary/position history for employees, so you can track their job and pay history - but all of this would require additional tables, constraints, etc.  And I really wanted to keep my example as simple as possible for purposes of the article.

  • Could we just have a function that will add all non-null values? Something like VSUM where V stays for value?

    Regards,Yelena Varsha

  • So... what's wrong with null?  Is it any more correct to have null or a zero length string for a middle name?  I notice the author didn't provide a middle name.  So does he not have one or did he just not provide it.  I could provide a zero length string (blank) or null or you could have a table of "names" and link the person only to those "names" which they have and indicate the position (first, middle, last) but that would be silly.  Normalization is a tool as is the judicious use of null.  When carried to any extreme you become dogmatic.  So, what's correct for an unknown value null or blank?  Maybe it depends on context.

    I run a simple business where I only ship items when all the items for the order are available (no back orders).  So ShippedOn is a column in the OrderHeader table.  When I receive an order, it hasn't shipped yet so, what date should be used?  I could create a table of Shipped Orders but that involves a join.  Also, taking that route I'd have to use an outer join to find pending orders.  I could add IsShipped (bit) column to indicate if it's shipped or not but, that's tramp data that could be answered if I just leave the ShippedOn date column.  I could make up a default date (1/1/1900) but that's just as valid as allowing nulls.

    Allowing nulls is a tool, just like not allowing nulls.  It depends on context, usage, business rules.  You know -- reality.

     

    --Paul Hunter

  • I dig the article Mike.  Way to explore an often-discussed topic more thoroughly than most of those discussions.

    I guess it was pretty early on when I decided to embrace the NULL.  It's been easy enough to handle them in the application, and in a facade when I'm not responsible for the application. 

    In one case, we got to the end of a project when a last minute requirement popped in requiring certain entities to own other entities.  We added a nullable FK to the entity being owned, referencing the owner.  Since the application was interacting with the database via SP's, we were able get the database ready in an hour or two.  That allowed us to spend the little time we had on the handful of minor changes to the application.

    I suppose that anecdote may support the use of SP's more than that of NULL's, but I still dig NULL's

  • An important meaning of nulls in the context of nullable columns is “value at present unknown” - the "A mark" null. The "I mark" null is arguably not especially well supported in general, as SQL effectively co-mingles both "types" of nulls. As the author alludes to with regard to tables with nullable columns, it is not unusual to encounter circumstances that may call for taking care with queries to correctly understand and summarize data i.e.(relative to similar tables without nullable columns), and that is perhaps the most obvious cost of NULLs.

    A point of some critics of NULLs that appears to be lost here however, is that a DBMS implementation that did not implement NULLs i.e.(supporting a "new" SQL say) would need to handle issues typically handled with NULLs in another (and perhaps better) matter. Another point that may be worth raising is that if NULLs are to be supported for use as either A marks or as I marks, there is at least one product (FirstSql) that arguably supports NULLs "better" than most.

  • Thank you for addressing this issue Michael.

    Fundamentally, NULL values indicate a fundamental flaw in tuple (noun) definition.  SQL Server's treatment of NULL equality can be unexpected for those who work with other databases.

    It’s impossible to avoid NULLs in many cases, specifically because of SQL Server's product position.  It’s often the platform of choice for ETL.  Therefore, many SQL ODS databases can suffer from "NULL" sickness because of a source system's poor table design.

    Michael does a good job of explaining the "real life so deal with it" issues.

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply