Null Defaults

  • Comments posted to this topic are about the item Null Defaults

  • While I'm quite an advocate of using NULL values instead of magic values, due to the slightly interesting way that NULL values are handled in comparisons and sorts there are times when magic values are more appropriate. These cases usually revolve around the situations where non-technical users use and require fairly direct access to the data, usually for reporting purposes and usually date based columns. While it's easy enough to create views into the data and through these effectively implement magic values for these columns this process adds another administrative burden to a system.

    For the majority of non-date/time based there really shouldn't be a need for magic values. Sometimes pseudo-magic numbers are useful though, for example where a status lookup ID has an implied order where for example 0=don't know (default, non null) and 999=hell yeah, it makes queries rather simpler if they can be ordered by the ID direct rather than a sort column in a joined lookup table. Again, this situation could be handled in a view to make life easier for end users but as above this adds an administrative burden (and also makes query optimisation a little more involved as well).

    So I'd say that it's best to avoid both magic values and NULL values where possible. Magic values should really be handled using lookup references and default values (this tends to be hard with date/time columns) and NULLs avoided unless you genuinely need to record that no value has been provided, just beware of the consequences.

  • I would always use null to indicate unknown values, but in this case I don't quite see when you would legitimately not know if null is a valid value or not when putting your data model in production. During early development I can get behind going null by default until you start working on the DAL, then you need to start defining the contract imo.

  • Glad to see some good, sensible practice being advocated in the first two comments. I feared seeing a load of magic value recipes.
    My comment is that this situation is exactly suited to NULL values. NULL means that something is missing, unavailable, not supplied and is a case that must be dealt with properly. We all hate writing code with loads of ISNULLs or IS (NOT) NULL etc. and even more tiresome is dealing with it inside the application, but missing does not equal '', '1900-10-01', zero 0 or -1 it's NULL!
    If a value added to a table simply can't be NULL it must be defaulted to a meaningful value or that situation handled in the DAL or business logic.
    The fear of NULL is the result of lazy or worse ignorant programmers. It's nothing to worry about people, just write proper code!
    I totally agree with the points above regarding sorting and end user access, no rules are inviolable, but magic values are poisonous to me. Where users have access to data, I try to abstract away the complexities, including NULL values and outer joins where possible.
    Let's show some love for NULL in all its painful reality 🙂

  • I would advocate going down the NULL route. It's a new column after all and when that column is referenced by the calling application, that application should know what it wants to do with it or should handle it properly by setting it aside if it doesn't need it.

  • Using a 'magic value' to represent an unknown value generally does not seem to be a good idea.  Most of us have been in the situation where the user has states categorically that a certain case will never exist only for it to materialise a couple of years down the track as a legitimate case.  If for some reason some particular end user interface cannot handle null values(purists will say null is a state rather than a value), then provide a view which CASTs the null to some value acceptable to that user.
    Early drafts of the ISO (ANSI for Americans) SQL standard incorporated the concept of 'multi-valued' null to allow the differentiation of missing values, known unknowns not relevant etc... but the concept was dropped as none of the implementors were interested in providing it.  That proposed facility originated for the earlier ANSI work on multiple types of null

  • I like NULL as the deault choice. All other values have a tendency to create confusion at some point in the future in my experience. Most people have a good understanding of what NULL values mean.

  • Terje Hermanseter - Friday, March 3, 2017 2:51 AM

    I like NULL as the deault choice. All other values have a tendency to create confusion at some point in the future in my experience. Most people have a good understanding of what NULL values mean.

    I think NULL is the logical choice too, but I would also say that "knowing what NULL means" is pretty context-dependent! Take an employment termination date in a personnel record. While the employee is still employed, the termination date should (I suppose) be null, but that doesn't mean it's unknown, it means that because it hasn't yet happened yet it doesn't exist at all! Perhaps this is too much of a straw-splitting distinction.

    Sorting NULL fields can be a pain though. Often you want them to sort before any other date, as SQL Server does by default, but other times (as in the case of the termination date above) you want it to sort after. Hence the unwelcome appearance of magic values (one system I use uses a date in the year 2500 for this sort of thing).

    What might be cool would be to take a leaf from PostgreSQL's book and introduce two new values: positive infinity (i.e not NULL but it sorts after everything else) and negative infinity (not NULL but sorts before everything else). I've never actually used PostgreSQL so can't speak from experience, but it seems like a good idea. Some purists would no doubt disagree and I respect that. I have no interest in starting a religious war.

    I think that the root of the problem is that when you have several concepts (unknown, doesn't exst, not defined, ...) which are easily confused, context dependent and generally slippery to work with, trying to use just one thing (i.e. NULL) for all of them is necessarily going to involve a few compromises. And some extra work.

    MarkD

  • I have always used NULL values, although I have seen some (heated) arguments over the years against it!
    Unfortunately nobody has been able to convince me yet 😉
    There are more than enough functions available in SQL to handle nulls - NULLIF, ISNULL, COALESCE, IS NULL, IS NOT NULL etc.

    I agree with some of the other comments that null values should indicate 'missing' and/or incomplete data.
    For example: Let's assume you want to log the start and end time of an event. The start time might be available when the record is inserted, but not the end time.
    So a quick query for null values in the end time column will show you events that are still waiting to be completed.

    PS. Reporting Services also support null values which can be used to your advantage.

  • Mark Dalley - Friday, March 3, 2017 3:35 AM

    ...Take an employment termination date in a personnel record. While the employee is still employed, the termination date should (I suppose) be null, but that doesn't mean it's unknown, it means that because it hasn't yet happened yet it doesn't exist at all!...

    Which can be viewed as at the current point in time it is unknown. N'est pas?

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • NULL is my go to default for representing "Unknown" or "Missing". I cannot think of a time when doing OLTP where I ever needed to use a magic value to fulfill requirements on unknown.
    However, I have had to use a magic value before on one DW solution to represent a default date for joining (19000101) to a date dimension. There was some debate over report writer usability and cube design etc. Needs prioritization / deadlines can sometimes dictate an approach. 

    You cannot really avoid the work of understanding and handling unknowns in either case it is just if you are handling it in storage, retrieval, or display.

    Having to remember a magic value needs to display on a report as NULL  (NULLIF(19000101)) is harder to remember than recognizing a column is NULLable and you need to handle it in joins.

  • This question is about datatypes, one of my favorite subjects. I think that picking arbitrary values in a datatype to have a particular meaning is always a bad idea.

    My rules are to 1) Always pick the right datatype for the data, and 2) The zero value of the datatype is the default value. For example, SSN should be char(9) and the default value should be a null string.

    So null = default value, but I dislike nulls for the same reason: null is a magic value that has no meaning in terms of the datatype. Null is not in the set of integers, zero is.

    If you have integer values that actually have a meaning, like 1=A, 2=B, that is not an integer but a Type, and these should be in the Type table. Even there, the zero value should be the default value, such as 0 = Don't know/Didn't answer.

    A corollary to my rule is that table keys, such as account number, should never have arbitrary data built in, such as all accounts in Arizona begin with an A. I have never seen a system like this that did not eventually come to ruin, say when you acquire another system and try to merge databases.

  • Gary Varga - Friday, March 3, 2017 5:03 AM

    Mark Dalley - Friday, March 3, 2017 3:35 AM

    ...Take an employment termination date in a personnel record. While the employee is still employed, the termination date should (I suppose) be null, but that doesn't mean it's unknown, it means that because it hasn't yet happened yet it doesn't exist at all!...

    Which can be viewed as at the current point in time it is unknown. N'est pas?

    I totally agree that it can be viewed that way. Perhaps it was a poor example. My thought was more that the statement "Most people have a good understanding of what NULL values mean" is one I wouldn't venture to make without knowing more about the context.

    MarkD

  • Gary Varga - Friday, March 3, 2017 5:03 AM

    Mark Dalley - Friday, March 3, 2017 3:35 AM

    ...Take an employment termination date in a personnel record. While the employee is still employed, the termination date should (I suppose) be null, but that doesn't mean it's unknown, it means that because it hasn't yet happened yet it doesn't exist at all!...

    Which can be viewed as at the current point in time it is unknown. N'est pas?

    You know that it is a date.

  • null is a magic value

    I like the thinking that null is an indicator that a particular referenced area for storing values has not had a value stored. This even applies if that area represents a column sourced by a left joined table in a query. I like to think it doesn't mean "unknown", it doesn't  mean "missing." It means a value has not been stored or alternatively, a value is not available to be returned. Any meaning attached to nulls further than that is up to the design using that referenced information storage area, but because of that, SQL Server is completely agnostic on any further meaning other than the lack of a value being stored in the referenced storage area.

    I have no problem with people thinking otherwise, but its gotta be a pretty persuasive argument to change MY thinking on this.

Viewing 15 posts - 1 through 15 (of 41 total)

You must be logged in to reply to this topic. Login to reply