Who Likes NULL?

  • Lynn Pettis - Tuesday, July 10, 2018 9:31 AM

    So, what is the difference between:
    StartDate >= '2018-01-01' AND EndDate IS NULL
    and
    StartDate >= '2018-01-01 AND EndDate < '9999-01-01'

    Codd discussed two different types of unknown values: Unknown but applicable and Unknown and inapplicable.  I use "magic values" for the first type of unknown values and NULLS for the second type of unknown values.

    This is particularly relevant when working with overlapping intervals where the begin and/or end dates of one or both intervals may or may not be known.  When using "magic values" it's simple to tell if two ranges overlap.  (Of course this only works if you use appropriate "magic values", using a "magic value" in the past is not appropriate for an unknown future date.)


    a.StartDate < b.EndDate
    AND b.StartDate < a.EndDate

    This is much simpler than the corresponding code using NULLs.


    ( a.StartDate < b.EndDate OR a.StartDate IS NULL or b.EndDate IS NULL)
    AND (b.StartDate < a.EndDate OR b.StartDate IS NULL or a.EndDate IS NULL)

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Tuesday, July 10, 2018 10:27 AM

    Codd discussed two different types of unknown values: Unknown but applicable and Unknown and inapplicable.  I use "magic values" for the first type of unknown values and NULLS for the second type of unknown values.

    This is particularly relevant when working with overlapping intervals where the begin and/or end dates of one or both intervals may or may not be known.  When using "magic values" it's simple to tell if two ranges overlap.  (Of course this only works if you use appropriate "magic values", using a "magic value" in the past is not appropriate for an unknown future date.)


    a.StartDate < b.EndDate
    AND b.StartDate < a.EndDate

    This is much simpler than the corresponding code using NULLs.


    ( a.StartDate < b.EndDate OR a.StartDate IS NULL or b.EndDate IS NULL)
    AND (b.StartDate < a.EndDate OR b.StartDate IS NULL or a.EndDate IS NULL)

    Drew

    Different situations but I see your point.

  • drew.allen - Tuesday, July 10, 2018 10:27 AM

    Codd discussed two different types of unknown values: Unknown but applicable and Unknown and inapplicable.  I use "magic values" for the first type of unknown values and NULLS for the second type of unknown values.

    This is particularly relevant when working with overlapping intervals where the begin and/or end dates of one or both intervals may or may not be known.  When using "magic values" it's simple to tell if two ranges overlap.  (Of course this only works if you use appropriate "magic values", using a "magic value" in the past is not appropriate for an unknown future date.)


    a.StartDate < b.EndDate
    AND b.StartDate < a.EndDate

    This is much simpler than the corresponding code using NULLs.


    ( a.StartDate < b.EndDate OR a.StartDate IS NULL or b.EndDate IS NULL)
    AND (b.StartDate < a.EndDate OR b.StartDate IS NULL or a.EndDate IS NULL)

    Drew

    Exactly.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Lynn Pettis - Tuesday, July 10, 2018 12:38 PM

    Different situations but I see your point.

    Magic numbers can be great as long as the choice of magic values is done correctly and nothing changes to bring your magic value into being a "regular" value, or a change of platform invalidates said magical value.  When and if you ever find yourself in one of those cases, then NULL and the extra 20 characters being typed WRT NULL don't seem so  evil anymore.

    Verbose of not - having the NULL there is still a LOT more readable than having to have a whole table to track all of the variable, made up magical values you end up with.

    And yes - I've had to unwind a number of those stupid choices when the magic value wasn't far enough out of circulation.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (4) - Tuesday, July 10, 2018 1:15 PM

    Lynn Pettis - Tuesday, July 10, 2018 12:38 PM

    Different situations but I see your point.

    Magic numbers can be great as long as the choice of magic values is done correctly and nothing changes to bring your magic value into being a "regular" value, or a change of platform invalidates said magical value.  When and if you ever find yourself in one of those cases, then NULL and the extra 20 characters being typed WRT NULL don't seem so  evil anymore.

    Verbose of not - having the NULL there is still a LOT more readable than having to have a whole table to track all of the variable, made up magical values you end up with.

    And yes - I've had to unwind a number of those stupid choices when the magic value wasn't far enough out of circulation.

    Having been involved in Y2K compliance, I make sure that my "magic values" will be valid for long after I'm gone.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Tuesday, July 10, 2018 2:50 PM

    Having been involved in Y2K compliance, I make sure that my "magic values" will be valid for long after I'm gone.

    Drew

    Ditto that.  My biggest and only problem for the Y2K thing was filling out all the damned paperwork that the State Attorney Generals/PUC sent us.  They all required it to be done on their form and they were all different.  Most of them also asked what the most time consuming task in association with Y2K had been for me... My answer was filling out stupid forms created by a bunch of people that didn't know what they were talking about especially since all my stuff was ready for Y2K back in 1980.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Tuesday, July 10, 2018 3:31 PM

    drew.allen - Tuesday, July 10, 2018 2:50 PM

    Having been involved in Y2K compliance, I make sure that my "magic values" will be valid for long after I'm gone.

    Drew

    Ditto that.  My biggest and only problem for the Y2K thing was filling out all the damned paperwork that the State Attorney Generals/PUC sent us.  They all required it to be done on their form and they were all different.  Most of them also asked what the most time consuming task in association with Y2K had been for me... My answer was filling out stupid forms created by a bunch of people that didn't know what they were talking about especially since all my stuff was ready for Y2K back in 1980.

    I thought I was safe too, and then we decided to buy into long-tail insurance lines, and there went THAT magic number  😀  We have a policy that started in the 40's...

    Of course - YMMV!

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • drew.allen - Tuesday, July 10, 2018 10:27 AM

    Codd discussed two different types of unknown values: Unknown but applicable and Unknown and inapplicable.  I use "magic values" for the first type of unknown values and NULLS for the second type of unknown values.

    This is particularly relevant when working with overlapping intervals where the begin and/or end dates of one or both intervals may or may not be known.  When using "magic values" it's simple to tell if two ranges overlap.  (Of course this only works if you use appropriate "magic values", using a "magic value" in the past is not appropriate for an unknown future date.)


    a.StartDate < b.EndDate
    AND b.StartDate < a.EndDate

    This is much simpler than the corresponding code using NULLs.


    ( a.StartDate < b.EndDate OR a.StartDate IS NULL or b.EndDate IS NULL)
    AND (b.StartDate < a.EndDate OR b.StartDate IS NULL or a.EndDate IS NULL)

    Drew

    I believe Codd came up with a few more variations, as time went on.  And Chis Date argued repeatedly that NULL was not a good solution for any of them.  The shortcoming lies in relational theory: a data structure that allows NULL cannot represent a relation, therefore, treating it like one is ill-fated. Since NULL is not a value and does not behave like a value, you technically cannot even achieve 1NF if you allow NULL in a column.

    Consider that, by definition, all rows held in a table must have the same number of values to represent relation. A NULLable column presents the situation where some rows may have more values than others.  Oops...

    So, in the real world, we come to the understanding that SQL server is not a relational database and its SQL dialect is not a relational language.  Whew!  With that behind us, we can debate best practices with respect to NULL in SQL Server.  That is not to say that other practices won't work; it just means each has its advantages/disadvantages and the preference depends HUGELY on the use case (which has for the most part been absent from this discussion).

    What's most interesting to me, is that we engage in a lively debate about pretending NULL is a special value or using a magic number in lieu of NULL.  What escapes the conversation though is a schema design that doesn't introduce NULL in the first place.

  • Dennis Q Miller - Tuesday, July 10, 2018 10:10 PM

    What's most interesting to me, is that we engage in a lively debate about pretending NULL is a special value or using a magic number in lieu of NULL.  What escapes the conversation though is a schema design that doesn't introduce NULL in the first place.

     I mentioned back this back on the first page, that NULLs are a necessary evil and that they are a reflection of the real world. I try to avoid them whenever I can, but one cannot avoid them. They can be present whenever one uses outer joins, for example.

     To eliminate NULLs as much as possible, one has to think differently [1]: increasing the normal form, for example, or making a table longer rather than wider. This may make queries more complex on the grounds that more joins are involved. Mostly, I think, it is down to the way that one has always built tables. It is all-too-easy to translate the requirements of the customer directly into tables, without thinking how it can be done more efficiently. If you start with the principle that nullable fields are the exception rather than the rule, then it becomes easier to avoid them.

    As an example of this, I am trying to prevent our future tables have a sparse field named «DeletionDate», which indicates that the entry has been logically deleted. I would rather have a complete new deleted schema into which rows to be deleted would be copied, along with all relevant referenced entries (to their relevant tables). It is more work than simply have a DeletionDate-field but it does make queries and indexes easier (and the table a bit shorter). Data that has been deleted should never have been in the DB in the first place. It is kept because it was entered in, but it is kept elsewhere. Rather than have this sparse nullable field, I'd have the deleted-schema, with SPs to cater for FKs and so on. A colleague of mine is arguing something similar but using Change Data Capture or Temporal Tables instead.

     We need to accept that NULLs are a part of the job and understand all of their implications (LEFT JOINs and WHERE clauses being an example, ISNULL()/COALESCE() and the use of indexes as well as how NULL reacts with operators being but a few examples)

    [1] adherents to the Cult of Jobs are permitted to think different. I'm more a fan of Woz myself, although the fruit of the philosophical labour produced many a nice Macintosh computer.

  • roger.plowman - Tuesday, July 10, 2018 9:15 AM

    Because Null only gives you one value, a value that doesn't tell you why it's null. In our company we have 3 values that replace null. The one closest in meaning to null is probably TBD (to be determined). It means the value is unknown at present, but is applicable and hasn't been verified as impossible to find out.

    N/A (not applicable) means not only is the data missing, it's missing deliberately and will never be set because it doesn't apply to this record. For example, we have a Site table which contains an ATM field. However Sites apply to both ATMs (for replenishment/servicing) as well as places we perform courier work for (such as credit unions) that do not have an ATM. Thus the ATM field for the site is N/A. Null would be ambiguous in this context, is it null because it's never been filled in (somebody forgot) or because there is no ATM?

    Finally, UNK (verified unknown) indicates that the field is empty because the data has been lost and is unrecoverable. One good example of this would be the purchase date for a piece of equipment so old we either lost or got rid of the purchase order.

    Null in these cases doesn't work. It doesn't tell you why the field is empty, only that it is. For us, at least, that's not sufficient.

    So we use magic numbers in domains that have naturally unused spaces. For example our date and date/time fields use dates of 01/01/1900 and before since nothing we deal with existed before 1900--including employees! :hehe:

    For us the use of null and its magic number replacements are so critical our development wiki (everyone has a developer's wiki, right? :)) lists it as one of the first links on the development guidelines overview, along with the proper use of default values.

    Sorry, NULL doesn't give you "one value"; it is the absence of a value.  If it expressed a value it would certainly be equal to itself.

    And domains don't have "unused spaces".  Datatypes are insufficient implementations of  domains because:
    1). they commonly allow values that are not in the domain (unused spaces)
    2). they don't have ways to represent values like "unknown" and "not applicable" that are in the domain
    and that's just to

  • Dennis Q Miller - Wednesday, July 11, 2018 1:01 AM

    Sorry, NULL doesn't give you "one value"; it is the absence of a value.  If it expressed a value it would certainly be equal to itself.

    And domains don't have "unused spaces".  Datatypes are insufficient implementations of  domains because:
    1). they commonly allow values that are not in the domain (unused spaces)
    2). they don't have ways to represent values like "unknown" and "not applicable" that are in the domain
    and that's just to

    Null being a value or not is semantics. Null is a countable state, therefore it is a value, even though it represents no value. In a nullable Boolean for instance, there's true, false, and null. Thus three possible states. Calling them values may not be pedantically correct, but for all intents and purposes null is a "third value", just a very recalcitrant one.

    Domains do indeed have unused spaces. Consider the date domain as represented by the DateTime data type. The domain is 1/1/1753 to 12/31/9999. Now apply that data type to the domain of "hired persons" (employees that have ever worked for the company, regardless of current employment.

    Granted, this varies by company but let's assume the following facts. 1) The company's founding was 3/15/1992. 2) The company did not hire anyone 92 years of age at its founding, and in fact never hired anyone born on or before 1/1/1900--and now literally can not since such a person would be at least 118 years old. 🙂

    Thus for this domain the "unused" portion of the domain is certainly any date before 1/1/1900. We may not know the exact beginning of the domain "hired person" but we can guarantee any date before 1/1/1900 is not part of that domain--and this will be true forever, not just for a few thousand years. 🙂

    Now, it is true that certain companies, for certain domains, will have dates that are < 1/1/1900 so this magic value may not be universally part of the unused datetime domain. However, unless the company is dealing with genealogy or history the date 1/1/1753 almost certainly falls in the unused space.

    Magic values are frowned upon in IT because of their arbitrary nature, but when well documented (read, beaten into employees with a clue stick upon hire) then they become an extremely useful tool--one that, unlike null, follows the rules of the data type domain.

  • roger.plowman - Wednesday, July 11, 2018 6:54 AM

    Dennis Q Miller - Wednesday, July 11, 2018 1:01 AM

    Sorry, NULL doesn't give you "one value"; it is the absence of a value.  If it expressed a value it would certainly be equal to itself.

    And domains don't have "unused spaces".  Datatypes are insufficient implementations of  domains because:
    1). they commonly allow values that are not in the domain (unused spaces)
    2). they don't have ways to represent values like "unknown" and "not applicable" that are in the domain
    and that's just to

    Null being a value or not is semantics. Null is a countable state, therefore it is a value, even though it represents no value. In a nullable Boolean for instance, there's true, false, and null. Thus three possible states. Calling them values may not be pedantically correct, but for all intents and purposes null is a "third value", just a very recalcitrant one.

    Domains do indeed have unused spaces. Consider the date domain as represented by the DateTime data type. The domain is 1/1/1753 to 12/31/9999. Now apply that data type to the domain of "hired persons" (employees that have ever worked for the company, regardless of current employment.

    Granted, this varies by company but let's assume the following facts. 1) The company's founding was 3/15/1992. 2) The company did not hire anyone 92 years of age at its founding, and in fact never hired anyone born on or before 1/1/1900--and now literally can not since such a person would be at least 118 years old. 🙂

    Thus for this domain the "unused" portion of the domain is certainly any date before 1/1/1900. We may not know the exact beginning of the domain "hired person" but we can guarantee any date before 1/1/1900 is not part of that domain--and this will be true forever, not just for a few thousand years. 🙂

    Now, it is true that certain companies, for certain domains, will have dates that are < 1/1/1900 so this magic value may not be universally part of the unused datetime domain. However, unless the company is dealing with genealogy or history the date 1/1/1753 almost certainly falls in the unused space.

    Magic values are frowned upon in IT because of their arbitrary nature, but when well documented (read, beaten into employees with a clue stick upon hire) then they become an extremely useful tool--one that, unlike null, follows the rules of the data type domain.

    It depends but the big difference between NULL and magic values is that NULLs do not behave the same way as other values in the database engine whereas magic values do.  NULL also has consistent behavior across different applications at the database level, magic values do not since they'll entirely depend on what value you pick.

  • Lynn Pettis - Tuesday, July 10, 2018 9:31 AM

    So, what is the difference between:
    StartDate >= '2018-01-01' AND EndDate IS NULL
    and
    StartDate >= '2018-01-01 AND EndDate < '9999-01-01'

    You are illustrating how NULL is prone to coding error.  The first statement can never be true, but the second can!

  • ZZartin - Wednesday, July 11, 2018 7:59 AM

    It depends but the big difference between NULL and magic values is that NULLs do not behave the same way as other values in the database engine whereas magic values do.  NULL also has consistent behavior across different applications at the database level, magic values do not since they'll entirely depend on what value you pick.

    Even such consistency "depends".  If you look at the connections for SSMS, ANSI NULLs is turned on.  For external connections, it is not.  MS was (and may still be) making noises that it may no longer be an option and that ANSI NULLs will be enabled for ALL connections.  That set the front-end world on fire because many are used to being able to relate NULL = NULL.  It would break a lot of code if MS ever forced ANSI NULLs on the front-end world.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Dennis Q Miller - Wednesday, July 11, 2018 8:23 AM

    You are illustrating how NULL is prone to coding error.  The first statement can never be true, but the second can!

    Not sure how you think the first statement can never be true if NULL is used to signify that no EndDate has been registered yet.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 91 through 105 (of 143 total)

You must be logged in to reply to this topic. Login to reply