Who Likes NULL?

  • Dennis Q Miller - Friday, July 6, 2018 12:27 PM

    Your response is welcome...only constructive dialog is intended.

    1.  I agree, so don't use NULL for them.  Alternatives are simple for text fields or codes that reference a text field in a "lookup" table.  Not so much for numbers, dates, boolean, and other data types.  The real shortcoming here is that date datatypes rarely include "unknown", "not applicable", etc. within their domains.  So, how do we store such information: NULL?

    2. I agree that "Magic Numbers" is a bad practice, but I don't think this is the case here.  The -1 is a code, declared by the designers, and resolved in a parent table where it is fully disclosed.  There is no getting around hard-coding for special cases, but I would not hard-code to the -1, I would hard code to the meaningful value to which it resolves.   x="Not Applicable" is way more self-documenting than x IS NULL (and is more likely to produce the desired result than the tempting x=NULL).  But, my real point here is that x=NULL will work with some databases, but not others--thus, the lack of portability.  In fact, if memory serves, the way that comparison works is a configuration option in some cases.

    3. "With some systems"? IMHO, any mathematical operation against a null value should raise an exception, just like divide by zero does. But that is NOT the case.  Again, the portability issue surfaces. You could take an entire statistics class about handling missing values; NULL does not solve the problem. Just to clarify, I would never use -1 to mean something besides -1 in a column that represents a number; only in a field that holds a code.

    4. We agree on this: NULL should never be used for (foreign) keys. Any database that allows NULL in a unique key should be stricken from the earth.

    5. Again, in no way do I subscribe to "magic numbers".  Please, do not use -1 to mean anything else in a column representing a number.  To make my point, I sometimes use the contrived example of a countdown timer on an explosive device.  What risk could there be to using zero to mean "disabled"?

    Of course, this begs question: what should I use instead?  What to use in a column representing a number when the number is "missing", "unknown", "not applicable", "optional", etc. Perhaps NULL or perhaps a database design that better serves the use case.

    "4. We agree on this: NULL should never be used for (foreign) keys. Any database that allows NULL in a unique key should be stricken from the earth."

    So what would you do?  TITLE table has an identity column TITLE_ID as the PK, and a unique index with CUSTOMER_ID and NAME.  All of these columns are non-Null.  POSITION table has a Nullable TITLE_ID, because much of the legacy data I import has no value here.  If i want an FK in TITLE_ID, I can't add a -1 here, doesn't exsit in the TITLE table.  The alternative would be to add an "Unknown" or "n/a" or "" for each CUSTOMER_ID.  Why add all these meaningless rows to the TITLE table?

  • I think this discussion boils down to: (may want to use earphone at work)

    https://www.youtube.com/watch?v=lF-cdujoNak

    ...

    -- FORTRAN manual for Xerox Computers --

  • Jay Milo - Friday, July 6, 2018 12:47 PM

    "4. We agree on this: NULL should never be used for (foreign) keys. Any database that allows NULL in a unique key should be stricken from the earth."

    So what would you do?  TITLE table has an identity column TITLE_ID as the PK, and a unique index with CUSTOMER_ID and NAME.  All of these columns are non-Null.  POSITION table has a Nullable TITLE_ID, because much of the legacy data I import has no value here.  If i want an FK in TITLE_ID, I can't add a -1 here, doesn't exsit in the TITLE table.  The alternative would be to add an "Unknown" or "n/a" or "" for each CUSTOMER_ID.  Why add all these meaningless rows to the TITLE table?

    I would argue Unknown and Not Applicable are necessary choices for the TITLE since your legacy data allows for titles to be missing. They should have been included all along as non-delete-able choices when the software was installed.

  • roger.plowman - Friday, July 6, 2018 1:02 PM

    I would argue Unknown and Not Applicable are necessary choices for the TITLE since your legacy data allows for titles to be missing. They should have been included all along as non-delete-able choices when the software was installed.

    Basically there is a difference between specifying that a title is 'unknown'  in that it suggests that someone has already come to that conclusion, vs NULL which means that info is completely missing and no attempt has been made so far to determine it.

    ...

    -- FORTRAN manual for Xerox Computers --

  • I'm enjoying this topic but there's a world of difference between an academic discussion about whether NULL is a good or bad idea (cf.Hoare, Codd) and what developers do with databases in the wild.  For example, an Employee record has  DateEnd column to model the date she leaves the company.  What value, other than NULL, makes sense there?  As a matter of interest I've recently migrated a legacy database where the designer of the database (who was also on the development team) used a notional date in the future (1 January 2050) in a related table to flag that a row was the current row.  The hilarious thing is that the dev who came up with this skunked bollix is widely revered.  Well not by me, he's not (of course this fatuous design decision wasn't documented anywhere; I had to figure it out from the logic).
    </rant>

  • jay-h - Friday, July 6, 2018 1:08 PM

    roger.plowman - Friday, July 6, 2018 1:02 PM

    I would argue Unknown and Not Applicable are necessary choices for the TITLE since your legacy data allows for titles to be missing. They should have been included all along as non-delete-able choices when the software was installed.

    Basically there is a difference between specifying that a title is 'unknown'  in that it suggests that someone has already come to that conclusion, vs NULL which means that info is completely missing and no attempt has been made so far to determine it.

    1. this data "is completely missing and no attempt has been made so far to determine it".  As i mentioned before, the only way to udpate the data from the website is to actually add a title (and grade)
    2. this data is not my or my companies data, it is our clients data, 100% maintained by them.  I sometimes help them get it into the database, especially for an initial implementation, but they provide all the data.

  • jay-h - Friday, July 6, 2018 12:59 PM

    I think this discussion boils down to: (may want to use earphone at work)

    https://www.youtube.com/watch?v=lF-cdujoNak

    now i may never get back to work!

  • Jay Milo - Friday, July 6, 2018 1:19 PM

    1. this data "is completely missing and no attempt has been made so far to determine it".  As i mentioned before, the only way to udpate the data from the website is to actually add a title (and grade)
    2. this data is not my or my companies data, it is our clients data, 100% maintained by them.  I sometimes help them get it into the database, especially for an initial implementation, but they provide all the data.

    Option:
    Add 1 record to
       - Customer table: CUSTOMER_ID=0, NAME='' (or 'Unknown" or whatever you like)
       - Title table: TITLE_ID = 0, CUSTOMER_ID=0, NAME='' (or 'Unknown' or whatever you like)Then add a default constraint on your foreign keys
       - CUSTOMER.CUSTOMER_ID= '0'
       - POSITION.TITLE_ID='0' (field changed to NOT NULL)

    Run an update statement to change all NULL values in your foreign keys to 0. This way you can provide something that has more meaning (though fuzzy) than nothing. But it will remove the need for NULL foreign keys.

    When I've created reports, this has simplified the end logic by removing it from that layer & putting it in the db. I could also adjust the NULL replacement value by changing one value in one record of one table.

    Smarter people on this post may have some better ideas 😉

  • RLilj33 - Friday, July 6, 2018 1:41 PM

    Option:
    Add 1 record to
       - Customer table: CUSTOMER_ID=0, NAME='' (or 'Unknown" or whatever you like)
       - Title table: TITLE_ID = 0, CUSTOMER_ID=0, NAME='' (or 'Unknown' or whatever you like)Then add a default constraint on your foreign keys
       - CUSTOMER.CUSTOMER_ID= '0'
       - POSITION.TITLE_ID='0' (field changed to NOT NULL)

    Run an update statement to change all NULL values in your foreign keys to 0. This way you can provide something that has more meaning (though fuzzy) than nothing. But it will remove the need for NULL foreign keys.

    When I've created reports, this has simplified the end logic by removing it from that layer & putting it in the db. I could also adjust the NULL replacement value by changing one value in one record of one table.

    Smarter people on this post may have some better ideas 😉

    Valid option, thanks.

  • ZZartin - Friday, July 6, 2018 12:39 PM

    Null isn't just a weird database concept, it comes up plenty of times in general development where something doesn't have a value.  As it relates to databases NULL is a value that has a very specific meaning and functionality(yes that may very platform by platform) and works predictably if sometimes counter intuitively in ways that no other value would.  And unlike magic values it's functionality is consistent across different fields, a NULL in a varchar works the same way regardless of what the field is or what it's used for.

    So, NULL is a value for something that doesn't have a value?  Kinda muddies the waters.  I have few issues with allowing NULL when a value is optional, precisely to mean that the value is absent.  I do caution about assigning other meanings to NULL.  NULL is more like the state of a field than the value of it.

  • Dennis Q Miller - Saturday, July 7, 2018 9:24 AM

    So, NULL is a value for something that doesn't have a value?  Kinda muddies the waters.  I have few issues with allowing NULL when a value is optional, precisely to mean that the value is absent.  I do caution about assigning other meanings to NULL.  NULL is more like the state of a field than the value of it.

    I agree.  To me, NULL is not NOTHING.  NOTHING is a value.  NULL is a state of a container that could hold a value and that state is UNKNOWN.

    As with all else in SQL Server, though, "It Depends".  What I mean by that is that we're working with computers here.  You can assign a value to the state of being NULL if you want but ONLY if you document the deviation in Extended Properties and in the code that is taking advantage of the deviation.  But, like you (an assumption on my part based what you've said above), I've not yet found a reason to deviate from the basic definition of NULL simply meaning UNKNOWN and I don't recommend assigning any other definition to it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • edwardwill - Friday, July 6, 2018 1:17 PM

    I'm enjoying this topic but there's a world of difference between an academic discussion about whether NULL is a good or bad idea (cf.Hoare, Codd) and what developers do with databases in the wild.  For example, an Employee record has  DateEnd column to model the date she leaves the company.  What value, other than NULL, makes sense there?  As a matter of interest I've recently migrated a legacy database where the designer of the database (who was also on the development team) used a notional date in the future (1 January 2050) in a related table to flag that a row was the current row.  The hilarious thing is that the dev who came up with this skunked bollix is widely revered.  Well not by me, he's not (of course this fatuous design decision wasn't documented anywhere; I had to figure it out from the logic).
    </rant>

    An amusing, if it weren't such a prevalent practice, of "magic numbers".

    As for your example:
    I'd rather see DateEnd as a nullable column of employee than using a magic number like '12/31/9999'.  But, I'd also consider a different design without a DateEnd column in the Employee table.  For example, what about an EmployeeTermination table to also support other attributes like TerminationReason and, for that matter, multiple terminations for the same employee over time.  Just sayin'

    "Databases in the wild" is saying it like it is.

  • Romac - Friday, July 6, 2018 3:41 AM

    edwardwill - Friday, July 6, 2018 2:19 AM

    When designing a table, it is essential to realise that there are real-world circumstances where NULL may be an important and meaningful representation of the data being modelled.

    Thoroughly agree!

    What opinions do people have on, for example, postal address lines? Typically you might have 4 address lines plus city/town, county/state, postal code, country. Most addresses don't use all four lines. If not using NULL, what would you populate the unused address lines with?

    Empty String is elegantly simple, a fairly intuitive representation of "unused", and not fraught by the problems associated with NULL.

    But, asking the same question about a number or date column is worthy of more discussion.

  • I like the use of NULL's, but I try to structure the tables so that when fields are number, there's not a lot of them in any given row of a table. Nor is the table half comprised of NULLs. And I thank God that the frameworks built around creating applications are now much more tolerate of NULLs. Back in the 90's when I was working largely with VB6 the calendar control (or was it the datetime control? I no longer remember) was too stupid to handle any NULLs. If a NULL was in a datetime field, then the control would crash any program using it. So we had to use magic values for dates, in order to get around the stupid control's shortcomings. And like you said Steve, every piece of software and report written against the databases had to take that into account. Well, that's all in the past now, thank God.

    Rod

  • edwardwill - Friday, July 6, 2018 1:17 PM

    I'm enjoying this topic but there's a world of difference between an academic discussion about whether NULL is a good or bad idea (cf.Hoare, Codd) and what developers do with databases in the wild.  For example, an Employee record has  DateEnd column to model the date she leaves the company.  What value, other than NULL, makes sense there?  As a matter of interest I've recently migrated a legacy database where the designer of the database (who was also on the development team) used a notional date in the future (1 January 2050) in a related table to flag that a row was the current row.  The hilarious thing is that the dev who came up with this skunked bollix is widely revered.  Well not by me, he's not (of course this fatuous design decision wasn't documented anywhere; I had to figure it out from the logic).
    </rant>

    20500101 is a value that can be indexed and be used a search argument. It is a much more efficient use of resources when doing a search for all people employed at a particular point int time in comparison to having to use COALESCE() or ISNULL():
    ...
    WHERE validfrom >= @Parameterdate
    AND validto < @ParameterDate
     as against
    ...
    WHERE validfrom >= @Parameterdate
    AND ( validto < @ParameterDate OR validto IS NULL )
     or
    ...
    WHERE validfrom >= @Parameterdate
    AND COALESCE( validto, '20500101' ) < @ParameterDate

    I do agree with you that it is not nice though.

Viewing 15 posts - 46 through 60 (of 143 total)

You must be logged in to reply to this topic. Login to reply