Who Likes NULL?

  • Jay Milo - Friday, July 6, 2018 8:21 AM

    I love Nulls.  I use them extensively in both database design and stored procs/functions.

    One of the biggest reasons I use Nulls is for optional columns that need foreign keys.  When I inherited the database it was littered with "-1" values or some other meaningless number to indicate the user didn't pick a value.  And foreign keys seem to have been a "foreign" concept (pun intended) to my predecessor.  Worse than anything, they still had the column as Nullable!

    For example, we have teachers in our database, and they must be defined with a Location plus an optional Title and Grade (this is the POSITION table).  I would rather use Nulls for Title and Grade instead of added a "fake" value of blank ("") or "n/a" in both of those tables (TITLE and GRADE).

    Something new I've started recently is having a unique two column index where only one value is populated.  They both have foreign keys and therefore both are Nullable.  A specific example is when some piece of data can either be Customer-specific or State-specific but never both.  I add a computed column [STATE_OR_CID] AS (isnull([STATE_ID],[CUSTOMER_ID])) that is then added to the unique index.  As long as the two IDs have no value in common this works beautifully 🙂

    Well, our standard is to use an INT identifier as the primary key for all tables, starting with 1 and incremented by 1. We call it a RID. Since RIDS by definition must be positive the RIDs 0, -1, and -2 are free to be used for the records "To Be Determined", "Not Applicable" and "Verified Unknown". Thus a foreign key never needs to be (and is forbidden to be) nullable.

    From one point of view the three non-positive RIDs are actually part of the domain of the table. 🙂 Because there are cases where all three flag values are absolutely appropriate, just as a null would be. A simple > 1 WHERE clause excludes them when you want.

    As for TITLE, that's either going to be a fixed domain of titles (which should be an FK to a title table) or it's a string, in which case "N/A" is perfectly acceptable and even expected.

    GRADE likewise is a limited domain with a fixed range meaning even an INT value for grade (0 to 100 say) gives you the ability to use N/A (-1) and verified unknown (-2), although to be determined may or may not fit depending on context. That fits with the values used for FKs.

    Likewise, any additional "null replacements" fit easily into such a scheme. Most domains have such an empty space, although some (such as Boolean) don't.

    I suppose your view of nulls really depends on how tolerant you are of A) having missing data and B) caring why the data is missing. My company really can't tolerate either very well, thus our view nulls are EVIL. 🙂

  • RLilj33 - Friday, July 6, 2018 8:19 AM

    If you do not like NULL, please provide a value that represents all of the space in the universe that is empty.

    Whether you like them or not, they simply are (or, rather, are not). And that is the true art of db work: attempting to reflect the world as it is - like it or not.

    For example, if you have an optional field on a web form that does not require a user to enter a value, why assign one (i.e. Address2)?

    In the case of an address, we are relatively safe to assume that they have an address, and we know that it will be a string value.  In that situation the value of Address2 is empty, not unknown.  I would therefore assign a default value of empty string ('').

    On the other hand, if the column was 'DateDocumentIssued' or something similar then a null would let me know that no value exists which in turn tells me that no document has been issued.  The alternative there is either to assume that the SQL Server minimum date indicates an unset value, which *could*, however, unlikely that is, potentially be a valid value in the context of the data or to have one column for 'DocumentIssued' and another for 'DocumentIssueDate'.  In this situation it seems to me that a null correctly represents the state of the data in the most efficient and cost effective way.

  • roger.plowman - Friday, July 6, 2018 8:43 AM

    RLilj33 - Friday, July 6, 2018 8:19 AM

    If you do not like NULL, please provide a value that represents all of the space in the universe that is empty.

    Whether you like them or not, they simply are (or, rather, are not). And that is the true art of db work: attempting to reflect the world as it is - like it or not.

    For example, if you have an optional field on a web form that does not require a user to enter a value, why assign one (i.e. Address2)?

    Not to be pedantic, but the value that represents empty space in the universe is called "zero point energy". 🙂 It is, in fact, about as far from nothing (null) as it's possible to get... 🙂

    Thank you roger.plowman. I will consider that as my lesson learned for the day!

    I believe my point is still valid - there are things in life (and db work) that we ascribe values to that can add confusion instead of clarification. One example is the use of a made-up value to provide a representation of NULL. In the example of using -1, consider that QlikView/QlikSense views -1 as a TRUE value. If you were to create a Qlik application that reads these values, you may end up with many false-positive values.

  • roger.plowman - Friday, July 6, 2018 9:01 AM

    Jay Milo - Friday, July 6, 2018 8:21 AM

    I love Nulls.  I use them extensively in both database design and stored procs/functions.

    One of the biggest reasons I use Nulls is for optional columns that need foreign keys.  When I inherited the database it was littered with "-1" values or some other meaningless number to indicate the user didn't pick a value.  And foreign keys seem to have been a "foreign" concept (pun intended) to my predecessor.  Worse than anything, they still had the column as Nullable!

    For example, we have teachers in our database, and they must be defined with a Location plus an optional Title and Grade (this is the POSITION table).  I would rather use Nulls for Title and Grade instead of added a "fake" value of blank ("") or "n/a" in both of those tables (TITLE and GRADE).

    Something new I've started recently is having a unique two column index where only one value is populated.  They both have foreign keys and therefore both are Nullable.  A specific example is when some piece of data can either be Customer-specific or State-specific but never both.  I add a computed column [STATE_OR_CID] AS (isnull([STATE_ID],[CUSTOMER_ID])) that is then added to the unique index.  As long as the two IDs have no value in common this works beautifully 🙂

    Well, our standard is to use an INT identifier as the primary key for all tables, starting with 1 and incremented by 1. We call it a RID. Since RIDS by definition must be positive the RIDs 0, -1, and -2 are free to be used for the records "To Be Determined", "Not Applicable" and "Verified Unknown". Thus a foreign key never needs to be (and is forbidden to be) nullable.

    From one point of view the three non-positive RIDs are actually part of the domain of the table. 🙂 Because there are cases where all three flag values are absolutely appropriate, just as a null would be. A simple > 1 WHERE clause excludes them when you want.

    As for TITLE, that's either going to be a fixed domain of titles (which should be an FK to a title table) or it's a string, in which case "N/A" is perfectly acceptable and even expected.

    GRADE likewise is a limited domain with a fixed range meaning even an INT value for grade (0 to 100 say) gives you the ability to use N/A (-1) and verified unknown (-2), although to be determined may or may not fit depending on context. That fits with the values used for FKs.

    Likewise, any additional "null replacements" fit easily into such a scheme. Most domains have such an empty space, although some (such as Boolean) don't.

    I suppose your view of nulls really depends on how tolerant you are of A) having missing data and B) caring why the data is missing. My company really can't tolerate either very well, thus our view nulls are EVIL. 🙂

    School districts are our clients, and they have missing and unknown data all over the place.  Regarding TITLE and GRADE, they are tables that link back to the POSITION table via an INT identifier.  I also use an INT identifier as the primary key for all tables.  Out frontend system does not allow a Null TITLE_ID or GRADE_ID in the POSITION table.  If an update is made the user must choose a value for each of these.  These tables are maintained by each client, and if they choose to have an "unkonwn" or "n/a" title or grade, they can create one.  But that is not the norm. When I import data, that's where the missing data comes in as Null.  I can't add a 0, -1 or anything other fake ID if I want (and i do want) the FK that links POSITION back to TITLE and GRADE.

  • The theoretical problem with nulls is well-documented, effectively degrading any table that allows them to something less than a relation.  That said, even Codd, the pioneer of relational databases, recognized a need for them. Despite the protests of many purists, nulls not only survive in virtually all SQL databases, they confuse, contaminate, and corrupt for the arguable benefits of simplicity.

    My position is to avoid them unless they have a strong use case and then manage them consistently and carefully.

    In a practice, nulls manifest several problems.

    1. There are different kinds of “missing values”, but only one kind of nulls.  For example, how do you distinguish a termination_date that is missing because nobody remembers, from one where it does not apply, from one where it hasn’t occurred yet?
    2. Nulls are not handled the same across different SQL dialects and sometimes even across different functions within the same dialect. Thus working SQL is less portable and the potential for coding errors is increased.
    3. Nulls are statistically and mathematically ambiguous.  Should a null value count in an average? If so, what value should it assume?  If not, how does one reconcile the number of rows in the calculation with the number of rows in the population.
    4. Nullable foreign keys can cause performance problems, demand the understanding and use of outer joins, and introduce hard-to-diagnose query complications.
    5. Logical operators applied to nulls are illogical.  Case in point null=null, null <> null, null < null, null > null are all false in most implementations.

    At the end of the day, both using and avoiding nulls require work-arounds and coding discipline.  Make an informed choice and do not be deceived into thinking nulls are an easy solution to a logically challenging situation.  We need a toolbox for ways to handle missing information.

  • Romac - Friday, July 6, 2018 3:41 AM

    What opinions do people have on, for example, postal address lines? Typically you might have 4 address lines plus city/town, county/state, postal code, country. Most addresses don't use all four lines. If not using NULL, what would you populate the unused address lines with?

    Don't get me started... worked on an ETL project where we were told the extra address lines could not be NULL, and they offered us various alternatives of what to put there, including the actual word NULL, an empty string, or just a period.  Of course Address Line 1 was actually being used as part of the person name for some reason so Address Line 2 was actually the real Address Line 1, but that's a different story.  It's the kind of stuff that made us all facepalm.

  • Back to the original topic, my preference for an OLTP system is to use NULL when there can be an unknown value.  For an OLAP type system such as a data warehouse, I do create special records in the dimension tables with a description of "Unknown" so that when a user is looking at the data in some reporting tool, it's clear what they are looking at instead of it just being blank.

  • NULL values are fine, they serve the very important role of identifying that we do not currently know the value for a specific field.  That doesn't mean that every field should be NULL able but a lot of them usually can be, and in that regard NULL values work a lot better than magic values because there's no possibility of confusing them with real data.

  • Dennis Q Miller - Friday, July 6, 2018 9:55 AM

    The theoretical problem with nulls is well-documented, effectively degrading any table that allows them to something less than a relation.  That said, even Codd, the pioneer of relational databases, recognized a need for them. Despite the protests of many purists, nulls not only survive in virtually all SQL databases, they confuse, contaminate, and corrupt for the arguable benefits of simplicity.

    My position is to avoid them unless they have a strong use case and then manage them consistently and carefully.

    In a practice, nulls manifest several problems.

    1. There are different kinds of “missing valuesâ€, but only one kind of nulls.  For example, how do you distinguish a termination_date that is missing because nobody remembers, from one where it does not apply, from one where it hasn’t occurred yet?
    2. Nulls are not handled the same across different SQL dialects and sometimes even across different functions within the same dialect. Thus working SQL is less portable and the potential for coding errors is increased.
    3. Nulls are statistically and mathematically ambiguous.  Should a null value count in an average? If so, what value should it assume?  If not, how does one reconcile the number of rows in the calculation with the number of rows in the population.
    4. Nullable foreign keys can cause performance problems, demand the understanding and use of outer joins, and introduce hard-to-diagnose query complications.
    5. Logical operators applied to nulls are illogical.  Case in point null=null, null <> null, null < null, null > null are all false in most implementations.

    At the end of the day, both using and avoiding nulls require work-arounds and coding discipline.  Make an informed choice and do not be deceived into thinking nulls are an easy solution to a logically challenging situation.  We need a toolbox for ways to handle missing information.

    But think about it:

    1) "For example, how do you distinguish a termination_date that is missing because nobody remembers, from one where it does not apply, from one where it hasn’t occurred yet?"
    Those are not missing values. They are specific bits of information,  even if their 'value' is that there is no information relevant. A NULL is an unknown that has not even been categorized.

    2) "Nulls are not handled the same across different SQL dialects and sometimes even across different functions within the same dialect" Magic numbers suffer from the save problems. Their 'values' are only understood by hard coding within a particular application.  Future developers may not even be aware of those implications if not well documented.

    3) "Nulls are statistically and mathematically ambiguous.  Should a null value count in an average?" Same problems with magic numbers, perhaps even more so because systems often will throw an error for a misplaced NULL.

    4) Keys should never be nullable. NULLs are only for data, and only some data fields.

    5) "Logical operators applied to nulls are illogical.  "   Similarly logical operators applied to magic numbers produce skewed or invalid results.

    ...

    -- FORTRAN manual for Xerox Computers --

  • edwardwill - Friday, July 6, 2018 7:02 AM

    brad.pears - Friday, July 6, 2018 6:49 AM

    Romac - Friday, July 6, 2018 3:44 AM

    edwardwill - Friday, July 6, 2018 3:38 AM

    Stefan LG - Friday, July 6, 2018 3:34 AM

    Ah NULLS...just as controversial as spaces vs. tabs!

    I especially use it for unknown datetime columns.
    Let's say the table has a column for CreateDate and ModifiedDate, then ModifiedDate will be NULL until such time the record is actually modified by somebody.

    Yeah, but are they as controversial as pointless aliasing:

    SELECT * FROM Users U WHERE U.UserId = 12324

    LOL! But "pointless" aliasing is very very useful when using intellisense in (eg) SSMS!

    And saves me an awful lot of redundant typing when table names get a little carried away in length!!!

    See my original SQL - I get to see LOTS of this kind of pointless aliasing.  As for "redundant" typing - I'm a touch typist and I know how to use CTRL+C, CTRL+V.  The only time I use aliases is when it's mandated (some JOINS etc., or if I'm referencing the same table twice).  Neither do I type SELECT Users.GivenName, Users.FamilyName FROM Users when SELECT GivenName, FamilyName FROM Users will suffice.

    When talking about table aliases, I say use them even in single table queries.  I have written numerous such queries only to see them grow into multiple table queries.  If you get in the habit of using table aliases it just becomes second nature.  And when using them, always use them, not just where two (or more) tables have common names.  Unique column names should also be aliased since you never know when another table in a query may have a column added with that same name.

    I am also getting to the point where I hate (tough word) when people call things evil that are simple tools.  Nulls are evil, cursors are evil, magic number are evil (flash back to The Mummy 2 and things being cursed).  Any tool used inappropriately can be evil just as any tool used appropriately can be good.

  • RLilj33 - Friday, July 6, 2018 9:11 AM

    Thank you roger.plowman. I will consider that as my lesson learned for the day!

    I believe my point is still valid - there are things in life (and db work) that we ascribe values to that can add confusion instead of clarification. One example is the use of a made-up value to provide a representation of NULL. In the example of using -1, consider that QlikView/QlikSense views -1 as a TRUE value. If you were to create a Qlik application that reads these values, you may end up with many false-positive values.

    >one example is the use of made-up value to provide a representation of NULL.

    The question is about using NULL to represent a "missing" value, not about using a code to represent NULL.

     All representations are fundamentally "made up".  "Unknown" is a made up sequence of letters to represent something and is given fuzzy meaning by the English language.  -1 is similarly made up, with the advantages that it's precise meaning can/should be declared for the universe of discourse and can be language independent. 

    So, which is better for representing "unknown", whatever that may mean:
       "unknown" - an English word with commonly understood definition 
       "-1" a code for "unknown" 
        NULL - an odd database artifact permitting a column with no assigned value.

    There is simply no one-size-fits-all answer.  But...NULL means "absent a value", should be treated as no-more-or-less than that, and should be used only when your use case calls for such a thing.  For those wanting NULL to represent a special value of some kind, then you are better off to use something else.

  • Lynn Pettis - Friday, July 6, 2018 11:24 AM

    When talking about table aliases, I say use them even in single table queries.  I have written numerous such queries only to see them grow into multiple table queries.  If you get in the habit of using table aliases it just becomes second nature.  And when using them, always use them, not just where two (or more) tables have common names.  Unique column names should also be aliased since you never know when another table in a query may have a column added with that same name.

    I am also getting to the point where I hate (tough word) when people call things evil that are simple tools.  Nulls are evil, cursors are evil, magic number are evil (flash back to The Mummy 2 and things being cursed).  Any tool used inappropriately can be evil just as any tool used appropriately can be good.

    The discussion about table aliases is interesting, but off topic.

  • Dennis Q Miller - Friday, July 6, 2018 11:33 AM

    Lynn Pettis - Friday, July 6, 2018 11:24 AM

    When talking about table aliases, I say use them even in single table queries.  I have written numerous such queries only to see them grow into multiple table queries.  If you get in the habit of using table aliases it just becomes second nature.  And when using them, always use them, not just where two (or more) tables have common names.  Unique column names should also be aliased since you never know when another table in a query may have a column added with that same name.

    I am also getting to the point where I hate (tough word) when people call things evil that are simple tools.  Nulls are evil, cursors are evil, magic number are evil (flash back to The Mummy 2 and things being cursed).  Any tool used inappropriately can be evil just as any tool used appropriately can be good.

    The discussion about table aliases is interesting, but off topic.

    Typical of the forums here, just thought I'd add my 2 cents.

  • jay-h - Friday, July 6, 2018 11:16 AM

    But think about it:

    1) "For example, how do you distinguish a termination_date that is missing because nobody remembers, from one where it does not apply, from one where it hasn’t occurred yet?"
    Those are not missing values. They are specific bits of information,  even if their 'value' is that there is no information relevant. A NULL is an unknown that has not even been categorized.

    2) "Nulls are not handled the same across different SQL dialects and sometimes even across different functions within the same dialect" Magic numbers suffer from the save problems. Their 'values' are only understood by hard coding within a particular application.  Future developers may not even be aware of those implications if not well documented.

    3) "Nulls are statistically and mathematically ambiguous.  Should a null value count in an average?" Same problems with magic numbers, perhaps even more so because systems often will throw an error for a misplaced NULL.

    4) Keys should never be nullable. NULLs are only for data, and only some data fields.

    5) "Logical operators applied to nulls are illogical.  "   Similarly logical operators applied to magic numbers produce skewed or invalid results.

    Your response is welcome...only constructive dialog is intended.

    1.  I agree, so don't use NULL for them.  Alternatives are simple for text fields or codes that reference a text field in a "lookup" table.  Not so much for numbers, dates, boolean, and other data types.  The real shortcoming here is that date datatypes rarely include "unknown", "not applicable", etc. within their domains.  So, how do we store such information: NULL?

    2. I agree that "Magic Numbers" is a bad practice, but I don't think this is the case here.  The -1 is a code, declared by the designers, and resolved in a parent table where it is fully disclosed.  There is no getting around hard-coding for special cases, but I would not hard-code to the -1, I would hard code to the meaningful value to which it resolves.   x="Not Applicable" is way more self-documenting than x IS NULL (and is more likely to produce the desired result than the tempting x=NULL).  But, my real point here is that x=NULL will work with some databases, but not others--thus, the lack of portability.  In fact, if memory serves, the way that comparison works is a configuration option in some cases.

    3. "With some systems"? IMHO, any mathematical operation against a null value should raise an exception, just like divide by zero does. But that is NOT the case.  Again, the portability issue surfaces. You could take an entire statistics class about handling missing values; NULL does not solve the problem. Just to clarify, I would never use -1 to mean something besides -1 in a column that represents a number; only in a field that holds a code.

    4. We agree on this: NULL should never be used for (foreign) keys. Any database that allows NULL in a unique key should be stricken from the earth.

    5. Again, in no way do I subscribe to "magic numbers".  Please, do not use -1 to mean anything else in a column representing a number.  To make my point, I sometimes use the contrived example of a countdown timer on an explosive device.  What risk could there be to using zero to mean "disabled"?

    Of course, this begs question: what should I use instead?  What to use in a column representing a number when the number is "missing", "unknown", "not applicable", "optional", etc. Perhaps NULL or perhaps a database design that better serves the use case.

  • Dennis Q Miller - Friday, July 6, 2018 11:29 AM

    >one example is the use of made-up value to provide a representation of NULL.

    The question is about using NULL to represent a "missing" value, not about using a code to represent NULL.

     All representations are fundamentally "made up".  "Unknown" is a made up sequence of letters to represent something and is given fuzzy meaning by the English language.  -1 is similarly made up, with the advantages that it's precise meaning can/should be declared for the universe of discourse and can be language independent. 

    So, which is better for representing "unknown", whatever that may mean:
       "unknown" - an English word with commonly understood definition 
       "-1" a code for "unknown" 
        NULL - an odd database artifact permitting a column with no assigned value.

    There is simply no one-size-fits-all answer.  But...NULL means "absent a value", should be treated as no-more-or-less than that, and should be used only when your use case calls for such a thing.  For those wanting NULL to represent a special value of some kind, then you are better off to use something else.

    Null isn't just a weird database concept, it comes up plenty of times in general development where something doesn't have a value.  As it relates to databases NULL is a value that has a very specific meaning and functionality(yes that may very platform by platform) and works predictably if sometimes counter intuitively in ways that no other value would.  And unlike magic values it's functionality is consistent across different fields, a NULL in a varchar works the same way regardless of what the field is or what it's used for.

Viewing 15 posts - 31 through 45 (of 143 total)

You must be logged in to reply to this topic. Login to reply