Who Likes NULL?

  • chrisbray - Tuesday, July 10, 2018 2:09 AM

    edwardwill - Tuesday, July 10, 2018 1:26 AM

    chrisbray - Monday, July 9, 2018 10:49 PM

    I agree that a bit column for IsActive is a bad idea.  However, the opposite - a bit column for Inactive - works well if there is no EndDate or anything similar to indicate whether or not the record should be used and you want to retain the information but effectively disable it. Addresses are a good example where this is probably the best option, as are bank details and contacts, where you are not storing any form of active date ranges...  Having a flag to check *before* you check a date column is madness in my view - why not just check the date?

    There is absolutely no difference between a bit column for IsActive and a bit column for Inactive.

    Semantically there is a huge difference!  The requirement to activate something that logically should have a default state of active (after all, why would you add an inactive address for example) is nonsensical at least at the UI end.  I agree that the database doesn't care, but since logic dictates that the UI content reflect the database content and vice-versa then the naming is pretty critical to understanding by all concerned.

    Re: Active/Inactive - For myself and our team, we have found that such a column is best named for the non-default state. So, if a row is created as Active by default, the column is best called [Inactive] and defaulted to FALSE/0. We have [disabled], [excluded] etc in different scenarios for the same thing. Experience has indicated it to be the most logical way of thinking.

    As for the NULL debate, surely, if there is an [end_date] column on a row and the row is still current, then [end_date] IS unknown, therefore NULL is the logical default "value". If there is a problem with an application being able to handle that, then perhaps you could build a view or stored procedure that the application can use. That's the way I've sometimes done it - it allows you to hide / protect / update columns the application doesn't need to know about or alter. However, if it's a legacy database or one you do not have such control over, you have to live with the limitations.
    (I'm writing as a software engineer involved in database design and development, and application / web development; so I have experience of both sides)

  • edwardwill - Tuesday, July 10, 2018 2:21 AM

    chrisbray - Tuesday, July 10, 2018 2:09 AM

    Semantically there is a huge difference!  The requirement to activate something that logically should have a default state of active (after all, why would you add an inactive address for example) is nonsensical at least at the UI end.  I agree that the database doesn't care, but since logic dictates that the UI content reflect the database content and vice-versa then the naming is pretty critical to understanding by all concerned.

    As a dev rather than a DBA I think your argument is entirely academic, and a straw poll of the three devs in my team who aren't currently on holiday supports my view.  Consider the LINQ to surface the data:

    var userDetails = _userService.Repository<UserDetails>().Query(x => x.IsActive == true).Select().FirstOrDefault();
    as opposed to
    var userDetails = _userService.Repository<UserDetails>().Query(x => x.InActive == false).Select().FirstOrDefault();

    I know which I'd prefer.

    I suppose that you could say reasonably say that is purely academic, if you consider only the coding aspect. However, in the real world we need to consider people... which is why I used the word semantically.

    Firstly, what you describe is not in the UI - I am talking about what the user sees.  When creating a new whatever it is - let's continue to use Address as the example - why would the user be required to check a box on the UI to activate the new address?  Far more logical and understandable for the user that a new address is active as soon as they create it, and that they check the box for Inactive when the address eventually becomes surplus to requirements in the active list.  I think we are all going to agree that users are idiots (and I include myself in that when I am using other people's software) and that they need to be guided using the simplest possible methodology with the least possible number of data entry requirements to achieve a valid object.  If you have the requirement to check a box for Active in order to make something that should already be active become active, what else are you going have them do before the object is valid?  What if there are ten or twenty different flags in a settings object?  Are you going to set all of those to the reverse of the required value and require the user to set them?

    Now you could get round all that by reversing the logic on the UI compared to the database,  but that way surely lies madness?

    If we are  going to quibble, using your code I would prefer !x.Inactive rather than using an == boolean (again possibly academic in that there is no difference in the result and the language supports it), and in any case I would normally be doing this via TSQL (after all this is a SQL Server forum) so the filter would actually be 'Inactive = 0'. 🙂

  • Romac - Tuesday, July 10, 2018 2:34 AM

    chrisbray - Tuesday, July 10, 2018 2:09 AM

    edwardwill - Tuesday, July 10, 2018 1:26 AM

    chrisbray - Monday, July 9, 2018 10:49 PM

    I agree that a bit column for IsActive is a bad idea.  However, the opposite - a bit column for Inactive - works well if there is no EndDate or anything similar to indicate whether or not the record should be used and you want to retain the information but effectively disable it. Addresses are a good example where this is probably the best option, as are bank details and contacts, where you are not storing any form of active date ranges...  Having a flag to check *before* you check a date column is madness in my view - why not just check the date?

    There is absolutely no difference between a bit column for IsActive and a bit column for Inactive.

    Semantically there is a huge difference!  The requirement to activate something that logically should have a default state of active (after all, why would you add an inactive address for example) is nonsensical at least at the UI end.  I agree that the database doesn't care, but since logic dictates that the UI content reflect the database content and vice-versa then the naming is pretty critical to understanding by all concerned.

    Re: Active/Inactive - For myself and our team, we have found that such a column is best named for the non-default state. So, if a row is created as Active by default, the column is best called [Inactive] and defaulted to FALSE/0. We have [disabled], [excluded] etc in different scenarios for the same thing. Experience has indicated it to be the most logical way of thinking.

    As for the NULL debate, surely, if there is an [end_date] column on a row and the row is still current, then [end_date] IS unknown, therefore NULL is the logical default "value". If there is a problem with an application being able to handle that, then perhaps you could build a view or stored procedure that the application can use. That's the way I've sometimes done it - it allows you to hide / protect / update columns the application doesn't need to know about or alter. However, if it's a legacy database or one you do not have such control over, you have to live with the limitations.
    (I'm writing as a software engineer involved in database design and development, and application / web development; so I have experience of both sides)

    I agree wholeheartedly - that was the point I was making (or perhaps trying to make 🙂)

  • Romac - Tuesday, July 10, 2018 2:34 AM

    chrisbray - Tuesday, July 10, 2018 2:09 AM

    edwardwill - Tuesday, July 10, 2018 1:26 AM

    chrisbray - Monday, July 9, 2018 10:49 PM

    I agree that a bit column for IsActive is a bad idea.  However, the opposite - a bit column for Inactive - works well if there is no EndDate or anything similar to indicate whether or not the record should be used and you want to retain the information but effectively disable it. Addresses are a good example where this is probably the best option, as are bank details and contacts, where you are not storing any form of active date ranges...  Having a flag to check *before* you check a date column is madness in my view - why not just check the date?

    There is absolutely no difference between a bit column for IsActive and a bit column for Inactive.

    Semantically there is a huge difference!  The requirement to activate something that logically should have a default state of active (after all, why would you add an inactive address for example) is nonsensical at least at the UI end.  I agree that the database doesn't care, but since logic dictates that the UI content reflect the database content and vice-versa then the naming is pretty critical to understanding by all concerned.

    Re: Active/Inactive - For myself and our team, we have found that such a column is best named for the non-default state. So, if a row is created as Active by default, the column is best called [Inactive] and defaulted to FALSE/0. We have [disabled], [excluded] etc in different scenarios for the same thing. Experience has indicated it to be the most logical way of thinking.

    As for the NULL debate, surely, if there is an [end_date] column on a row and the row is still current, then [end_date] IS unknown, therefore NULL is the logical default "value". If there is a problem with an application being able to handle that, then perhaps you could build a view or stored procedure that the application can use. That's the way I've sometimes done it - it allows you to hide / protect / update columns the application doesn't need to know about or alter. However, if it's a legacy database or one you do not have such control over, you have to live with the limitations.
    (I'm writing as a software engineer involved in database design and development, and application / web development; so I have experience of both sides)

    Just a different opinion... I've found that negative naming (Inactive) is confusing to people in a hurry an suggest always using "IsActive" instead of naming something based on the most common condition.,  As for using NULL for an EndDate, I avoid it because code is much easier to write (and usually better performing) if it's always populated using the "magic number" of 9999-01-01.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • edwardwill - Friday, July 6, 2018 4:08 AM

    Romac - Friday, July 6, 2018 4:02 AM

    Haha, yes! :hehe: Jonathan Swift wrote a story about people that went to war over this (well, about which way up to put an egg in an egg-cup) :crazy:

    Completely off-topic, but I was reading about a trend on dating sites where people examine photos for hints as to the person they might be contacting.  One woman received a number of enquiries regarding the way she'd loaded her toilet paper (apparently it's supposed to be over ... or is it under?  I guess anyone in possession of a life wouldn't really give a stuff!)

    I'd be more inclined to wonder why in god's name she had a picture of her bog on a dating site.  Dunno - I don't use dating sites myself - perhaps it's a thing in those circles, seems odd to me though.  Maybe I'm just being English about it all.

    I'm a DBA.
    I'm not paid to solve problems. I'm paid to prevent them.

  • Jeff Moden - Tuesday, July 10, 2018 7:02 AM

    Romac - Tuesday, July 10, 2018 2:34 AM

    chrisbray - Tuesday, July 10, 2018 2:09 AM

    edwardwill - Tuesday, July 10, 2018 1:26 AM

    chrisbray - Monday, July 9, 2018 10:49 PM

    I agree that a bit column for IsActive is a bad idea.  However, the opposite - a bit column for Inactive - works well if there is no EndDate or anything similar to indicate whether or not the record should be used and you want to retain the information but effectively disable it. Addresses are a good example where this is probably the best option, as are bank details and contacts, where you are not storing any form of active date ranges...  Having a flag to check *before* you check a date column is madness in my view - why not just check the date?

    There is absolutely no difference between a bit column for IsActive and a bit column for Inactive.

    Semantically there is a huge difference!  The requirement to activate something that logically should have a default state of active (after all, why would you add an inactive address for example) is nonsensical at least at the UI end.  I agree that the database doesn't care, but since logic dictates that the UI content reflect the database content and vice-versa then the naming is pretty critical to understanding by all concerned.

    Re: Active/Inactive - For myself and our team, we have found that such a column is best named for the non-default state. So, if a row is created as Active by default, the column is best called [Inactive] and defaulted to FALSE/0. We have [disabled], [excluded] etc in different scenarios for the same thing. Experience has indicated it to be the most logical way of thinking.

    As for the NULL debate, surely, if there is an [end_date] column on a row and the row is still current, then [end_date] IS unknown, therefore NULL is the logical default "value". If there is a problem with an application being able to handle that, then perhaps you could build a view or stored procedure that the application can use. That's the way I've sometimes done it - it allows you to hide / protect / update columns the application doesn't need to know about or alter. However, if it's a legacy database or one you do not have such control over, you have to live with the limitations.
    (I'm writing as a software engineer involved in database design and development, and application / web development; so I have experience of both sides)

    Just a different opinion... I've found that negative naming (Inactive) is confusing to people in a hurry an suggest always using "IsActive" instead of naming something based on the most common condition.,  As for using NULL for an EndDate, I avoid it because code is much easier to write (and usually better performing) if it's always populated using the "magic number" of 9999-01-01.

    See, I use Inactive because it is rarer than active, it's a "deleted" record and thus it becomes very easy to exclude with a simple "Not Inactive" or in SQL Inactive = 0. If you use nothing else to denote inactive records and default the column to false it remains out of sight-out of mind until you need it.

  • roger.plowman - Tuesday, July 10, 2018 8:13 AM

    Jeff Moden - Tuesday, July 10, 2018 7:02 AM

    Romac - Tuesday, July 10, 2018 2:34 AM

    chrisbray - Tuesday, July 10, 2018 2:09 AM

    edwardwill - Tuesday, July 10, 2018 1:26 AM

    chrisbray - Monday, July 9, 2018 10:49 PM

    I agree that a bit column for IsActive is a bad idea.  However, the opposite - a bit column for Inactive - works well if there is no EndDate or anything similar to indicate whether or not the record should be used and you want to retain the information but effectively disable it. Addresses are a good example where this is probably the best option, as are bank details and contacts, where you are not storing any form of active date ranges...  Having a flag to check *before* you check a date column is madness in my view - why not just check the date?

    There is absolutely no difference between a bit column for IsActive and a bit column for Inactive.

    Semantically there is a huge difference!  The requirement to activate something that logically should have a default state of active (after all, why would you add an inactive address for example) is nonsensical at least at the UI end.  I agree that the database doesn't care, but since logic dictates that the UI content reflect the database content and vice-versa then the naming is pretty critical to understanding by all concerned.

    Re: Active/Inactive - For myself and our team, we have found that such a column is best named for the non-default state. So, if a row is created as Active by default, the column is best called [Inactive] and defaulted to FALSE/0. We have [disabled], [excluded] etc in different scenarios for the same thing. Experience has indicated it to be the most logical way of thinking.

    As for the NULL debate, surely, if there is an [end_date] column on a row and the row is still current, then [end_date] IS unknown, therefore NULL is the logical default "value". If there is a problem with an application being able to handle that, then perhaps you could build a view or stored procedure that the application can use. That's the way I've sometimes done it - it allows you to hide / protect / update columns the application doesn't need to know about or alter. However, if it's a legacy database or one you do not have such control over, you have to live with the limitations.
    (I'm writing as a software engineer involved in database design and development, and application / web development; so I have experience of both sides)

    Just a different opinion... I've found that negative naming (Inactive) is confusing to people in a hurry an suggest always using "IsActive" instead of naming something based on the most common condition.,  As for using NULL for an EndDate, I avoid it because code is much easier to write (and usually better performing) if it's always populated using the "magic number" of 9999-01-01.

    See, I use Inactive because it is rarer than active, it's a "deleted" record and thus it becomes very easy to exclude with a simple "Not Inactive" or in SQL Inactive = 0. If you use nothing else to denote inactive records and default the column to false it remains out of sight-out of mind until you need it.

    Yes!  My point exactly.  It is all about what the user would expect, and find easiest to use.  Experience shows that they would not expect to have to activate something that should be active anyway.  If you want to stop something working you disable it, you do not normally get something that doesn't work and enable it, do you???

    Naming is always incredibly important, possibly the most important part of creating an application or indeed a database.  With a bad name, nothing makes sense!

  • To be clear, I don't like NULL for string columns. Often I think we can better describe this as N/A, Unknown, Unassigned, etc. For numbers, it makes some sense, as NULL or a magic number are both fraught with issues and need some COALSESCE-type handling. Neither is better, but with NULL I get one magic number we agree to handle. If we use -1, 9999999, or something else, then I might have a bunch that need to be documented.

    Easy to say document and ensure everyone knows. In an organization of any size,with turnover, this is incredibly hard. Maybe the MDM people have something that helps, but just turning the ship to implement MDM and ensure everyone uses it, including the random contractors brought on by some department that needed some app built, or the person doing side work for their boss knows about the MDM, or magic values, is hard. Certainly every Excel user that gets an export isn't going to know this.

    For dates, I don't see an alternative. A separate column is an issue. Whether it's NULL or 9999, you still need handling, so why not null?

  • Steve Jones - SSC Editor - Tuesday, July 10, 2018 8:40 AM

    To be clear, I don't like NULL for string columns. Often I think we can better describe this as N/A, Unknown, Unassigned, etc. For numbers, it makes some sense, as NULL or a magic number are both fraught with issues and need some COALSESCE-type handling. Neither is better, but with NULL I get one magic number we agree to handle. If we use -1, 9999999, or something else, then I might have a bunch that need to be documented.

    Easy to say document and ensure everyone knows. In an organization of any size,with turnover, this is incredibly hard. Maybe the MDM people have something that helps, but just turning the ship to implement MDM and ensure everyone uses it, including the random contractors brought on by some department that needed some app built, or the person doing side work for their boss knows about the MDM, or magic values, is hard. Certainly every Excel user that gets an export isn't going to know this.

    For dates, I don't see an alternative. A separate column is an issue. Whether it's NULL or 9999, you still need handling, so why not null?

    Because Null only gives you one value, a value that doesn't tell you why it's null. In our company we have 3 values that replace null. The one closest in meaning to null is probably TBD (to be determined). It means the value is unknown at present, but is applicable and hasn't been verified as impossible to find out.

    N/A (not applicable) means not only is the data missing, it's missing deliberately and will never be set because it doesn't apply to this record. For example, we have a Site table which contains an ATM field. However Sites apply to both ATMs (for replenishment/servicing) as well as places we perform courier work for (such as credit unions) that do not have an ATM. Thus the ATM field for the site is N/A. Null would be ambiguous in this context, is it null because it's never been filled in (somebody forgot) or because there is no ATM?

    Finally, UNK (verified unknown) indicates that the field is empty because the data has been lost and is unrecoverable. One good example of this would be the purchase date for a piece of equipment so old we either lost or got rid of the purchase order.

    Null in these cases doesn't work. It doesn't tell you why the field is empty, only that it is. For us, at least, that's not sufficient.

    So we use magic numbers in domains that have naturally unused spaces. For example our date and date/time fields use dates of 01/01/1900 and before since nothing we deal with existed before 1900--including employees! :hehe:

    For us the use of null and its magic number replacements are so critical our development wiki (everyone has a developer's wiki, right? :)) lists it as one of the first links on the development guidelines overview, along with the proper use of default values.

  • Steve Jones - SSC Editor - Tuesday, July 10, 2018 8:40 AM

    For dates, I don't see an alternative. A separate column is an issue. Whether it's NULL or 9999, you still need handling, so why not null?

    Because it causes extra code to be written to handle lookups for things that are active and that extra code frequently results in performance problems.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Tuesday, July 10, 2018 9:18 AM

    Steve Jones - SSC Editor - Tuesday, July 10, 2018 8:40 AM

    For dates, I don't see an alternative. A separate column is an issue. Whether it's NULL or 9999, you still need handling, so why not null?

    Because it causes extra code to be written to handle lookups for things that are active and that extra code frequently results in performance problems.

    /shrug you also have to write extra code for magic numbers.

  • ZZartin - Tuesday, July 10, 2018 9:24 AM

    Jeff Moden - Tuesday, July 10, 2018 9:18 AM

    Steve Jones - SSC Editor - Tuesday, July 10, 2018 8:40 AM

    For dates, I don't see an alternative. A separate column is an issue. Whether it's NULL or 9999, you still need handling, so why not null?

    Because it causes extra code to be written to handle lookups for things that are active and that extra code frequently results in performance problems.

    /shrug you also have to write extra code for magic numbers.

    Not so much as nulls, however. Depending on your magic value it might be as simple as using a BETWEEN clause.

  • Jeff Moden - Tuesday, July 10, 2018 9:18 AM

    Steve Jones - SSC Editor - Tuesday, July 10, 2018 8:40 AM

    For dates, I don't see an alternative. A separate column is an issue. Whether it's NULL or 9999, you still need handling, so why not null?

    Because it causes extra code to be written to handle lookups for things that are active and that extra code frequently results in performance problems.

    So, what is the difference between:
    StartDate >= '2018-01-01' AND EndDate IS NULL
    and
    StartDate >= '2018-01-01 AND EndDate < '9999-01-01'

  • erikb 90350 - Friday, July 6, 2018 5:43 AM

    From a programming perspective (i.e. the typical consumers of the database data), NULLs are a massive source of defects. NULL generally doesn't play well with code so it has to be trapped all over the place, leading to fragile code and insidious bugs.  There's a reason that Tony Hoare referred to NULL as a billion dollar mistake.

    Yet, ironically, one of the commonest design _ups I see is developers creating rubbish where just about everything - including elements which are absolutely required in any remotely sane business case - can be / defaults to null.

    I'm a DBA.
    I'm not paid to solve problems. I'm paid to prevent them.

  • roger.plowman - Tuesday, July 10, 2018 9:15 AM

    Because Null only gives you one value, a value that doesn't tell you why it's null. In our company we have 3 values that replace null. The one closest in meaning to null is probably TBD (to be determined). It means the value is unknown at present, but is applicable and hasn't been verified as impossible to find out.

    N/A (not applicable) means not only is the data missing, it's missing deliberately and will never be set because it doesn't apply to this record. For example, we have a Site table which contains an ATM field. However Sites apply to both ATMs (for replenishment/servicing) as well as places we perform courier work for (such as credit unions) that do not have an ATM. Thus the ATM field for the site is N/A. Null would be ambiguous in this context, is it null because it's never been filled in (somebody forgot) or because there is no ATM?

    Finally, UNK (verified unknown) indicates that the field is empty because the data has been lost and is unrecoverable. One good example of this would be the purchase date for a piece of equipment so old we either lost or got rid of the purchase order.

    For us the use of null and its magic number replacements are so critical our development wiki (everyone has a developer's wiki, right? :)) lists it as one of the first links on the development guidelines overview, along with the proper use of default values.

    To be clear, I like your approach, Roger, but if you haven't been doing that, it can be a lot of work to implement across an application. I'm still not sold 1900 is better, but I like the use of this for FKs.

Viewing 15 posts - 76 through 90 (of 143 total)

You must be logged in to reply to this topic. Login to reply