Four Rules for NULLs

  • dbishop (4/3/2008)


    So, when you boil all this down, how do you handle things like DateTerminated, DisconnectDate (specifically DATETIME datatypes) when the event has not occurred. You know that an employee will someday be terminated, or a customer will someday terminate service, but is the use of NULL in this case sonsidered acceptable? I've seen all kinds of workarounds (NULLS, using a false date (1/1/1900) making the date VARCHAR and using ISDATE() logic...)

    What is the preferred solution?

    I've seen the same thing, particularly with DATETIME. The problem is that there is no equivalent to a zero-length string ('') for DATETIME. I might recommend using a separate column to indicate employment status instead of relying on termination date. For instance, in the future management may want to drill down into employment status to answer questions like "how many employees were fired? how many were laid off? how many quit? how many are currently employed?" You can't answer all those questions with a DATETIME alone. You might also consider breaking out date of hire and date of termination into a separate table related to the employee table. This could be the case if you are a seasonal business, for instance, and want to track total employment time of re-hires. In that case you don't even need to store a nonexistent termination date.

  • That is a good question.

    Personally, when I design database schemas where some of the data has to be date-ranged (and lots of data items in most data models must be date-ranged), I use DateFrom and DateTo columns, and I allow the DateTo to be NULLable. It is a perfect example of what I mean when I describe a NULL as representing an unnassigned value.

    In my mind I am quite certain that this is logically correct. To consider it further: One aspect of a date-ranged data item is whether or not it is current. How do we define whether or not, for example, an organisation address, or a person's employment is current? Well we simply compare the DateFrom to Today, and test DateTo for NULL or for being greater than Today (which would mean it has an assigned date at which is becomes not current). For example, in pseudocode:

    SELECT ...

    FROM ...

    WHERE (DateTo = Today OR DateTo IS NULL))

    Finally, as regards empty strings ( "" ): as has already been stated, they are definitely not the same as no value: any NULLable column of string datatype will be NULL until it has had a value.

    Kenneth Spencer

    You never know: reading my book: "All about your computer" might just tell you something you never knew!
    lulu.com/kaspencer

  • kaspencer (4/3/2008)


    That is a good question.

    Personally, when I design database schemas where some of the data has to be date-ranged (and lots of data items in most data models must be date-ranged), I use DateFrom and DateTo columns, and I allow the DateTo to be NULLable. It is a perfect example of what I mean when I describe a NULL as representing an unnassigned value.

    I don't see any problem with that interpretation, since the DateTo is not known at the present time. It will not be known until a future date, so NULL acts as a placeholder until the DateTo is known. You could potentially lose some valuable information if you ignore other relevant information that cannot be stored in a DateTime column (e.g., did the employee "quit"? was he "fired"? etc.)

    In my mind I am quite certain that this is logically correct. To consider it further: One aspect of a date-ranged data item is whether or not it is current. How do we define whether or not, for example, an organisation address, or a person's employment is current? Well we simply compare the DateFrom to Today, and test DateTo for NULL or for being greater than Today (which would mean it has an assigned date at which is becomes not current). For example, in pseudocode:

    SELECT ...

    FROM ...

    WHERE (DateTo = Today OR DateTo IS NULL))

    I would probably write it something like this (where "Today" stands in for the GETDATE(), CURRENT_TIMESTAMP function or a variable with the current date in it):

    SELECT ...

    FROM ...

    WHERE COALESCE(DateTo, Today) >= Today

    Finally, as regards empty strings ( "" ): as has already been stated, they are definitely not the same as no value: any NULLable column of string datatype will be NULL until it has had a value.

    An empty string is a known string value of zero length. There is no DateTime equivalent (although some people like to "roll their own" so to speak).

  • Great responses. Thanks a lot.

    BTW, although I mentioned I've seen the date stored as a character string and ISDATE() function used in queries, it is not a method I would use, just one of the unusual ways I've seen it handled. I also acknowledge that an empty string and NULL are completely different animals.

    I've seen some pretty strange things in the past, including tables that have every column except an IDENTITY defined as nullable. To me, just arbitrarily using them without considering the consequences is a lazy man's way out and can cause problems down the road.

  • Mike:

    You could potentially lose some valuable information if you ignore other relevant information that cannot be stored in a DateTime column (e.g., did the employee "quit"? was he "fired"? etc.)

    Yes, of course there are many other data items that would be reuired in any situation - I was cutting it down to the minimum necessary to answer the point regarding date-ranging of data.

    Thanks

    Kenneth Spencer

    You never know: reading my book: "All about your computer" might just tell you something you never knew!
    lulu.com/kaspencer

  • Mike,

    Not BIG on DBA, but the only question I have with your code:

    SELECT ...

    FROM ...

    WHERE COALESCE(DateTo, Today) >= Today

    is, if DateTo is an indexed column, wouldn't wrapping it in COALESCE() preclude using the index for retrieving values, whereas Kenneth's suggestion would use indexes if present?

  • dbishop (4/4/2008)


    Mike,

    Not BIG on DBA, but the only question I have with your code:

    SELECT ...

    FROM ...

    WHERE COALESCE(DateTo, Today) >= Today

    is, if DateTo is an indexed column, wouldn't wrapping it in COALESCE() preclude using the index for retrieving values, whereas Kenneth's suggestion would use indexes if present?

    Good point. For a large data set using COALESCE in the WHERE clause could cause an unacceptable performance impact. For the example the OP gave (employees hired and fired), I was assuming a small number of rows (which could be a bad assumption since very large organizations may have burned through a lot of employees - e.g., McDonald's). SQL Server won't necessarily ignore an index because you use COALESCE in the WHERE clause, but could generate an index scan instead of a seek. A lot of other factors play into whether SQL Server decides to use an index or not: e.g., are there any bookmark lookups (RID lookups) or is the index covering, etc. You could also UNION ALL two queries together, which probably sits somewhere in the middle performance-wise, but much lower on the readability scale.

    SELECT ...

    FROM ...

    WHERE DateTo >= Today

    UNION ALL

    SELECT ...

    FROM ...

    WHERE DateTo IS NULL;

  • This query will work regardless of ANSI settings:

    select rep

    from salesteam a

    where rep NOT in (

    select fullname from manpower

    where fullname = a.rep )

    group by rep

    The secret ingredient is how the inner select works - it throws out NULL values leaving the outer "NOT" filter to work as expected. The "group by" is optional.

    In a different SQL website, they were wrangling over the nulls too, but no answer to the original sql query.

  • hmmm... the solution provided by "bear in a box" looks very Oracle-ish. I believe MS & Oracle spend HOURS trying to figure out how to make their competitors code run like CRAP! This may be a sample of that.

    I tried to figure out where the thread was going with the original code or a subsequent post but I got lost...

    It looks like an OR clause would work just fine...

    SELECT [column list]

    FROM dbo.SomeTable

    WHERE DateTo >= @today

    OR DateTo IS NULL

    I know from the original post that it was about NULLS and some of the samples were a bit weird. For Example: NULL should never equal NULL since unknown cannot equal unknown. Also, concatenation behavior is based on the CONCAT_NULL_YIELDS_NULL setting. If it's ON 'A' + NULL yields NULL, if it's OFF then 'A' + NULL yields 'A'.

    --Paul Hunter

  • Please qualify how it can be bad when it works fine without having to do convoluted steps cursor tables, etc.

  • Maybe I went blind last night while typing in the code..... nope, no blindness. Bear, where did you see the cursor? I looked under every letter and couldn't find it.

    --Paul Hunter

  • ...it must be somewhere there within the 15 pages of posts.

    the answer i was expecting from you would have been, "because the filter will take a hit with a large table scan." i modified the code to be "where [field] is not null", the query runs faster; but this assumes one knows very field in the table has been allowed or not allowed nulls. this is not the case which is why the issue comes up during actual.

    some of the helpful posts generalize to having the "where" part remove the nulls. again, it assumes one knows...etc.

    i was trying to find an invariant way where i don't have to set on then after, set off. nor use functions or make "is not null" part of every query. the query originates from a more difficult query where i needed two to three values to filter out one transaction among many.

  • bear in a box (6/24/2009)


    ...it must be somewhere there within the 15 pages of posts.

    the answer i was expecting from you would have been, "because the filter will take a hit with a large table scan." i modified the code to be "where [field] is not null", the query runs faster; but this assumes one knows very field in the table has been allowed or not allowed nulls. this is not the case which is why the issue comes up during actual.

    some of the helpful posts generalize to having the "where" part remove the nulls. again, it assumes one knows...etc.

    i was trying to find an invariant way where i don't have to set on then after, set off. nor use functions or make "is not null" part of every query. the query originates from a more difficult query where i needed two to three values to filter out one transaction among many.

    How would you NOT know if a column in a table allowed nulls or not? You can look at the table definition to see if it does, or you can query the table with specific queries to determine if a column you are selecting contains any nulls or not.

    The best bet, of course, is through documentation that tells you what you need to know.

  • Hello and happy new year everyone.

    I've looked in the thread and not managed to see an answer to this. I use SQL server.

    If I have a Customers table, with a nullable field Country, and want to find all Customers except ones in Russia:

    SELECT * FROM CUSTOMERS WHERE NOT Country = 'Russia'

    will not show me customers with a null country. And that seems rotten to me, the query is simple and I would expect to see all except customers with the country specified Russia.

    I could use ansi_nulls off, or have to do

    SELECT * FROM CUSTOMERS WHERE NOT ISNULL(Country, '') = 'Russia'

    both of these seem ugly, as if you must walk on eggshells if a null may be present. Is there a better way?

    All the best,

    Greg

  • Greg Martin-419640 (1/7/2010)


    Hello and happy new year everyone.

    I've looked in the thread and not managed to see an answer to this. I use SQL server.

    If I have a Customers table, with a nullable field Country, and want to find all Customers except ones in Russia:

    SELECT * FROM CUSTOMERS WHERE NOT Country = 'Russia'

    will not show me customers with a null country. And that seems rotten to me, the query is simple and I would expect to see all except customers with the country specified Russia.

    I could use ansi_nulls off, or have to do

    SELECT * FROM CUSTOMERS WHERE NOT ISNULL(Country, '') = 'Russia'

    both of these seem ugly, as if you must walk on eggshells if a null may be present. Is there a better way?

    All the best,

    Greg

    Or this

    SELECT * FROM CUSTOMERS WHERE NOT Country = 'Russia' OR Country IS NULL

Viewing 15 posts - 136 through 150 (of 152 total)

You must be logged in to reply to this topic. Login to reply