NULL Equals NULL?

  • Thanks for taking the time to give feedback Mike (and Jack), it helps to have a better idea of whats going on behind the scenes.

    So I gather that the upshot of it all in my case that it would have been ideal for me only to have to worry about checking for Null and not space(0) but since that isn't the case its best to avoiding using any function in my WHERE clause.

    Most projects I've worked on before have shied away from storing NULLs (largely in order to fit in with the design of other custom built business objects), instead defaulting them with 0s or space(0)s and have implemented workarounds to handle unknowns (such as using -999 to mean unknown!). It looks like this was a case of reinventing the wheel by making one that was badly misshaped and buckled. I can see the value of properly used Nulls now.

  • You got it. Think of it like this - if you use a function in a WHERE clause or in an ON clause of a JOIN, SQL is forced to look at, and perform the function on, every single row just so it can evaluate the predicate. If you use IS NULL or other comparison operators without a function, SQL can take advantage of indexes to speed things up. It won't necessarily need to look at every single row, and that's where your queries can really benefit from increased efficiency.

    And you're right about those custom indicators as well. I've seen projects where indicators like "-99", "-9" and "-1" were used, and then *real data* came through with "-99", "-9" and "-1" in it. Needless to say the result was a mess of real data mixed with custom indicators, and no way to tell the two apart.

    I don't think NULL is a bad thing, it's just misused and abused, especially when you're coming from another programming language that has a different concept of what Null is (such as C++'s NULL, C#'s null, or VB's Nothing, which are all completely different from SQL NULL).

  • Use NULL, don't use NULL, use an indicator, don't use an inidcator. Seems like us poor developers never get a break! I know it is possible to design a database structure in such a way that you do not need to use nulls, but sometimes we don't always get what we want, and have to work within the constraints of what is already available.

    Probably 99.999% of HR employee tables have a hire date and termination date column. If the employee is still an active employee and has not yet been terminated, what do you put in the termination date column? Null seems to make sense. 12/31/9999 might also make sense. No one in our lifetime (and possibly the lifetime of planet Earth) will ever see that date.

    What if you are a medical facility and need to enter a patient's date of birth. Everybody has one, but if the patient comes in unconcious, unresponsive and without ID, you still need to enter what information is available. Does using NULL here make sense, and if not, what are the alternatives? Again, you could go back to database design, and say if it was designed properly, that would not be a problem. If every column that might not have a value was relegated to its own table, chances are you'd end up with a database structure so complex (along with the resulting queries), that it would present a nightmare to the 'typical' database developer. I think dates present their own unique challenges when trying to represent an unknown value and not allowing NULL. Lack of an email address or middle initial can easily be represented by SPACE(0). What about Age (assuming you don't have DOB and it is being calculated)? Zero (0) is a valid age for a newborn, so you can't use that when the age is unknown. Do you really want to allow the user to enter a negative age (e.g. -1 indicates unknown)? I don't.

    I have not yet been convinced that I should never use NULLs. I want my database design to be easy enough for even an somewhat inexperienced developer to know what is going on, without needing to perform 25 joins to get all the data. However, I do believe that NULLs are greatly overused in many situations. Just my two-cents worth.

  • That brings up the Y10K problem, and Joe Celko's descendants will be cursing your name for not addressing it in the 21st century 🙂

  • I was going to leave a nice message about what a wonderful article this was and then instead spent a LONG time reading all of the posts! WOW 😀

    Anyhow, I think that there is a need to understand how NULLs work and why they work the way they do because many of us, even if we don't like it, are working with databases (reporting, extracting data, populating, etc) that have NULL values.

    In my opinion I think NULLs are useful if not abused and well understood.

    Whilst reading through and seeing the claims about being able to design the DB without needing NULLs I knew the solution that would be given - effectively splitting the "optional"/NULLable columns off into their own table with a 1:0 cardinality.

    As others have mentioned, this would become unwieldy to query. Thus in this case NULLs are a convenient shorthand and increase the ease of comprehension for any DBAs, developers or report writers.

    The other approach is to resort to very ugly and cumbersome name+value tables where, instead of NULLs in a normal wide table the lack of a name/value for a given key implies the data is missing. For example, If we had rows

    Patient=45, Name=Surname, Value = 'Smith'

    Patient=45, Name=Firstname, Value = 'John'

    then you could assume that there's no DOB. I think the NULL's a lot easier to comprehend and use.

    The way NULLs are summed is well documented. As Joe Celko finally pointed out (I was hoping someone would make the point), you do get a warning if NULLs are excluded. Consider calculating the average of a set of values. You clearly do not want the unknown (read "NULL") values affecting the average by being treated as some sort of magic number whether that be 0 or MaxInt.

    Was also hoping Steve would read the thread and restore a bit of order - as usual he was on top of things 😀 I guess the reason why things got so heated so quickly was because of the tone immediately taken by someone who was brand new to the forums (33 points in the end) vs someone who had >700 points, took the time to write a well-written article (certainly it was at least above average in use of grammar) that would help take the mystery out of some behaviour that trips up many DBAs.

    Finally, if you wish for SQL Server to take advantage of indices on your tables, don't do something like

    where isNull(my_column, '') = '' because SQL Server has to evaluate that function on every single row in your table.

    Instead, as others have said after all the muck, specify exactly what you mean as in

    where my_column is null or my_column = ''

    😀 This is a great community - very rare to see flames!

  • Whenever you compare NULL you should always use 'IS NULL' rather than '= NULL'

    e.g.

    [font="Courier New"]SET ANSI_NULLS ON

    DECLARE @val CHAR(4)

    SET @val = NULL

    SET ANSI_NULLS ON

    If @val = NULL

    PRINT 'TRUE'

    ELSE IF NOT(@val = NULL)

    PRINT 'FALSE'

    ELSE

    PRINT 'UNKNOWN'[/font]

    Will return 'UNKNOWN'

    but

    [font="Courier New"]SET ANSI_NULLS ON

    DECLARE @val CHAR(4)

    SET @val = NULL

    SET ANSI_NULLS ON

    If @val IS NULL

    PRINT 'TRUE'

    ELSE IF NOT(@val IS NULL)

    PRINT 'FALSE'

    ELSE

    PRINT 'UNKNOWN'[/font]

    Will return 'TRUE'

  • this discussion was remarkable to read. i thought this article was interesting that i found by following some of the links cimode referenced...a lot of similarities to the "exchange" that occured here.

    http://www.dbdebunk.com/page/page/861446.htm

    [font="Arial Narrow"]bc[/font]

  • Hi,

    I am having problems with my queries which I definitely think is because of null values. This is what I want to do:

    Find out a list of phone numbers that are in one table but not in the other:

    I wrote something like this:

    select phonenumber

    from ttemp

    where phonenumber

    not in

    (

    select ('7'+phonenumber)

    from tpphone

    where phonecountrycode = '64'

    and phoneareacode = '07'

    )

    But it didnt return any records.

    Now when I try the same as:

    select phonenumber

    from ttemp

    where phonenumber

    in

    (

    select ('7'+phonenumber)

    from tpphone

    where phonecountrycode = '64'

    and phoneareacode = '07'

    )

    It returned some records.

    The phonenumber column are permitted to have null values.

    What could be the possible explanation for this behaviour 🙁

  • donnelcyril (11/9/2008)


    Hi,

    I am having problems with my queries which I definitely think is because of null values. This is what I want to do:

    Find out a list of phone numbers that are in one table but not in the other:

    If you have SQL Server 2005 or greater you can use 'EXCEPT':

    SELECT phonenumber

    FROM ttemp

    EXCEPT

    SELECT phonenumber

    FROM tpphone

    WHERE phonecountrycode = '64'

    AND phoneareacode = '07'

  • donnelcyril (11/9/2008)


    The phonenumber column are permitted to have null values.

    What could be the possible explanation for this behaviour 🙁

    Hi donnelcyril,

    You already gave the reply - the null values are the explanation.

    A NOT IN can only evaluate as True if none of the rows in the subquery match the value in the outer query. If one or more rows in the subquery has a NULL, then it is Unknown to the database whether that value is a match, and as such the entire NOT IN can result in False (if an equal value exists) or Unknown (if no known equal value exists - remember it's not known if the NULL values are equal or not).

    To work around this (for some people very unexpected) behaviour, avoid NOT IN and use NOT EXISTS instead.

    More information:

    * What is null: http://sqlblog.com/blogs/hugo_kornelis/archive/2007/07/06/null-ndash-the-database-rsquo-s-black-hole.aspx

    * Three-valued logic: http://sqlblog.com/blogs/hugo_kornelis/archive/2007/07/17/the-logic-of-three-valued-logic.aspx

    * Unexpected consequences of null: http://sqlblog.com/blogs/hugo_kornelis/archive/2007/09/22/dr-unknown-or-how-i-learned-to-stop-worrying-and-love-the-null.aspx

    * Functions and expressions for proper null handling: http://sqlblog.com/blogs/hugo_kornelis/archive/2007/09/30/what-if-null-if-null-is-null-null-null-is-null.aspx


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hi,

    That really explains why my query werent pulling out any records...

    Well I was able to solve the problem by modifying it as below:

    select phonenumber

    from ttemp

    where phonenumber

    not in

    (

    select '7'+phonenumber

    from tpphone

    where phonecountrycode = '64'

    and phoneareacode = '07'

    and phonenumber is not null

    )

    However the query was taking a long time to execute and I had to pull my outer query into a temporary table to get things done.

  • I haven't seen mention of this important NULL topic anywhere:

    select 1

    where 1 = 1 or datediff(hour, null, getdate()) > 4

    TSQL seems to work like this:

    where True OR NULL = True

    where False OR NULL = False

    where True AND NULL = False

    where False AND NULL = False

    which seems good to me

    but I still think this syntax would be useful:

    where coalesce(datediff(hour, [field], getdate()) > 4, 0)

    letting you coalesce a boolean without using dummy values

  • DataDog (1/13/2010)


    I haven't seen mention of this important NULL topic anywhere:

    select 1

    where 1 = 1 or datediff(hour, null, getdate()) > 4

    TSQL seems to work like this:

    where True OR NULL = True

    where False OR NULL = False

    where True AND NULL = False

    where False AND NULL = False

    which seems good to me

    but I still think this syntax would be useful:

    where coalesce(datediff(hour, [field], getdate()) > 4, 0)

    letting you coalesce a boolean without using dummy values

    Actually, this:

    datediff(hour, null, getdate()) > 4

    Works out to this:

    NULL > 4

    Which returns 'UNKNOWN'. So your truth table entry for:

    where 1 = 1 or datediff(hour, null, getdate()) > 4

    Is actually:

    True OR Unknown = True

    That's covered fairly well in a lot of places--a lot of articles on this site, for instance or BOL.

    Thanks

    Mike C

Viewing 13 posts - 106 through 117 (of 117 total)

You must be logged in to reply to this topic. Login to reply