Understanding NULL

  • muzahm

    SSC Enthusiast

    Points: 169

    Comments posted to this topic are about the item Understanding NULL

  • Tao Klerks

    SSCarpal Tunnel

    Points: 4297

    A slightly easier way to explain might be the fully-written-out equivalents of the IN clauses.

    I've taken the liberty of "paraphrasing" T-SQL Syntax a little here, as there is no way of representing boolean values (the results of boolean operations) in T-SQL (or any SQL?) that I know of.

    IF(NULL IN (34, 35, NULL))PRINT 'TRUE' =>

    IF(NULL = 34 OR NULL = 35 OR NULL = NULL) PRINT 'TRUE' =>

    IF(UNKNOWN OR UNKNOWN OR UNKNOWN) PRINT 'TRUE' =>

    IF(UNKNOWN) PRINT 'TRUE'

    and

    IF(34 NOT IN (35, 36, NULL))PRINT 'FALSE' =>

    IF(NOT (34 = 35 OR 34 = 36 OR 34 = NULL)) PRINT 'FALSE' =>

    IF(NOT (FALSE OR FALSE OR UNKNOWN)) PRINT 'FALSE' =>

    IF(NOT (UNKNOWN)) PRINT 'FALSE'

    Because we all know(? 🙂 ) that "False OR UNKNOWN" evaluates to UNKNOWN, whereas "True OR UNKNOWN" evaluates to True...

    There's some great articles on SQLServerCentral.com about Nulls, eg: http://www.sqlservercentral.com/articles/Advanced+Querying/2829/

    Now if you REALLY want to have some fun, turn off the ANSI_NULLS option:

    SET ANSI_NULLS OFF

    All of a sudden three-valued logic becomes two-valued, and you have the equivalent of:

    IF(NULL IN (34, 35, NULL))PRINT 'TRUE' =>

    IF(NULL = 34 OR NULL = 35 OR NULL = NULL) PRINT 'TRUE' =>

    IF(FALSE OR FALSE OR TRUE) PRINT 'TRUE' =>

    IF(TRUE) PRINT 'TRUE'

    and

    IF(34 NOT IN (35, 36, NULL))PRINT 'FALSE' =>

    IF(NOT (34 = 35 OR 34 = 36 OR 34 = NULL)) PRINT 'FALSE' =>

    IF(NOT (FALSE OR FALSE OR FALSE)) PRINT 'FALSE' =>

    IF(NOT (FALSE)) PRINT 'FALSE'

    And now everything prints as you might have expected if you came to SQL Server before 2000, which is when ANSI_NULLS became default...

    (and incidentally, that's a small issue in the question, which assumed that ANSI_NULLS was off - a reasonable assumption in most environments, but still a rich source of horrible confusion when it is not)

    http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.

  • archie flockhart

    SSCrazy

    Points: 2339

    Good question, and good comment to explain the result and the fact that it depends on ANSI_NULLS setting.

    It got me thinking about the other IN / NOT IN syntax as well:

    Suppose I want a list of all the employees in NORTHWIND database who have other people reporting to them. I coud do it like this:

    select * from employees where employeeid IN (select reportsto from employees)

    and I get 2 records: employee 2 and employee 5.

    Suppose I want a list of all the employees in NORTHWIND database who DON'T have anyone reporting to them. If I try it using the apparently equivalent code ...

    select * from employees where employeeid NOT IN (select reportsto from employees)

    ... I get no records returned.

    The reason is that there is one employee with a NULL entry in "reportsto", which makes the NOT IN comparison translate to (employeeid <>2 AND employeeid <>5 AND employeeid <> NULL)

    And as mentioned above, TRUE AND TRUE AND UNKNOWN equals UNKNOWN.

    To make it work I need to remove the NULLS:

    select * from employees where employeeid NOT IN (select reportsto from employees where reportsto is not null)

    Now I just need to check my actual code to make sure I haven't fallen for this one in real life.

  • Tao Klerks

    SSCarpal Tunnel

    Points: 4297

    Hmm, nice one, I've never seen that issue described.

    That provides another reason why the following syntax is (in my opinion) preferable:

    select *

    from employees

    left join employees as subordinates on employees.employeeid = subordinates.reportsto

    where subordinates.employeeid Is Null

    The main other reason I know of is performance on complex queries, where the query optimizer may sometimes evaluate the subquery (potentially passing a lot of data to the outer query) instead of doing an efficient join (I don't know the exact circumstances governing this, but I have definitely seen it happen)

    Unfortunately the above syntax/approach has the disadvantage that it is less intuitive to many people 🙁

    I make it a rule that subqueries (but not derived tables) are to be avoided at all costs; does anyone have a more flexible rule of thumb that can allow the use of subqueries without risk of perfromance or logic issues?

    http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.

  • Marius Els

    Right there with Babe

    Points: 760

    This question got me!

    Well done muzahm

    The set ansi-nulls off could off course be debated, but still, good question.

    The first if I had no problem with, but the second one got me...

    I learned a lot about nulls through Tao's explanation and examples. Thanks, clear, precise and to the point.

    The matter on nulls can so easily lead to queries not returning expected results and if you are unaware of the implications don't even realize your query is not returning what you expect and the "bug" is left in the code.....

  • Sean Lange

    SSC Guru

    Points: 286536

    Tao Klerks (11/18/2008)


    Hmm, nice one, I've never seen that issue described.

    That provides another reason why the following syntax is (in my opinion) preferable:

    select *

    from employees

    left join employees as subordinates on employees.employeeid = subordinates.reportsto

    where subordinates.employeeid Is Null

    The main other reason I know of is performance on complex queries, where the query optimizer may sometimes evaluate the subquery (potentially passing a lot of data to the outer query) instead of doing an efficient join (I don't know the exact circumstances governing this, but I have definitely seen it happen)

    Unfortunately the above syntax/approach has the disadvantage that it is less intuitive to many people 🙁

    I make it a rule that subqueries (but not derived tables) are to be avoided at all costs; does anyone have a more flexible rule of thumb that can allow the use of subqueries without risk of perfromance or logic issues?

    This example is really more of a find unmatched query. It would return an employee record if the employeeid was deleted but another employee record still had that ID as the reportsto in addition to those that are truly unassigned. Let's say we had EmployeeID 10 and an employee with ReportsTo as 10 (let's say ID 45), then we delete Employees where employeeid = 10. In the above query we would see employeeid 45 because the join would fail and leave the subordinates derived table as null. 🙂

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Tao Klerks

    SSCarpal Tunnel

    Points: 4297

    Hi slange,

    I think you're confusing a couple of possibilities.

    The query above reports employees who have no "subordinates" - there are no employees that have their employeeid in the reportsto field.

    The query that you are thinking of is almost exactly the opposite:

    select *

    from employees

    left join employees as supervisors on employees.reportsto = supervisors.employeeid

    where supervisors.employeeid Is Null

    and employees.reportsto Is Not Null

    It's actually slightly different because it also has an extra WHERE criterion - making sure that the employees returned actually have a reportsto value (for a supervisor that no longer exists, or never existed).

    Does this help?

    http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.

  • Sean Lange

    SSC Guru

    Points: 286536

    Tao Klerks (11/18/2008)


    Hi slange,

    I think you're confusing a couple of possibilities.

    The query above reports employees who have no "subordinates" - there are no employees that have their employeeid in the reportsto field.

    The query that you are thinking of is almost exactly the opposite:

    select *

    from employees

    left join employees as supervisors on employees.reportsto = supervisors.employeeid

    where supervisors.employeeid Is Null

    and employees.reportsto Is Not Null

    It's actually slightly different because it also has an extra WHERE criterion - making sure that the employees returned actually have a reportsto value (for a supervisor that no longer exists, or never existed).

    Does this help?

    Try this and you will see what I mean. 🙂

    create table #employees (employeeid int, reportsto int)

    insert #employees select 10, null

    insert #employees select 45, 10

    --here is the original query

    select *

    from #employees

    left join #employees as subordinates on #employees.employeeid = subordinates.reportsto

    where subordinates.employeeid Is Null

    --now delete the reportsto record

    delete #employees where employeeid = 10

    --they still show up

    select *

    from #employees

    left join #employees as subordinates on #employees.employeeid = subordinates.reportsto

    where subordinates.employeeid Is Null

    drop table #employees

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • archie flockhart

    SSCrazy

    Points: 2339

    Yes, employee 45 SHOULD show up in this list, because nobody reports to them.

    Add another record where someone reports to 45, then 45 will not appear - even if you delete its parent record:

    create table #employees (employeeid int, reportsto int)

    insert #employees select 10, null

    insert #employees select 45, 10

    insert #employees select 50, 45

    --here is the original query - we find employee 50 as the only one who does not have anyone reporting to them

    select *

    from #employees

    left join #employees as subordinates on #employees.employeeid = subordinates.reportsto

    where subordinates.employeeid Is Null

    --now delete the reportsto record

    delete #employees where employeeid = 10

    -- we still get the same answer, 45 still does not appear, even though its parent is deleted

    select *

    from #employees

    left join #employees as subordinates on #employees.employeeid = subordinates.reportsto

    where subordinates.employeeid Is Null

    drop table #employees

  • Yelena Varshal

    SSC-Dedicated

    Points: 34279

    As rightly mentioned by Tao, try to run it with ANSI_NULLS OFF:

    set ansi_nulls off

    IF(NULL IN (34, 35, NULL))PRINT 'TRUE'

    IF(34 NOT IN (35, 36, NULL))PRINT 'FALSE'

    and you will get a completely different answer

    Regards,Yelena Varsha

  • Anipaul

    SSC-Insane

    Points: 24681

    Great one. I got confused whether there is something wrong. It was a clever question. But with Set Ansi_Nulls off we are getting different result.

  • notquitexena

    SSCrazy

    Points: 2654

    Thanks to Tao Klerks and Archie Flockhart for a very edifying discussion of nulls. I've saved your comments into my sql server reference folder.

    -- Save the planet. It's where I keep all my stuff.

  • Carlton Leach

    SSCarpal Tunnel

    Points: 4202

    with ansi nulls on: NULL is like a black hole that converts anything it comes into contact with to NULL as well. Kind of like a vampire, NULL bites other values on the neck then what do you know, there vampries (NULL) to.

    This is of course if you do not use ISNULL/IS NULL which is like garlic for NULL.

    Carlton..

  • kapil_kk

    SSC-Insane

    Points: 21316

    nice question and excellent explanation from you guys

    _______________________________________________________________
    To get quick answer follow this link:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

Viewing 14 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply