Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Understanding NULL Expand / Collapse
Author
Message
Posted Tuesday, November 18, 2008 12:08 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Monday, August 9, 2010 4:33 AM
Points: 153, Visits: 47
Comments posted to this topic are about the item Understanding NULL
Post #604126
Posted Tuesday, November 18, 2008 3:19 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, September 15, 2014 11:45 PM
Points: 1,385, Visits: 1,245
A slightly easier way to explain might be the fully-written-out equivalents of the IN clauses.

I've taken the liberty of "paraphrasing" T-SQL Syntax a little here, as there is no way of representing boolean values (the results of boolean operations) in T-SQL (or any SQL?) that I know of.

IF(NULL IN (34, 35, NULL))PRINT 'TRUE' =>
IF(NULL = 34 OR NULL = 35 OR NULL = NULL) PRINT 'TRUE' =>
IF(UNKNOWN OR UNKNOWN OR UNKNOWN) PRINT 'TRUE' =>
IF(UNKNOWN) PRINT 'TRUE'


and

IF(34 NOT IN (35, 36, NULL))PRINT 'FALSE' =>
IF(NOT (34 = 35 OR 34 = 36 OR 34 = NULL)) PRINT 'FALSE' =>
IF(NOT (FALSE OR FALSE OR UNKNOWN)) PRINT 'FALSE' =>
IF(NOT (UNKNOWN)) PRINT 'FALSE'



Because we all know(? :) ) that "False OR UNKNOWN" evaluates to UNKNOWN, whereas "True OR UNKNOWN" evaluates to True...

There's some great articles on SQLServerCentral.com about Nulls, eg: http://www.sqlservercentral.com/articles/Advanced+Querying/2829/


Now if you REALLY want to have some fun, turn off the ANSI_NULLS option:
SET ANSI_NULLS OFF

All of a sudden three-valued logic becomes two-valued, and you have the equivalent of:

IF(NULL IN (34, 35, NULL))PRINT 'TRUE' =>
IF(NULL = 34 OR NULL = 35 OR NULL = NULL) PRINT 'TRUE' =>
IF(FALSE OR FALSE OR TRUE) PRINT 'TRUE' =>
IF(TRUE) PRINT 'TRUE'


and

IF(34 NOT IN (35, 36, NULL))PRINT 'FALSE' =>
IF(NOT (34 = 35 OR 34 = 36 OR 34 = NULL)) PRINT 'FALSE' =>
IF(NOT (FALSE OR FALSE OR FALSE)) PRINT 'FALSE' =>
IF(NOT (FALSE)) PRINT 'FALSE'



And now everything prints as you might have expected if you came to SQL Server before 2000, which is when ANSI_NULLS became default...

(and incidentally, that's a small issue in the question, which assumed that ANSI_NULLS was off - a reasonable assumption in most environments, but still a rich source of horrible confusion when it is not)


http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.
Post #604184
Posted Tuesday, November 18, 2008 4:33 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Wednesday, July 9, 2014 2:52 AM
Points: 1,153, Visits: 1,047
Good question, and good comment to explain the result and the fact that it depends on ANSI_NULLS setting.

It got me thinking about the other IN / NOT IN syntax as well:

Suppose I want a list of all the employees in NORTHWIND database who have other people reporting to them. I coud do it like this:

select * from employees where employeeid  IN (select reportsto from employees) 

and I get 2 records: employee 2 and employee 5.


Suppose I want a list of all the employees in NORTHWIND database who DON'T have anyone reporting to them. If I try it using the apparently equivalent code ...

select * from employees where employeeid NOT IN (select reportsto from employees) 

... I get no records returned.

The reason is that there is one employee with a NULL entry in "reportsto", which makes the NOT IN comparison translate to (employeeid <>2 AND employeeid <>5 AND employeeid <> NULL)

And as mentioned above, TRUE AND TRUE AND UNKNOWN equals UNKNOWN.

To make it work I need to remove the NULLS:

select * from employees where employeeid NOT IN (select reportsto from employees where reportsto is not null) 



Now I just need to check my actual code to make sure I haven't fallen for this one in real life.










Post #604220
Posted Tuesday, November 18, 2008 4:53 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, September 15, 2014 11:45 PM
Points: 1,385, Visits: 1,245
Hmm, nice one, I've never seen that issue described.

That provides another reason why the following syntax is (in my opinion) preferable:

select * 
from employees
left join employees as subordinates on employees.employeeid = subordinates.reportsto
where subordinates.employeeid Is Null

The main other reason I know of is performance on complex queries, where the query optimizer may sometimes evaluate the subquery (potentially passing a lot of data to the outer query) instead of doing an efficient join (I don't know the exact circumstances governing this, but I have definitely seen it happen)

Unfortunately the above syntax/approach has the disadvantage that it is less intuitive to many people

I make it a rule that subqueries (but not derived tables) are to be avoided at all costs; does anyone have a more flexible rule of thumb that can allow the use of subqueries without risk of perfromance or logic issues?


http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.
Post #604231
Posted Tuesday, November 18, 2008 5:58 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Thursday, July 10, 2014 12:43 AM
Points: 217, Visits: 170
This question got me!

Well done muzahm

The set ansi-nulls off could off course be debated, but still, good question.

The first if I had no problem with, but the second one got me...

I learned a lot about nulls through Tao's explanation and examples. Thanks, clear, precise and to the point.

The matter on nulls can so easily lead to queries not returning expected results and if you are unaware of the implications don't even realize your query is not returning what you expect and the "bug" is left in the code.....
Post #604299
Posted Tuesday, November 18, 2008 8:41 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 2:34 PM
Points: 13,481, Visits: 12,342
Tao Klerks (11/18/2008)
Hmm, nice one, I've never seen that issue described.

That provides another reason why the following syntax is (in my opinion) preferable:

select * 
from employees
left join employees as subordinates on employees.employeeid = subordinates.reportsto
where subordinates.employeeid Is Null

The main other reason I know of is performance on complex queries, where the query optimizer may sometimes evaluate the subquery (potentially passing a lot of data to the outer query) instead of doing an efficient join (I don't know the exact circumstances governing this, but I have definitely seen it happen)

Unfortunately the above syntax/approach has the disadvantage that it is less intuitive to many people

I make it a rule that subqueries (but not derived tables) are to be avoided at all costs; does anyone have a more flexible rule of thumb that can allow the use of subqueries without risk of perfromance or logic issues?


This example is really more of a find unmatched query. It would return an employee record if the employeeid was deleted but another employee record still had that ID as the reportsto in addition to those that are truly unassigned. Let's say we had EmployeeID 10 and an employee with ReportsTo as 10 (let's say ID 45), then we delete Employees where employeeid = 10. In the above query we would see employeeid 45 because the join would fail and leave the subordinates derived table as null. :)


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #604458
Posted Tuesday, November 18, 2008 9:02 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, September 15, 2014 11:45 PM
Points: 1,385, Visits: 1,245
Hi slange,

I think you're confusing a couple of possibilities.

The query above reports employees who have no "subordinates" - there are no employees that have their employeeid in the reportsto field.

The query that you are thinking of is almost exactly the opposite:

select * 
from employees
left join employees as supervisors on employees.reportsto = supervisors.employeeid
where supervisors.employeeid Is Null
and employees.reportsto Is Not Null

It's actually slightly different because it also has an extra WHERE criterion - making sure that the employees returned actually have a reportsto value (for a supervisor that no longer exists, or never existed).

Does this help?


http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.
Post #604486
Posted Tuesday, November 18, 2008 9:13 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 2:34 PM
Points: 13,481, Visits: 12,342
Tao Klerks (11/18/2008)
Hi slange,

I think you're confusing a couple of possibilities.

The query above reports employees who have no "subordinates" - there are no employees that have their employeeid in the reportsto field.

The query that you are thinking of is almost exactly the opposite:

select * 
from employees
left join employees as supervisors on employees.reportsto = supervisors.employeeid
where supervisors.employeeid Is Null
and employees.reportsto Is Not Null

It's actually slightly different because it also has an extra WHERE criterion - making sure that the employees returned actually have a reportsto value (for a supervisor that no longer exists, or never existed).

Does this help?


Try this and you will see what I mean. :)

create table #employees (employeeid int, reportsto int)

insert #employees select 10, null
insert #employees select 45, 10

--here is the original query
select *
from #employees
left join #employees as subordinates on #employees.employeeid = subordinates.reportsto
where subordinates.employeeid Is Null

--now delete the reportsto record
delete #employees where employeeid = 10

--they still show up
select *
from #employees
left join #employees as subordinates on #employees.employeeid = subordinates.reportsto
where subordinates.employeeid Is Null

drop table #employees


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #604494
Posted Tuesday, November 18, 2008 10:08 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Wednesday, July 9, 2014 2:52 AM
Points: 1,153, Visits: 1,047
Yes, employee 45 SHOULD show up in this list, because nobody reports to them.

Add another record where someone reports to 45, then 45 will not appear - even if you delete its parent record:



create table #employees (employeeid int, reportsto int)

insert #employees select 10, null
insert #employees select 45, 10
insert #employees select 50, 45

--here is the original query - we find employee 50 as the only one who does not have anyone reporting to them
select *
from #employees
left join #employees as subordinates on #employees.employeeid = subordinates.reportsto
where subordinates.employeeid Is Null

--now delete the reportsto record
delete #employees where employeeid = 10

-- we still get the same answer, 45 still does not appear, even though its parent is deleted
select *
from #employees
left join #employees as subordinates on #employees.employeeid = subordinates.reportsto
where subordinates.employeeid Is Null

drop table #employees







Post #604546
Posted Tuesday, November 18, 2008 11:55 AM
Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Monday, September 15, 2014 4:12 PM
Points: 3,475, Visits: 582
As rightly mentioned by Tao, try to run it with ANSI_NULLS OFF:

set ansi_nulls off
IF(NULL IN (34, 35, NULL))PRINT 'TRUE'
IF(34 NOT IN (35, 36, NULL))PRINT 'FALSE'

and you will get a completely different answer



Regards,
Yelena Varshal

Post #604635
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse