Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

IN and NOT IN

One of the most common mistakes made in T-SQL is thinking that these behave identically.  I've personally opened up a forum topic on it because I didn't know what the difference was.  This post will join a small army of other places on the net devoted to correcting this misunderstanding.

They aren't completely dissimilar; they behave exactly as you would expect them to... with the exception of NULL's.  Because nothing EQUALS NULL (Dependent upon settings, see below) the difference in the internal logic matters.  Gail Shaw initially explained this to me when I asked the question on the forums and I wanted to use her explanation here, but I can’t seem to find it; so here's my own version of an explanation:

When you use IN, you're really saying "WHERE myvalue = 'A' OR myvalue = 'B' OR myvalue = NULL"
Your NULLS won't cause the entire statement to fail because it's only an OR.

When you use NOT IN you're really saying 'WHERE myvalue <> 'A' AND myvalue <> 'B' AND myvalue <> NULL “
This is where the problem arises.  Since a NULL in SQL is an unknown value, you can't test = or <> on it and you get no results.  Without the NULL, you'd be fine.

Here's a simple example to demonstrate.

DECLARE @T TABLE(
Val varchar(5)) 

DECLARE @T2 TABLE(
Val varchar(5)) 

INSERT INTO @T(Val)
SELECT 'A' UNION ALL SELECT 'B' UNION ALL SELECT 'C' UNION ALL
SELECT 'D' UNION ALL SELECT 'E' UNION ALL SELECT 'F' 

INSERT INTO @T2(Val)
SELECT 'A' UNION ALL SELECT 'B' UNION ALL SELECT 'C' UNION ALL
SELECT NULL 

SET ANSI_NULLS ON
SELECT * FROM @T WHERE Val IN (SELECT * FROM @T2)
SELECT * FROM @T WHERE Val NOT IN (SELECT * FROM @T2)
SELECT * FROM @T WHERE Val NOT IN (SELECT * FROM @T2 WHERE Val IS NOT NULL) 

This issue is further complicated by the ANSI_NULLS setting.  While I believe most people have this turned ON, the fact that it is an option introduces another variable into the mix.  NOT IN will not fail in the same way if you have ANSI_NULLS set to OFF.  (Try the above example again after changing ON to OFF)

[Edit] Related Article by Gail Shaw (whom I mentioned earlier in this post): http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/ [/Edit]

Comments

Posted by VALEK on 4 February 2010

The IN and NOT IN has effect on the execution plan as well. I thought this article could be a little bit more comprehensive in the sense of explaining all differences between the two.

Posted by Bruc0Blachf0rd on 4 February 2010

Thank-you; this has confused me for a long time. I could never figure out why "Not IN" would fail sometimes.

Posted by abemby on 4 February 2010

You can avoid failing atleast by using IsNull()

-----------------------------------------------

SELECT * FROM @T

WHERE Not IsNull(Val,'') IN  (SELECT * FROM @T2)

Or

SELECT * FROM @T

WHERE Not IsNull(Val,0) IN  (SELECT * FROM @T2)

-----------------------------------------------

Depending on the type - varchar or number or bit

Posted by Kevin on 4 February 2010

Good thing to know.  Thanks for the explanation.

Posted by crussell on 4 February 2010

You said that you can't compare a value to null and that you get no results. In the case of the example the net result is that it fails the whole statement. One of the values of @T is D. So as described it is like saying

Display D if the following is True.

D <> A AND D <> B AND D <> C AND D <> NULL

But since D did not listed out in the results then the whole statement must have been false. We know the first three compares are true. Therefore the last compare, D <> NULL, must have equated to false.

So instead of saying it has no result, shouldn't we be saying it has a result and that result is false? You can call it whatever you want but the net result is false, just as if the NULL value would have been the letter D.

If D <> NULL is false, does that mean D = NULL is true?

I'm so confused...:)

Posted by bedski on 4 February 2010

I really appreciate this posting.  I guess I haven't run across this problem yet, but it's good information to know.

I also wanted to add that abemby had the right idea, but I think to avoid failing, you need to use isnull() a little differently as follows:

SELECT * FROM @T WHERE Val NOT IN (SELECT isnull(val,'') FROM @T2)

Posted by bedwards-935835 on 4 February 2010

I wanted to explain my understanding of what crussell is asking.  Anything compared to null is always false, so if you have multiple "and" statements and one of those statements is comparing to null, then the entire statement is false.

Posted by richardd on 4 February 2010

Not quite. Anything compared to Null is unknown, not false.

The "OR" operator returns true if either operand is true, false if both operands are false, and unknown in any other case:

Or   T  F  N

T    T  T  T

F    T  F  N

N    T  N  N

The "AND" operator returns unknown if either operand is unknown, true if both operands are true, and false in any other case:

And  T  F  N

T    T  F  N

F    F  F  N

N    N  N  N

The WHERE clause limits the returned records to those where the condition evaluates to true, filtering out any where the condition returns false or unknown. So the only case when IN works with Null values is testing whether a non-Null input belongs to a sequence with at least one non-Null value:

A In (A, Null)

=> (A = A) Or (A = Null)

=> (True) Or (Unknown)

=> True

Null In (A, Null)

=> (Null = A) Or (Null = Null)

=> (Unknown) Or (Unknown)

=> Unknown

B Not In (A, Null)

=> (B != A) And (B != Null)

=> (True) And (Unknown)

=> Unknown

Null Not In (A, Null)

=> (Null != A) And (Null != Null)

=> (Unknown) And (Unknown)

=> Unknown

Posted by Rob on 4 February 2010

VALEK is right; 'NOT IN' should produce a non-sargable predicate that will cause a table scan operation in the query plan. This can be significant if you have tons of data (not a big deal with small tables).

Posted by Seth Phelabaum on 4 February 2010

Valek:  Thanks for the feedback.  I'm still trying to feel out a good compromise in length vs. completeness.

abemby:  The issue I'm demonstrating isn't the NULL in the value you're comparing (There are no NULLS in @T, and they don't affect the statement if there are), it is the NULLS in the list you are searching.  Run those two statements.  Unless you have ANSI_NULLS OFF, they don't return anything.  

crussel:  As richardd explains in his response, comparing anything against a NULL is not true or false, it is unknown.  T-SQL uses something referred to as Three valued Logic (3VL).  There isn't just a true or false, there is also 'Unknown'.  Because you don't know what the value of a NULL is, you can't test it for equality.  To qualify for your WHERE statement though, it must equal 'TRUE' and so you get no results.  

To approach the concept of 3VL in a slightly different manner, consider this situation:  I walk up to you and tell you that I just bought a new car.  I then ask you "Is my new car Red?"  You can't say Yes or No, because I haven't told you yet.  My car *might* be red, but until I tell you, your answer has to be 'How the heck should I know, it's your car!' or 'I don't know.'  Probably not my best anology ever... but I find that sometimes more concrete scenarios help.

Posted by Seth Phelabaum on 4 February 2010

Rob:  I'll write a followup to delve into this.

Posted by chuan on 4 February 2010

This tip is very helpful.

Based on options above, I would like to use:

select * from @T

where  isnull(Val,'') not in (select isnull(val,'') from @T2)

to filter out NULL value on both sides.

Posted by yvan2000 on 4 February 2010

IMHO whole topic is completely useless. Instead of IN i prefer use of INNER JOIN. And instead of NOT IN i use anti join:

-- inner join

SELECT t.*

FROM @T as t

inner join @T2 as t2

on t2.Val = t.Val

-- anti join

SELECT t.*

FROM @T as t

left outer join @T2 as t2

on t2.Val = t.Val

where t2.Val is null

Posted by Seth Phelabaum on 5 February 2010

yvan2000:  I'm not championing the notion of using NOT IN over left joins or not exists.  I'm not even talking about the performance differences between any of those here.  I'm simply explaining the behavior of one method of doing things.  Your preference isn't necessarily everyone's preference and knowledge is rarely useless, even if it doesn't directly help you.

Posted by JJ B on 19 February 2010

You example code and richardd's explanation made this really clear.  Very interesting!  Thanks for taking the time to explain it.

Leave a Comment

Please register or log in to leave a comment.