"NOT IN" Operator and Three Valued Logic

Question

"NOT IN" Operator and Three Valued Logic

Viewing 15 posts - 16 through 30 (of 35 total)

You must be logged in to reply to this topic. Login to reply

PHYData DBA SSCertifiable Points: 7541 More actions · Answer 1

SQLRNNR (11/1/2013)
I think something must have been somewhat screwed up with this question since the time some people answered it.
I see no actual question being asked. The code is all outside of the code boxes. And worst is that when I selected 0, I was told that I was incorrect and that the correct answer was 0. I think the question must have been edited and then correct answer option was changed while the explanation was unchanged.

Weird stuff is going on with the site today. All of the sudden starting today this is the only Forum I can post to. :crazy:

Koen Verbeeck SSC Guru Points: 259165 More actions · Answer 2

Nice question, thanks.

Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP

Dheeraj Gadave SSC Veteran Points: 283 More actions · Answer 3

Dheeraj Gadave

SSC Veteran

Points: 283

November 11, 2013 at 2:40 am

#1665397

Nice Question...I was not aware about this.

jfgoude SSCrazy Points: 2586 More actions · Answer 4

jfgoude

SSCrazy

Points: 2586

November 22, 2013 at 2:58 am

#1668330

very nice and subtle one

thanks

+2

prakashr.r7 SSCrazy Points: 2828 More actions · Answer 5

I have been using NOT IN operator in many places but i had not known this problem with NULL value. Really i feel myself embarrased.

Wonder full question. i should thank you for posting this question so that the people like me could get to know by now.

Jamie-2229 SSCrazy Eights Points: 8156 More actions · Answer 6

I'm coming in late on this question. It's been my experience the "IN" operator in SQL is EXTREMELY inefficient. The why it is inefficient - the "why" is key here. SQL Server uses a query execution plan based on data it can expect to see in a query. An IN statement is ambiguous. Anything can be "IN". As the example shows, this includes NULL which will compare with everything thus you get the dumb answer of nothing even though we know there are several items not in the Table2 name.

To iterate on why (IMHO) SQL should eliminate the use of the IN statement, take an example from DEVELOPER who runs a statement to lookup several thousand (or more) users and proceeds to put their ID numbers (an index of names for example) into a comma delimited list and then use the IN keyword to create the query (we only have few in this example with is acceptable).

SQL hasn't a clue what to expect so the query plan goes out the window. Meanwhile, had the DEVELOPER just joined his first query to the source, a simple WHERE statement or a condition for the "ON x.a=y.a" would have given him a super fast query.

The last answer was more correct. The user should have seen the NULL and fixed the query. Zero is not a character, it is a number. Maybe the answer meant rows? No brainer, there should have been rows here and there were none - so I think the answer should have been: CHECK IT AGAIN AND FIX IT!

Jamie

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 7

Jamie-2229 (12/25/2014)
I'm coming in late on this question. It's been my experience the "IN" operator in SQL is EXTREMELY inefficient. The why it is inefficient - the "why" is key here. SQL Server uses a query execution plan based on data it can expect to see in a query. An IN statement is ambiguous. Anything can be "IN". As our example shows, this includes NULL which will compare with everything thus you get the dumb answer of nothing even though we know there are several items not in the Table2 name.
To iterate on why (IMHO) SQL should eliminate the use of the IN statement, take an example from DEVELOPER SA (short from smart something or other) who runs a statement to lookup several thousand (or more) users and proceeds to put their ID numbers (an index of names for example) into a comma delimited list and then use the IN keyword to create the query.
SQL hasn't a clue what to expect so the query plan goes out the window. Meanwhile, had DEVELOPER SA just joined his first query to the source, a simple WHERE statement or a condition for the ON would have given him a super fast query.
Meanwhile, the query DEVELOPER SA has provides leaves poor SQL Server chugging and chugging and chugging while another 10 thousand users or so are wondering why the threading is so poor and imagining there is something wrong with their network.
OK, so a bit dramatic, yes. But I find the IN statement evil! Ok for a couple of items (if you remember to remove the NULL of course), but disaster for a large list.
OH YEAH - one more thing. The last answer was more correct. The user should have seen the NULL and fixed the query. Zero is not a character, it is a number. Maybe the answer meant rows? No brainer, there should have been rows here and there wasn't - FIX IT!

I think you may have missed the point of the question. When using NOT IN and the list contains a null value you get no results, which is what the question was trying to highlight.

Jamie-2229 SSCrazy Eights Points: 8156 More actions · Answer 8

Maybe. I thought the point was that there was a third variable, in this case a null which compares as equal to all the values of the name so nothing shows up. Which I think I stated in the last answer.

My point here is, sure, a SQL developer should catch that and fix it, - I get a bit hot under the collar when developers write with an IN or NOT IN statement and drop it permanently into code. IMHO that "IN" statement itself would be the wrong answer. I was looking for there to be no rows, got it wrong (of course, because I thought it was a user error - as in the developer doesn't understand how a null works) and then thought about the example.

Don't remember what the last answer was but it was a better answer than zero. I tend to rant about using IN and NOT IN just as a matter of (wow, how would a developer know they are tossing that wrench into the gears?) Not likely they would know it, so off I rant. I edited and toned it down a bit.

Jamie

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 9

Jamie-2229 (12/25/2014)
Maybe. I thought the point was that there was a third variable, in this case a null which compares as equal to all the values of the name so nothing shows up. Which I think I stated in the last answer.
My point here is, sure, a SQL developer should catch that and fix it, - I get a bit hot under the collar when developers write with an IN or NOT IN statement and drop it permanently into code. IMHO that "IN" statement itself would be the wrong answer. I was looking for there to be no rows, got it wrong (of course, because I thought it was a user error - as in the developer doesn't understand how a null works) and then thought about the example.
Don't remember what the last answer was but it was a better answer than zero. I tend to rant about using IN and NOT IN just as a matter of (wow, how would a developer know they are tossing that wrench into the gears?) Not likely they would know it, so off I rant. I edited and toned it down a bit.

An IN operator is equivalent to a multiple OR clauses. They do not have the same problem as NOT IN if there is a null in the list. The NOT IN changes that to multiple AND clauses which is why the NULL in the list causes a problem. Nothing will evaluate to TRUE with a NULL value which is why the NOT IN fails. Your statement that NULL compares as equal to all values is incorrect.

Jamie-2229 SSCrazy Eights Points: 8156 More actions · Answer 10

Yep, you're right. So if you are using an "IN" or a "NOT IN" statement, not only is it faster to join than use "IN" or "NOT IN", make the null conditions in the ON statement safe for a JOIN.

For example, suppose you have a half million member branch of the military and your subquery is (SELECT ID FROM WHERE sex='M') . Assume this is a non-clustered index on the sex column or perhaps uses an INCLUDE on the ID column.

Now you take and write your query:

Select * From TheLASTTenYearsOfServiceRecords

WHERE [USER_ID] NOT IN (SELECT ID FROM WHERE sex = 'M') Assume a half-million records, each user has at least 1000 records, and that for some reason, about 10,000 users have no sex defined aka, sex is NULL. Also, for some reason, there was a software glitch three years back and TheLASTTenYearsOfServiceRecords data is corrupt because it was copied over and a null was introduced into the index which now has a NOCHECK on the [USER_ID] foreignkey index.

Run the query. Roll through the 50 million or so records and watch it fall apart.

Instead, a more correct way to write this is might be:

SELECT * FROM TheLASTTenYearsOfServiceRecords A

JOIN B

ON ISNULL(A.[USER_ID],0)=B.ID

WHERE ISNULL(sex,'')='M'

So, my point, stay away from IN statements - there's likely to be traps due to NULLs and the possibility of corrupt data. I think we are saying the same thing Lynn. I just get wound up when people use an IN statement. They are ugly.

Jamie

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 11

Jamie-2229 (12/26/2014)
Yep, you're right. So if you are using an "IN" or a "NOT IN" statement, not only is it faster to join than use "IN" or "NOT IN", make the null conditions in the ON statement safe for a JOIN.

You should not compare [NOT] IN to joins. They have different functions (extra columns added to result set; extra rows added if there are duplicates in the inputs).

The correct comparison is between [NOT] IN and [NOT] EXISTS. Every [NOT] IN condition with a subquery can be rewritten as [NOT] EXISTS (the reverse is not true), without the perceived incorrect behaviour caused by NULL values in the subquery.

I only ever use [NOT] IN with a list of constants (obviously never containing a hardcoded NULL). For subqueries, I always use [NOT] EXISTS.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

PHYData DBA SSCertifiable Points: 7541 More actions · Answer 12

Jamie-2229 (12/26/2014)
Yep, you're right. <deleted incomprehensible data />... They are ugly.

Jamie - What are you even talking about? The IN and NOT IN operator have been around for 20+ years.

They act no different today than when they were brand new.

Comparison operators do not compare to NULL values.

I did understand some of what you talked about with the execution plans. Only from the perspective that this is a case where actual execution plans used by SQL server are going to be different in a Production environment when compared to the development environment you are beginning to be familiar with.

You should never assume that an estimated execution plan on your Development laptop will be the same as the actual execution plan used by SQL server on a system satisfying 2000 TPS with 6 cpu cores and 96gig of dedicated memory.

Just MHO....

PHYData DBA SSCertifiable Points: 7541 More actions · Answer 13

Hugo Kornelis (12/26/2014)
Jamie-2229 (12/26/2014)
Yep, you're right. So if you are using an "IN" or a "NOT IN" statement, not only is it faster to join than use "IN" or "NOT IN", make the null conditions in the ON statement safe for a JOIN.
You should not compare [NOT] IN to joins. They have different functions (extra columns added to result set; extra rows added if there are duplicates in the inputs).
The correct comparison is between [NOT] IN and [NOT] EXISTS. Every [NOT] IN condition with a subquery can be rewritten as [NOT] EXISTS (the reverse is not true), without the perceived incorrect behaviour caused by NULL values in the subquery.
I only ever use [NOT] IN with a list of constants (obviously never containing a hardcoded NULL). For subqueries, I always use [NOT] EXISTS.

This is one time Hugo were I totally agree with your comments and that someone needed to put the right stuff up there. Thank you.

Jamie-2229 SSCrazy Eights Points: 8156 More actions · Answer 14

I'll be the first to admit I took a sharp right turn here.

Still, very valid point ... "NOT IN" and "IN" opens itself up to SQL abuse. Logically, it is simple to say "I want everything here that isn't there" and "NOT IN" fits your logic into a neat, clean grammatical package. Hugo posts a correct usage with static reference tables where you know there won't be NULLs, but the misuse (IMHO) outweighs its effectiveness.

My experience (watching the profiler) - most folks are unaware of the disease behind this logic. This example, with the NULL and the wrong answer being the right one, it SHRIEKS of poor design. ANSI isn't perfect. Given the learning opportunity where a JOIN can substitute for an IN statement and provide a terrific performance boost; usually developers get that.

I'm not versatile with Oracle - perhaps this sub-query usage is more effective there. Some programmers have that experience to make their judgements on (my assumption for this IMHO ridiculous use). In my experience, MICROSOFT SQL SERVER application for NOT IN and IN is rarely applied the way it is intended.

My two cents. Good to see you folks in such good form this morning. Hope your yesterday was quiet and happy. Enjoy the holidays. We up, we down, we ten-ten on the side.

Jamie

PHYData DBA SSCertifiable Points: 7541 More actions · Answer 15

Jamie-2229 (12/26/2014)
In my experience, MICROSOFT SQL SERVER application for NOT IN and IN is rarely applied the way it is intended.

This is what is so odd about what you are stating. Please read this prior to any response you may make.

http://en.wikipedia.org/wiki/SQL

Thank you. Notice that all the behaviors you are blaming "MICROSOFT SQL SERVER" for are standard and common to SQL as a Language.

There are places where the M$ TSQL does not follow the ANSI SQL standard.

MHO: The IN operator and using the NOT qualifier with it are "NOT IN" that set. Not even if you add to how they handle NULLS and 3VL.

"NOT IN" Operator and Three Valued Logic

Cookies on SQLServerCentral