Select Statement Using Not In and Null Values

  • Table1 def is

    IDint(Identity)

    Field1char

    Table2 def is

    ID int (Identity)

    Field1 char

    ID2 int (not identity)

    Table1 has ID values 1, 2, 3, 7, 8, 9

    Table2 has ID2 values 1, 2, 3, null

    If I use select * from table1 where id not in (select id2 from table2) I get no results; I would expect to return 7,8,9.

    If I remove the null entry, that is what I get; also, if I add 'where id2 is not null' to the subquery, I also get my expected result.

    Why does it behave this way?

  • The value NULL means the data value for the column is unknown. In your query, SQL compares each value in the Table1 with the subset returned from the Table2. Since Table2 includes a record with the value NULL it is undefined if the numbers (7,8,9) exists or not in the subset.

    By adding 'where id2 is not null' you are eliminating the record NULL from the subset, so now it can be determined that the numbers (7,8,9) are not in the subset.

  • I'm not sure I quite understand. Why wouldn't it ignore the null value or at least not stop there but continue the next records in the table?

  • The NULL value cannot be ignored because it's a record with a value NULL.

    SQL checks all (4) records in Table2.

  • Still not getting it. I don't understand how one entry with a "value" of null is preventing SQL from reporting the other entries that are not in the table.

  • quote:


    From BOL:

    A value of NULL indicates the value is unknown. A value of NULL is different from an empty or zero value. No two null values are equal. Comparisons between two null values, or between a NULL and any other value, return unknown because the value of each NULL is unknown.


    When SQL compares the value 7 with NULL it returns NULL, then the expression NOT IN is undefined (returns NULL). It has to return TRUE for the record to be included in the result set.

  • Yeah - null handling in these types of queries is wierd. For example...

    select * from table1 where id not in ( null )

    will produce no results. The problem is in the internal handling of IN. Since all comparisons with null fail, it is a little like saying <> null (instead of IS NOT null) - which produces unexpected results unless you change the default settings.

    I would recommend you either leave IS NOT NULL in the subquery, or use a left join... i.e.

    SELECT t1.* FROM Table1 t1

    LEFT JOIN Table2 T2 ON T2.ID2 = T1.ID

    WHERE T2.ID2 IS NULL

    my $0.02

    Guarddata-

  • I got around it ok. I just don't recall ever encountering this circumstance before. I usually, for whatever reason, find it quicker to compose 'not in' sytax queries on the fly rather than using outer joins. From now on I'll have to be a little more careful. Thanks for the info.

  • Sure - I understand that. From a performance perspective, I have found the OUTER JOIN to be much better when there are lots of rows. The NOT IN syntax causes table scans...for what it's worth.

    Guarddata-

  • I normally use isnull(id2,-1)=-1 ( when it is a numeric) or isnull(id4,'---')='---' (where id4 is a string and '---' is an impossible value) instead of where id2 is null. Is the method i use not advisable when it comes to performance ..

    Harsha

    HArsha


    HArsha

  • Try the following

    select * from table1 where id not in (select id2 from table2 where isnull(id2,'') <> '' )

  • Might I suggest that you use a NOT EXISTS? It should perform better than a NOT IN. It's also curious that I couldn't reproduce your issue. It may be a database setting difference. Here's my test and example of NOT EXISTS:

    CREATE TABLE Table1 (

    ID INT,

    Field1 char(8))

    CREATE TABLE Table2 (

    ID INT,

    Field1 char(8),

    ID2 INT NULL)

    INSERT Table1 VALUES (1, 'a')

    INSERT Table1 VALUES (2, 'b')

    INSERT Table1 VALUES (3, 'c')

    INSERT Table1 VALUES (7, 'g')

    INSERT Table1 VALUES (8, 'h')

    INSERT Table1 VALUES (9, 'i')

    INSERT Table2 VALUES (1, 'a', 1)

    INSERT Table2 VALUES (2, 'b', 2)

    INSERT Table2 VALUES (3, 'c', 3)

    INSERT Table2 VALUES (4, 'd', NULL)

    SELECT * FROM Table1 T1 WHERE NOT EXISTS (SELECT 1 FROM Table2 T2 WHERE T2.ID2 = T1.ID)

    IDField1

    7g

    8h

    9i

    SELECT * FROM Table1 T1 WHERE ID NOT IN (SELECT ID2 FROM Table2 T2)

    IDField1

    7g

    8h

    9i

    Jeff

  • Yes, "not exists" with a correlated subquery is way faster than "not in".

    I've tested all these methods, and using a left join and looking for a null is fastest (beats out "not exists" by a little bit). But if I'm dealing with a small dataset I'll use "not exists", as it's easier to read, and the performance diff is neglible when under 1000 rows or so. Anything more and you better use the Left Join method.

    cl

    cl

    Signature is NULL

  • It's probably so obvious that you forgot to mention it, but you have to be careful to only use the left join method when at least one side of the comparison is a unique field, so you don't get duplicate rows in the result.

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply