In, Not In. Why does one work but not the other?

  • Hi Guys,

    Here are 2 queries - to me, logic dictates that both should return the same result set. the first query returns 814 rows, the second; 0.

    select * from iseries_file where edinumber in (select con_edipolnum from [contract] where con_status not in (1,2,4))

    select * from iseries_file where edinumber not in (select con_edipolnum from [contract] where con_status in (1,2,4))

    the table 'iseries_file' contains household insurance data. it is a list of policy numbers and the valuable belongings specified in the home. edinumber is the policy number column.

    the table [contract] contains household insurance data too - more specifically the policy number and the policy's current status in the con_status field. stati 1,2 and 4 are the 3 stati that a policy can be in and still be called live. con_edipolnum is the policy number field. The number of records where con_status is in 1,2 or 4 is about 400,000 and the number where it is not in 1,2 or 4 is about 600,000.

    the columns con_edipolnum and edinumber are both char(20) fields and the problem appears to be around the 'edinumber in/not in' part of the query as 'not in' always returns 0 rows regardless of what you put in the parentheses.

    I'm loosing my marbles on this so I'm going to the pub now and I'm going to look for an explanation on monday. If anyone can offer any advice that would be nice 🙂

    Thanks all!

    Ben

    ^ Thats me!

    ----------------------------------------
    01010111011010000110000101110100 01100001 0110001101101111011011010111000001101100011001010111010001100101 01110100011010010110110101100101 011101110110000101110011011101000110010101110010
    ----------------------------------------

  • ben.ward 42132 (10/16/2009)


    I'm loosing my marbles on this so I'm going to the pub now and I'm going to look for an explanation on monday. If anyone can offer any advice that would be nice 🙂

    One word. Nulls.

    The two queries aer equivalent providing that there are no nulls involved. As soon as there are, NOT IN will return 0 rows (providing ansi nulls are set on) because no value will return true for the comparison (Column <> NULL)

    This is easy to demonstrate.

    DECLARE @Table1 TABLE (

    ID INT,

    Name VARCHAR(10)

    )

    DECLARE @Table2 TABLE (

    ID INT,

    State Int

    )

    INSERT INTO @Table1 VALUES (1,'One')

    INSERT INTO @Table1 VALUES (2,'Two')

    INSERT INTO @Table1 VALUES (3,'Three')

    INSERT INTO @Table1 VALUES (4,'Four')

    INSERT INTO @Table1 VALUES (5,'Five')

    INSERT INTO @Table1 VALUES (NULL,'NULL')

    INSERT INTO @Table2 VALUES (1,1)

    INSERT INTO @Table2 VALUES (2,0)

    INSERT INTO @Table2 VALUES (3,1)

    INSERT INTO @Table2 VALUES (4,1)

    INSERT INTO @Table2 VALUES (5,1)

    INSERT INTO @Table2 VALUES (Null,1)

    -- Returns 1 row

    SELECT * FROM @Table1 WHERE id IN (SELECT ID FROM @Table2 WHERE State=0)

    -- returns 4 rows, because that null doesn't match anything, even null and hence is eliminated

    SELECT * FROM @Table1 WHERE id IN (SELECT ID FROM @Table2 WHERE State=1)

    -- Returns 0 rows, because the NULL doesn't return TRUE for any comparison.

    SELECT * FROM @Table1 WHERE id NOT IN (SELECT ID FROM @Table2 WHERE State=1)

    -- Returns 1 row, same as the initial IN, because we got rid of the nulls in the subquery

    SELECT * FROM @Table1 WHERE id NOT IN (SELECT ID FROM @Table2 WHERE State=1 AND ID IS NOT NULL)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (10/16/2009)


    One word. Nulls.

    Ah, beat me to it...

    some test data to corroborate:

    --create test tables

    create table #iseries_file (edinumber char(20),variousotherstuff varchar(200))

    create table #contract (con_edipolnum char(20),con_status int)

    --populate test data

    insert into #iseries_file

    select '123456','vt 54 t54 byt54 by6 y65 by5 6byy65'

    union all

    select '1234567','gb n5tu890bv 5t89gbt4y9 g4tby8g 4tb y8gtb8y g'

    union all

    select '47893275','hr3iofgbr8 vbt80 b4tuhv 80v4tb9bt0v43'

    union all

    select '54354265','vnthuvnjignv 84tvu8t0btu80gh tui0 ht0'

    union all

    select '46546436','gnjr3o tu803 8ut0bgvu8thug v8t4ut0 ht80u'

    union all

    select '54jk5436','gnjto3 nvu8t03 hut089hjguihgu8t 08tu0ht80u'

    union all

    select 'thr75802yvn','gnj4trkobg3uito t0h g8tu0gh ut0hgut8 0gt8'

    union all

    select 'tum8420ytn9y','njtgnuit4gh u4thg 5u9hg8950hg80h g48hg840'

    union all

    select '43890ny2t89','gnjti4o3ng 40hgu8t 0hgu8t0 ghut80ghtu03'

    union all

    select '5784937534543','tmjk3og j9t0jgitu95j i9t-4jg98t- 3jg89t5 -gj'

    union all

    select '543-543-5','gmjit93gjti9- tjg 9it-3jgi90j35t89gj 05'

    union all

    select '543255435432542','tn4uig0 tguibht30 uhtuig hut5ighu80t3 g8t'

    union all

    select '5435245435','g5mji43 g5j893 u589 890g h89650 gh6890'

    --more test data

    insert into #contract

    select null,1

    union all

    select '123456',1

    union all

    select '1234567',1

    union all

    select '47893275',1

    union all

    select '54354265',2

    union all

    select '46546436',2

    union all

    select '54jk5436',3

    union all

    select 'thr75802yvn',3

    union all

    select 'tum8420ytn9y',4

    union all

    select '43890ny2t89',4

    union all

    select '5784937534543',5

    union all

    select '543-543-5',5

    union all

    select '543255435432542',5

    union all

    select '5435245435',6

    --

    --test queries

    select * from #iseries_file where edinumber in (select con_edipolnum from #contract where con_status not in (1,2,4))

    select * from #iseries_file where edinumber not in (select con_edipolnum from #contract where con_status in (1,2,4))

    replace the queries with the below to get identical results:

    --test queries

    select * from #iseries_file where edinumber in (select isnull(con_edipolnum,'') from #contract where con_status not in (1,2,4))

    select * from #iseries_file where edinumber not in (select isnull(con_edipolnum,'') from #contract where con_status in (1,2,4))

    --------
    [font="Tahoma"]I love deadlines. I like the whooshing sound they make as they fly by. -Douglas Adams[/font]

  • I'm actually surprised this question isn't asked more.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Thanks very much!

    I'm amazed that a single null in a recordset will make the whole query fail but nevermind eh 🙂

    Happy programming.

    Ben

    Ben

    ^ Thats me!

    ----------------------------------------
    01010111011010000110000101110100 01100001 0110001101101111011011010111000001101100011001010111010001100101 01110100011010010110110101100101 011101110110000101110011011101000110010101110010
    ----------------------------------------

  • ben.ward 42132 (10/19/2009)


    I'm amazed that a single null in a recordset will make the whole query fail but nevermind eh 🙂

    The query isn't failing. You're not getting error messages.

    The results are completely expected with NULL. No row will ever satisfy the condition SomeColumn <> NULL. A NOT IN is treated (kinda) like this

    NOT ( SomeColumn = <Value from first row> OR SomeColumn = Value from Second row> OR ... OR SomeColumn = <Value from last row>)

    If there's a null in any of those comparisons, the entire expression within the brackets becomes null. NOT (NULL) is still NULL and so no row returns TRUE from that comparison.

    If the nulls are bothering you, try NOT EXISTS rather than NOT NI. NOT EXISTS behaves differently w.r.t. nulls.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Ah that makes sense then 🙂

    Thanks!

    Ben

    ^ Thats me!

    ----------------------------------------
    01010111011010000110000101110100 01100001 0110001101101111011011010111000001101100011001010111010001100101 01110100011010010110110101100101 011101110110000101110011011101000110010101110010
    ----------------------------------------

  • I found this article was very helpful in explaining some common SQL query errors, and the IN(NULL) problem is #1 on his list!

    http://www.simple-talk.com/sql/t-sql-programming/ten-common-sql-programming-mistakes/

    Good luck,

    Rich

  • BenWard (10/19/2009)


    Thanks very much!

    I'm amazed that a single null in a recordset will make the whole query fail but nevermind eh 🙂

    I'm personally not a fan of that IN/NOT IN syntax..

    I tend to use correlated sub queries..

    SELECT *

    FROM dbo.iseries_file f

    WHERE EXISTS ( SELECT 'X'

    FROM [dbo].[contract] c

    WHERE c.edinumber = f.edinumber

    AND con_status NOT IN (1,2,4) )

    SELECT *

    FROM dbo.iseries_file f

    WHERE NOT EXISTS ( SELECT 'X'

    FROM [dbo].[contract] c

    WHERE c.edinumber = f.edinumber

    AND con_status IN (1,2,4) )

    Provided con_status is not nullable..

    CEWII

  • I use either joins or subqueries. I've found both effective.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I agree, depends on what I need and why I need it..

    CEWII

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply