Why are these results different (SELECT In versus JOIN <>)

  • Hi,

    I do not understand why I get different results when running the two SQL statements below:

    SELECT DISTINCT a.ID FROM T1 a

    INNER JOIN T2 b on b.ID <> a.ID

    SELECT DISTINCT a.ID FROM T1 a

    WHERE a.ID NOT IN (SELECT DISTINCT ID FROM T2)

    (obviously, the ID values are same (when same businessentity) in both tables and I should have a bunch of rows in one table but not in the other)

    Hoping someone can show me the light 🙂

    Kind regards

    B

  • Because they are two completely different queries.

    The <> join matches anything that isn't equal, so let's say we have these tables

    Tbl1

    ID Col1

    1 'a'

    2 'b'

    5 'e'

    Tbl2

    ID Col1

    1 'a'

    3 'c'

    4 'd'

    The first query says that columns that are not equal are to be considered a match, so the row from table 1 that has an ID of 1 matches rows 3 and 4 in table 2. The row with ID 2 matches rows 1, 3 and 4, the row with ID 5 matches 1,3 and 4, so your results would be

    1, 'a', 3, 'c'

    1, 'a', 4, 'd'

    2, 'b', 1, 'a'

    2, 'b', 3, 'c'

    2, 'b', 4, 'd'

    5, 'e', 1, 'a'

    5, 'e', 3, 'c'

    5, 'e', 4, 'd'

    It's a slightly restricted cartesian product.

    The not in says to find all the rows in table 1 that to not have an exact match in table 1.

    Now, the ID 1 is in both, so that doesn't match. There's no ID 2 or 5 in table 2, so both of those are returned

    2, 'b'

    5, 'e'

    The almost equivalent query to the not in would be a left outer join

    SELECT t1.ID, 1t.col1 FROM Tbl1 t1 left outer join Tbl2 t2 on t1.ID = t2.ID where t2.ID is NULL.

    p.s. Drop the distincts, they're not necessary in the IN and they shouldn't be necessary in the outer query unless you have a bad design or bad data

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Obviously it will be different....

    Suppose

    T1 has data ( 1, 2,3,4,5)

    T2 has data( 2,3,5)

    In the join condition which you wrote, it will check like

    1 <> 2 -> true display 1

    2<> 2 -> flase no result

    3 <> 2 -> true display 3

    4 <> 2 -> true display 4

    5 <> 2 -> true display 5

    and so on

    If u remove distinct first 4 values will be 1,3,4,5...

    To achive the same result as in the query

    "SELECT DISTINCT a.A FROM #T1 a

    WHERE a.A NOT IN (SELECT DISTINCT A FROM #T2)"

    Use the below query using Joins

    SELECT a.A FROM #T1 a

    LEFT OUTER JOIN #T2 b on b.A = a.A where b.a IS NULL

    ----------------------

    Create table #T1( A int)

    Create table #T2( A int)

    insert into #T1 values(1)

    insert into #T1 values(2)

    insert into #T1 values(3)

    insert into #T1 values(4)

    insert into #T1 values(5)

    insert into #T2 values(2)

    insert into #T2 values(3)

    insert into #T2 values(5)

    select * from #T1

    select * from #T2

    -----------------------------

  • ....

    Thanks a lot guys ... I needed this ....

    and yes, different queries give different results ... no further comment.

    I'm off to get a cold shower, a can of beer, and stand by the BBQ to do something completely different with my life ... 🙂

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply