Outer Join Mystery

  • So, how does Oracle return the "right" results?

  • The use of the old-style syntax promoted the entire WHERE clause to the ON clause.  You get the same results as the "wrong" query if you used "LEFT JOIN ... ON S. = O. AND o.rats_filename IS NOT NULL".  The IS NOT NULL condition is being used as a a join condition instead of a filter condition and, since none of the fields are null, has no effect.

    The only mystery to me is how Oracle differentiates join conditions and filter conditions.  Maybe only logical comparisons involving both tables are considered join conditions?

  • It simply returns the anticipated results, which is only those rows that don't match. Precisely how... you need to ask someone that understands Oracle. I think Scott Coleman's answers have been more precise than anything I'm going to offer up.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks. That makes more sense than any other explanation I've been able to come up with.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Itzik Ben-Gan wrote an article that explains this very clearly in SQL Magazine in the October 2004 issue. The InstantDoc ID is #43681.

    The problem is that in Old Style Joins, you can't separate the join clause and the Where Clause so the result may include extraneous NULL values in your result set.

    Here are two simple queries you can run in PUBS to see this:

    -- Old Style Join

    -- 13 Valid Entries, 6 Invalid Entries

    select o.name, i.name,i.indid

    from sysobjects o, sysindexes i

    where o.id *= i.id and

    o.type = 'S' and

    i.indid = 1

    -- New Style Join

    -- 13 Valid Entries, 0 Invalid Entries

    select o.name, i.name,i.indid

    from sysobjects o

    left outer join sysindexes i

    on o.id = i.id

    where o.type = 'S' and

    i.indid = 1

  • Interesting article. I'm glad I always use the newer syntax. To the poster who said the problem was that the problem was not knowing which part of the where to apply, I would have to disagree. It is something more than that, although I don't know what.   For example, I tried the following:

    SELECT testnull =

         CASE

          WHEN tmp.Rats_filename IS NULL         THEN 'NULL    '

          WHEN tmp.Rats_filename IS NOT NULL   THEN 'NOT NULL'

                                                                 ELSE 'OTHER'

         END

     , *

    FROM

         (SELECT *

                FROM UUG_APP_DVLP.dbo.[IIAFeedTransaction]   s

                      , UUG_APP_DVLP.dbo.[RATSIIAFeedTransaction]  o

                  WHERE S.IIATransactionId *= substring(rats_filename,1+ 

                     patindex ('%{________-____-____-____-

                                                  ____________}%',rats_filename),36)

           )TMP

    WHERE tmp.Rats_filename IS  NOT  NULL 

    ORDER BY 2

    For the above, there should be no problem with what to join together. The second WHERE should work properly, and I would expect it to work the same as the CASE statement. But it doesn't!!!. See results below:

    Testnull

    IIATransactionId

     

    RATS_FILENAME

    DATE_ENTERED

    NULL   

    9B33A776-408B-4928-AE2A-0FF43995DE12

    1

    NULL

    NULL

    NULL   

    8EEA51CE-D87A-4F76-B9C1-7AD11532D444

    0

    NULL

    NULL

    NULL   

    5D31FD56-7C0B-408B-83A5-864B7BD35ADC

    1

    NULL

    NULL

    NOT NULL

    365A0FD8-5042-4297-A082-8F5B11450AF4

    1

    CISF-{260021}-{365A0FD8-5042-4297-A082-8F5B11450AF4}-{7A050246-59F3-4E72-BFD2-40EB183B7D3C}

    2006-05-03 13:46:21.470

    NOT NULL

    16706611-C94D-4FBC-8F4E-9077C3B9E697

    1

    CISF-{260346}-{16706611-C94D-4FBC-8F4E-9077C3B9E697}-{73916523-EC71-4989-BE99-1EDB507D67DF}

    2006-05-03 13:46:21.470

    NOT NULL

    55DD6703-9693-45E2-A339-987066EA2864

    2

    RA-1-{259590}-{55DD6703-9693-45E2-A339-987066EA2864}-{EF41CFCE-20BA-4D2D-A4C8-121BFC7DAEE3}

    2006-05-03 13:46:21.470

    NULL   

    A721BD38-9C76-4C3A-A7C5-9CC0227F90F4

    0

    NULL

    NULL

    NOT NULL

    F4849137-6454-46E5-9811-A6985A189249

    1

    RA-2-{219960}-{F4849137-6454-46E5-9811-A6985A189249}-{F7267BD6-62CB-4379-8533-C413DD5EA402}

    2006-05-03 13:46:21.470

     

    Moral of this story is: Stay with the new syntax.

     

     

     

  • You might think "there should be no problem with what to join together" because you put parentheses around the derived table, but the query optimizer has its own agenda sometimes.  What did the execution plan look like?  You have got to give the query compiler credit, it's not easy to fool it with an extra level of parenthesis.

    A working hypothesis might be that in SQL Server once you use an old=style join then all WHERE clauses that refer to tables A or B (either singly or together) are interpreted as join conditions.  I don't know this to be true, but it explains the examples you've presented.  The difference in Oracle may be that it only uses conditions on both tables for joining, while conditions on only one table are used for filtering.

  • SELECT

     *

    FROM

     (

      SELECT

       'Naughty Optimizer; Real developers use parenthesis to clarify default behavior AND SPECIFY REQUIRED BEHAVIOR.' AS 'My two cents',

       'Using SELECT ColTitle = Expr is nice for swapping UPDATE for SELECT when Expr is a col name in UPDATE target, but this is legacy syntax' AS 'Farewell Old Friend'

      WHERE

       1 = 1

    &nbsp AS Moral_of_Story__Use_The_New_Syntax

  • I've always used the newer syntax as it made things obvious and clear to me...  I suppose if I had grown up using the old syntax I might think differently - but I also shudder to think of the difficulties I would've found trying to express what I needed in a query using the old syntax when the new syntax is just so easy and straightforward to both write, debug and explain to others.  Yay for new syntax - or, as far as I was concerned, the only syntax anyway

  • As with most things, it's a matter of habit.

    Things we know are 'easy', things we don't know are 'difficult'.

    There's no rocketscience about that.

    I started out writing T-SQL the old way, and switched to ANSI style sometime around y2k.

    At that time ANSI was hard, difficult, weird, strange, awkward... take your pick.

    Of course that was because in the beginning it was something that I didn't knew too well.

    But with practice comes perfect, and once you get used to it there are no regrets.

    ANSI is way more clear, concise and easy to write and moreover, to read. There's no doubt when

    you read a query what the author intended with it when it was written (assuming it was written correctly ofc)

    The major point is that there's no ambiguity for outer joins - a quite tricky area to 'get right'.

    I should add as a final note a plug for aliasing. Always alias everything. ANSI outer joins with aliases on all columns and tables is pretty much self-documenting code.

    /Kenneth

     

  • Doggone it. I was looking through Itzik's stuff for answer. I guess I didn't dig deep enough. Thanks for posting that.

    Itzik's one of the minor dieties around my shop ever since we spent a week with him doing advanced TSQL training.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • That's how I "solved" the problem when it was first presented to me. I couldn't read it in a way that made sense with the old syntax so I rewrote it and it was suddenly working.

    My Oracle-centric coworker held a gun to my head until I tried it several times both ways. It was just a hoot watching the data changing based on syntax only.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Amen on the aliasing brother.

    We've been reviewing some code from developers that not only didn't use aliasing, but used table names instead. Unfortunately they didn't even do that consistently so we end up with

    SELECT table.column, dbo.table.column, column

    FROM...

    We're getting out the hickory stick for the next meeting with this dev team...

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Hi,

    I had the same issue couple of days back and I solved it by same way (removing and clause and putting where clause).

    VS

  • There was a post that hit the nail on the head. Your where clause that stipulates that RAT_FILENAME IS NOT NULL does not limit results due to there being no NULL RAT_FILENAME columns... You want to look for NULLs after the join is complete.

    More to the point would be something like;

    /* return all IIAFeedTransaction.IIATransactionId values that are

    * not used in the RATSIIAFeedTransaction.RAT_FILENAME values

    */

    SELECT

    IIATransactionId

    FROM

    dbo.IIAFeedTransaction

    WHERE

    IIATransactionId IN (

    SELECT SubString(RATS_FILENAME,1+PatIndex('%{________-____-____-____-____________}%',RATS_FILENAME),36)

    FROM dbo.RATSIIAFeedTransaction

    )

Viewing 15 posts - 16 through 30 (of 39 total)

You must be logged in to reply to this topic. Login to reply