Outer Join Mystery

  • Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/gFritchey/outerjoinmystery.asp

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • This was really good one. Request you to simplify naming convention and data taken in future articles.

  • The explanation is really simple:

    With the old syntax it's impossible to determine what part of the where statement is used to join the tables and which part is used to filter the records, so the database server has to guess.

    With the proper syntax the part of the join which is used to join the table is after the ON and the part to filter the records is after the WHERE.

    Hope it makes sense,

    Auke

  • So it actually reads the old statement as

    SELECT

     S.IIATransactionId,

     substring(rats_filename, 1+patindex('%{________-____-____-____-____________}%',  rats_filename), 36) AS OracleTransactionId

    FROM

     iiafeedtransaction S

    LEFT OUTER JOIN

     ratsiiafeedtransaction o

    ON

     S.IIATransactionId = substring(rats_filename,   1+patindex('%{________-____-____-____-____________}%', rats_filename), 36)

    and

     o.Rats_filename IS  NULL

    ORDER BY

     S.IIATransactionId

  • Actually, your logic is incorrect.
    What are you basing your where statement on? It is where rats_filename is null.
    Look at your ratsiiafeedtransaction table again.. there are no null fields in this table.
    so something like the below (and this is quick and nasty) works:
     
    select a, oracletransactionid
    from (
    SELECT S.IIATransactionId as a, substring(o.rats_filename,1+patindex('%{________-____-____-____-____________}%',rats_filename),36) AS OracleTransactionId
    FROM iiafeedtransaction S, ratsiiafeedtransaction o
    WHERE S.IIATransactionId *=
    substring(o.rats_filename,1+patindex('%{________-____-____-____-____________}%',o.rats_filename),36)
    ) data
    where oracletransactionid is null
    order by a
     
    FYI: SQL 2005 does not allow this style of code by default, you have to override the database to do it. And it is claimed that future version wont support it at all.  even in backward compat. mode.
     

  • Here's the official info from BOL:

     

    Transact-SQL Joins

    In earlier versions of Microsoft® SQL Server™ 2000, left and right outer join conditions were specified in the WHERE clause using the *= and =* operators. In some cases, this syntax results in an ambiguous query that can be interpreted in more than one way. SQL-92 compliant outer joins are specified in the FROM clause and do not result in this ambiguity. Because the SQL-92 syntax is more precise, detailed information about using the old Transact-SQL outer join syntax in the WHERE clause is not included with this release. The syntax may not be supported in a future version of SQL Server. Any statements using the Transact-SQL outer joins should be changed to use the SQL-92 syntax.

    The SQL-92 standard does support the specification of inner joins in either the FROM or WHERE clause. Inner joins specified in the WHERE clause do not have the same problems with ambiguity as the Transact-SQL outer join syntax.

    Best Regards,

    Chris Büttner

  • As said above, this is an old 'caveat' with the legacy outer join syntax compared to ANSI joining, that has been around since day one ANSI joins was available.

    The primary difference lies in how the query is resolved.

    The legacy syntax has only one WHERE clause to put both join criteria and filtering criteria, where with ANSI syntax, you have a distinct section in the ON clause for the actual join, and the filtering goes into the WHERE clause.

    This difference makes such questions as 'what rows exists in table A but not in table B' simply impossible to write correctly with the old legacy *= syntax, since what we really ask for in these cases are nulls that are a result of the join itself.

    There are many reasons to switch habits from legacy join syntax to ANSI style, these kinds of queries is one of them.

    /Kenneth

     

  • Sorry about that. This article wrote itself since I just copied & pasted the code. Usually I'd write the code for an article seperately. Again, sorry for the confusion.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Couldn't agree more. I personally haven't written queries using the old syntax for something like 10 years. I had just assumed during all that time, that those people, either obstinant or ignorant, writing in the old syntax were getting good data from their queries. This doesn't even address the fact that with the current syntax we can do things that could never have been done in the old:

    SELECT ....

    FROM Version V

    INNER JOIN x

    ON x.Id = V.Id

    INNER JOIN y

    ON x.Id = y.Id

    AND y.Version = (SELECT MAX(Version)

    FROM Y AS y2

    WHERE y2.VersionId <= v.VersionId

    AND y2.Id = y.Id)

    This little dive through the old methods was an eye opener that I thought I'd share just in case others were as uninformed. There really are still a lot of people who are using the old approach.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Because of the left join, you would get values that are null, despite the fact that there were no null values in the table. It's not dissimilar to the query you wrote to determine what's in one table, but not the other. When you join with the proper syntax it works just fine. That was the surprise.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • True and correct.  I just put in the example code for the old style. I had actually noticed the execution plans were pretty much identical.

    How people can still write in that scares me

  • A lot of it, unfortunately not all of it, comes from Oracle people. But even Oracle is starting down the path of eliminating the old style syntax. If you go out & search on the Oracle blogs & newsgroups, it's a big deal that lots of them are very unhappy about.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Hi,

    This article & discussion is very interesting and it helped me a lot in understanding the Outer Join Query plus the old vs. new syntax.

    Thanks once again

    Regards

    Amit Gurjar

  • To get the Execution plans to match between the old and new syntax, the "old style" join query needs to look something like this:

    SELECT

    S.IIATransactionId

    ,substring(rats_filename,1+patindex('%{________-____-____-____-____________}%',rats_filename),36) AS OracleTransactionId

    FROM iiafeedtransaction S,

    ratsiiafeedtransaction o

    WHERE S.IIATransactionId *= substring(rats_filename,1+patindex('%{________-____-____-____-____________}%',rats_filename),36)

    GROUP BY

    S.IIATransactionId,substring(rats_filename,1+patindex('%{________-____-____-____-____________}%',rats_filename),36)

    HAVING substring(rats_filename,1+patindex('%{________-____-____-____-____________}%',rats_filename),36) IS NULL

    ORDER BY S.IIATransactionId

    Using the GROUP BY and the HAVING clauses puts the filter back into the execution plan and the correct results are seen.

    Far easier to just use the new style of joins

    Regards,

    Mark Horton.

  • Because you dont have control over the order of WHERE clause execution, the WHERE condition can occur before the JOIN condition, meaning that the WHERE occurs on the *Cartesian Product* of the two tables, and therefore doesnt filter rows properly.

    If you think about this closely enough, it makes sense.

     

     

Viewing 15 posts - 1 through 15 (of 39 total)

You must be logged in to reply to this topic. Login to reply