When if ever would you following a JOIN ON clause with an AND instead of WHERE?

  • KoldCoffee (10/19/2014)


    Hi Lynn, thanks for bearing with me.

    the persontype comes from Person.Person. I aliased. Can you revise the explanation?

    freakin' frustrating.

    This looks simple enough and I understand the sentences and logic, but when I look at output of Adventureworks sample I get confused.

    Think of it this way, the AND will satisfy all records on the right hand side and return NULL on the left hand side for those records not satisfying the condition, the WHERE will exclude the records on both sides of the join which have NULL on the left hand side.

    But this returns everything from the left side

    SELECT

    MT.MT_ID

    ,MT.MT_DATE

    ,MT.REF_ID

    ,RF.REF_VAL

    FROM MAIN_TABLE MT

    LEFT OUTER JOIN REF RF

    ON MT.REF_ID = RF.REF_ID

    AND RF.REF_ID < 4;

    And when I flip the tables I'm left joining on, this also only returns everything from the left side

    SELECT

    MT.MT_ID

    ,MT.MT_DATE

    ,MT.REF_ID

    ,RF.REF_VAL

    FROM REF RF

    LEFT OUTER JOIN MAIN_TABLE MT

    ON MT.REF_ID = RF.REF_ID

    AND RF.REF_ID < 4;

    the persontype comes from Person.Person.

    This is why all columns should be aliased in a multi-table query. Easy to know where each column comes from without having to see the DDL for the tables.

    Remember, in a left join, the table on the LEFT is the one in which ALL rows will be returned even when there is no matching row in the table on the right. The table on the RIGHT of the LEFT OUTER JOIN will only return rows that match the join criteria.

    Without having the database I can't really tell you much more without being able to actually run the code and variants to tell you what it is doing. I suggest that you play with the query and learn what it is doing.

  • My current latest understanding:

    In Left Outer JOIN everything is returned, period. That's because it's not an INNER JOIN but a LEFT OUTER.

    Next, if there is a WHERE clause, results are filtered. But if there is no WHERE clause then everything on the LEFT is returned, PERIOD.

    From the right are returned those things which satisfy the filtering criteria.

    Good morning in Afghanistan. I imagine it is shaping up to be another not too cold day. 🙂 Thanks Lynn and best.

  • KoldCoffee (10/19/2014)


    My current latest understanding:

    In Left Outer JOIN everything is returned, period. That's because it's not an INNER JOIN but a LEFT OUTER.

    Next, if there is a WHERE clause, results are filtered. But if there is no WHERE clause then everything on the LEFT is returned, PERIOD.

    From the right are returned those things which satisfy the filtering criteria.

    Good morning in Afghanistan. I imagine it is shaping up to be another not too cold day. 🙂 Thanks Lynn and best.

    In Left Outer JOIN everything is returned, period. That's because it's not an INNER JOIN but a LEFT OUTER.

    That is what I said above, all rows from the table on the left are returned.

    From the right are returned those things which satisfy the filtering criteria.

    Correct, except it isn't the filtering criteria, it's the join criteria. You may call this semantics, but just making sure that the correct terms are used. Filter criteria are in the WHERE clause as this reduces the result set returned from the join between tables.

    Actually, right now it is cold and wet having rained most of yesterday and into the night. I think the rain stopped around midnight, and it is now 08:30 (8:30 AM) Afghan time.

  • I guess you're not in Kabul, then.

    OK.

    so I incremented the JOIN condition in Eirikur's LEFT JOIN example and found that the entire left table is returned, whether the JOIN condition is <1, <2, <3, <4

    SELECT

    MT.MT_ID

    ,MT.MT_DATE

    ,MT.REF_ID

    ,RF.REF_VAL

    FROM MAIN_TABLE MT

    LEFT OUTER JOIN REF RF

    ON MT.REF_ID = RF.REF_ID

    AND RF.REF_ID < 4;

    The AND is joining to only those rows in the right table that satisfy the JOIN criteria.

    I think I understand finally.

    In the case of Adventurework, everything from Person.Person is being returned, and only the rows from Sales.SalesTerritoryHistory that satisfy the JOIN condition.

    and a WHERE would turn the LEFT OUTER JOIN back to an inner.

    Yes?? Did you pull me through and stick it out with me to the bitter end? I think so.

    Thank you Lynn. Have a wonderful day.

  • KoldCoffee (10/19/2014)


    ...I incremented the JOIN condition in Eirikur's LEFT JOIN example and found that the entire left table is returned, whether the JOIN condition is <1, <2, <3, <4

    SELECT

    MT.MT_ID

    ,MT.MT_DATE

    ,MT.REF_ID

    ,RF.REF_VAL

    FROM MAIN_TABLE MT

    LEFT OUTER JOIN REF RF

    ON MT.REF_ID = RF.REF_ID

    AND RF.REF_ID < 4;

    The AND is joining to only those rows in the right table that satisfy the JOIN criteria.

    I think I understand finally.

    Spot on;-)

    😎

  • If you are using a LEFT Join with a restricting WHERE then you need to be careful that your WHERE condition does not remove the additional rows your LEFT join was retrieving, that would defeat the purpose of using LEFT join..

    Remember , the order of processing is :

    1) FROM

    2) JOIN ON

    3) Add Outer rows (Only in case of Outer Joins)

    4) WHERE

    5) SELECT

    ON is processed first followed by WHERE( and in case of LEFT join, in fact any Outer Join for that matter,WHERE could remove the outer rows)

    So if you really want a LEFT Join, place that WHERE condtion in the JOIN ON clause.

    But check first what you actually want to do in your logic..

    "The price of anything is the amount of life you exchange for it" - Henry David Thoreau
  • .

    "The price of anything is the amount of life you exchange for it" - Henry David Thoreau
  • Eirikur Eiriksson (10/19/2014)


    On the contrary, it is very useful as it respects the left outer join, which in essence return all records from the right hand side and only those on the left hand side that satisfy the condition. Using a where clause will turn the join into a inner join, returning only those record combinations which satisfy the conditions. It is not an "output" filter, it's a join specification.

    😎

    Absolutely.

    It's that conversion of the JOIN from OUTER to INNER because of the placement of the filtering criteria that can really mess up data sets. Logically it makes sense too.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Eirikur Eiriksson (10/19/2014)


    On the contrary, it is very useful as it respects the left outer join, which in essence return all records from the right hand side and only those on the left hand side that satisfy the condition. Using a where clause will turn the join into a inner join, returning only those record combinations which satisfy the conditions. It is not an "output" filter, it's a join specification.

    😎

    I think you mean a left outer join returns all records from the left hand side and only those on the right hand side that satisfy the condition. And a right outer join returns all records from the right hand side and only those on the left hand side that satisfy the condition.

    I suppose an alternative to the method used by the original query would be:

    LEFT JOIN Sales.SalesTerritoryHistory st

    ON pp.BusinessEntityID = st.BusinessEntityID

    WHERE ((st.modifieddate between '2005-06-24 00:00:00.000' and '2006-06-24 00:00:00.000' AND persontype = 'sp') OR st.BusinessEntityID IS NULL)

    This gets the filter rules into the WHERE clause which you may find more readable, and the extra rule allowing st.BusinessEntityID to be NULL will allow the LEFT JOIN to continue behaving as expected.

  • KoldCoffee (10/19/2014)


    I guess you're not in Kabul, then.

    OK.

    ...

    I was in Kabul initially, but we got "kicked out" of ISAF back in November 2013. I am now at Bagram Airfield. Not sure how far that is from Kabul.

    Glad to be able to help.

  • KoldCoffee (10/18/2014)


    using Adventureworks2008R2 database I'd like to illustrate syntax I ran into and am unfamiliar with and I want to know if it's a mistake that SSMS doesn't catch.

    SELECT TOP 1000 p.[BusinessEntityID]

    ,[PersonType]

    ,[NameStyle]

    ,[Title]

    ,[FirstName]

    ,[MiddleName]

    ,[LastName]

    ,[Suffix]

    ,[EmailPromotion]

    ,[AdditionalContactInfo]

    ,st.ModifiedDate

    FROM [AdventureWorks2008R2].[Person].[Person] p

    JOIN Person.PersonPhone pp

    ON p.BusinessEntityID = pp.BusinessEntityID

    LEFT JOIN Sales.SalesTerritoryHistory st

    ON pp.BusinessEntityID = st.BusinessEntityID

    --at this point the JOIN ON clause is followed by AND instead of WHERE

    AND st.modifieddate between '2005-06-24 00:00:00.000' and '2006-06-24 00:00:00.000'

    AND persontype = 'sp'

    order by modifieddate desc

    The above, to my surprise, doesn't error. It also doesn't constrain the st.modifieddate from returning NULLS.

    When, if ever, would you follow a JOIN ON clause with an AND instead of WHERE? Is it exactly for the purpose of returning everything between the dates PLUS any modifieddate values having NULLS?

    The technical side of this question has been thoroughly answered here, I just wanted to touch on the practical application. Imagine you are creating a report for the sales manager of Adventureworks. This company sells bicycles and accessories. The manager wants to see a list of all the contact info for their sales persons who have a record in the salesterritoryhistory table. If they don't have a record in that table, the manager still wants to see the person on the report, i.e. they shouldn't be excluded.

    I don't have Adventureworks in front of me so I don't know what's in that salesterritoryhistory table, but perhaps it records when sales people change territory or if they are new and have a sales territory assigned to them. That would mean a report showing people who had changed territory in that year range, or null for that date if they stayed in the same territory. If you were running this on 2006-06-24, then the question might have been "Who are all the current sales people, and who is new or changed territory in the last year, and show that on the report as the date they started or changed territory".

    This same principle could be used to e.g. show people who had sales in the last month, including those who had no sales at all. Sales is a cliche example, but it applies to so many other domains.

  • davoscollective (10/20/2014)


    KoldCoffee (10/18/2014)


    using Adventureworks2008R2 database I'd like to illustrate syntax I ran into and am unfamiliar with and I want to know if it's a mistake that SSMS doesn't catch.

    SELECT TOP 1000 p.[BusinessEntityID]

    ,[PersonType]

    ,[NameStyle]

    ,[Title]

    ,[FirstName]

    ,[MiddleName]

    ,[LastName]

    ,[Suffix]

    ,[EmailPromotion]

    ,[AdditionalContactInfo]

    ,st.ModifiedDate

    FROM [AdventureWorks2008R2].[Person].[Person] p

    JOIN Person.PersonPhone pp

    ON p.BusinessEntityID = pp.BusinessEntityID

    LEFT JOIN Sales.SalesTerritoryHistory st

    ON pp.BusinessEntityID = st.BusinessEntityID

    --at this point the JOIN ON clause is followed by AND instead of WHERE

    AND st.modifieddate between '2005-06-24 00:00:00.000' and '2006-06-24 00:00:00.000'

    AND persontype = 'sp'

    order by modifieddate desc

    The above, to my surprise, doesn't error. It also doesn't constrain the st.modifieddate from returning NULLS.

    When, if ever, would you follow a JOIN ON clause with an AND instead of WHERE? Is it exactly for the purpose of returning everything between the dates PLUS any modifieddate values having NULLS?

    The technical side of this question has been thoroughly answered here, I just wanted to touch on the practical application. Imagine you are creating a report for the sales manager of Adventureworks. This company sells bicycles and accessories. The manager wants to see a list of all the contact info for their sales persons who have a record in the salesterritoryhistory table. If they don't have a record in that table, the manager still wants to see the person on the report, i.e. they shouldn't be excluded.

    I don't have Adventureworks in front of me so I don't know what's in that salesterritoryhistory table, but perhaps it records when sales people change territory or if they are new and have a sales territory assigned to them. That would mean a report showing people who had changed territory in that year range, or null for that date if they stayed in the same territory. If you were running this on 2006-06-24, then the question might have been "Who are all the current sales people, and who is new or changed territory in the last year, and show that on the report as the date they started or changed territory".

    This same principle could be used to e.g. show people who had sales in the last month, including those who had no sales at all. Sales is a cliche example, but it applies to so many other domains.

    The main concern might be a readability issue. Knowing the difference in behavior is something that could get less senior developers into trouble. While I personally like the AND notation, just because I know others will end up having to read and maintain my code, I tend to rewrite that to be clearer.

    Something like:

    SELECT TOP (1000) p.BusinessEntityID, PersonType, NameStyle, Title, FirstName, MiddleName, LastName, Suffix, EmailPromotion, AdditionalContactInfo,

    st.ModifiedDate

    FROM AdventureWorks2008R2.Person.Person AS p INNER JOIN

    Person.PersonPhone AS pp ON p.BusinessEntityID = pp.BusinessEntityID

    LEFT OUTER JOIN

    (

    select *

    from Sales.SalesTerritoryHistory

    where st.modifieddate between '2005-06-24 00:00:00.000' and '2006-06-24 00:00:00.000'

    AND persontype = 'sp'

    ) AS st ON pp.BusinessEntityID = st.BusinessEntityID

    order by modifieddate desc

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • thank you gward, davo collective, and matt miller for further clarification.

    This experience has led me with conviction to understand that in the scenario where I was seeing the LEFT JOIN with AND, it was actually an error. I could see the error but I was tremendously bothered by not understanding this syntax and applicability.

    Thanks to everyone who helped me come up to speed on this advanced join type ....well I do think it is. It's beguiling, at the very least.

  • Absolutely beguiling. I think everyone knows it so well because we've all been bitten by it at one point or another.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • To me it looks like someone was looking to add the code persontype = 'sp', but just put it in the wrong place. I've seen code like that before, when you tell a new person to just add that code to the join and they put it on the wrong one. Doesn't fail, but sure was slower.

    My thoughts that it should be more written like one of these, depending on the results needed as other have stated.

    SELECT TOP 1000 p.BusinessEntityID,

    PersonType,

    NameStyle,

    Title,

    FirstName,

    MiddleName,

    LastName,

    Suffix,

    EmailPromotion,

    AdditionalContactInfo,

    st.ModifiedDate

    FROM AdventureWorks2008R2.Person.Person p

    JOIN Person.PersonPhone pp ON

    p.BusinessEntityID = pp.BusinessEntityID

    AND p.persontype = 'sp' -- moved to the correct join

    LEFT JOIN Sales.SalesTerritoryHistory st ON

    pp.BusinessEntityID = st.BusinessEntityID

    AND st.modifieddate BETWEEN '2005-06-24 00:00:00.000' AND '2006-06-24 00:00:00.000'

    ORDER BY modifieddate DESC

    -- OR

    SELECT TOP 1000 p.BusinessEntityID,

    PersonType,

    NameStyle,

    Title,

    FirstName,

    MiddleName,

    LastName,

    Suffix,

    EmailPromotion,

    AdditionalContactInfo,

    st.ModifiedDate

    FROM AdventureWorks2008R2.Person.Person p

    JOIN Person.PersonPhone pp ON

    p.BusinessEntityID = pp.BusinessEntityID

    LEFT JOIN Sales.SalesTerritoryHistory st ON

    pp.BusinessEntityID = st.BusinessEntityID

    AND st.modifieddate BETWEEN '2005-06-24 00:00:00.000' AND '2006-06-24 00:00:00.000'

    WHERE p.persontype = 'sp' -- or moved to the WHERE

    ORDER BY modifieddate DESC

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

Viewing 15 posts - 16 through 29 (of 29 total)

You must be logged in to reply to this topic. Login to reply