When if ever would you following a JOIN ON clause with an AND instead of WHERE?

  • using Adventureworks2008R2 database I'd like to illustrate syntax I ran into and am unfamiliar with and I want to know if it's a mistake that SSMS doesn't catch.

    SELECT TOP 1000 p.[BusinessEntityID]

    ,[PersonType]

    ,[NameStyle]

    ,[Title]

    ,[FirstName]

    ,[MiddleName]

    ,[LastName]

    ,[Suffix]

    ,[EmailPromotion]

    ,[AdditionalContactInfo]

    ,st.ModifiedDate

    FROM [AdventureWorks2008R2].[Person].[Person] p

    JOIN Person.PersonPhone pp

    ON p.BusinessEntityID = pp.BusinessEntityID

    LEFT JOIN Sales.SalesTerritoryHistory st

    ON pp.BusinessEntityID = st.BusinessEntityID

    --at this point the JOIN ON clause is followed by AND instead of WHERE

    AND st.modifieddate between '2005-06-24 00:00:00.000' and '2006-06-24 00:00:00.000'

    AND persontype = 'sp'

    order by modifieddate desc

    The above, to my surprise, doesn't error. It also doesn't constrain the st.modifieddate from returning NULLS.

    When, if ever, would you follow a JOIN ON clause with an AND instead of WHERE? Is it exactly for the purpose of returning everything between the dates PLUS any modifieddate values having NULLS?

  • Quick thought, conditioning a left outer join query on a column on the left side of the join effectively turns the join into an inner join, when the same predicates are part of the join specification it remains as a left outer join.

    😎

    Quick demo

    USE tempdb;

    GO

    DECLARE @MAIN_TABLE TABLE

    (

    MT_ID INT IDENTITY(1,1) NOT NULL

    ,MT_DATE DATE NOT NULL

    ,REF_ID INT NULL

    );

    DECLARE @REF TABLE

    (

    REF_ID INT NOT NULL

    ,REF_VAL VARCHAR(12) NOT NULL

    );

    INSERT INTO @MAIN_TABLE (MT_DATE,REF_ID)

    VALUES

    ('2014/01/01',1)

    ,('2014/02/01',1)

    ,('2014/03/01',2)

    ,('2014/04/01',NULL)

    ,('2014/05/01',3)

    ,('2014/06/01',NULL);

    INSERT INTO @REF(REF_ID,REF_VAL)

    VALUES

    (1,'REF 1')

    ,(2,'REF 2')

    ,(3,'REF 3');

    /* Left outer join turned to inner join */

    SELECT

    MT.MT_ID

    ,MT.MT_DATE

    ,MT.REF_ID

    ,RF.REF_VAL

    FROM @MAIN_TABLE MT

    LEFT OUTER JOIN @REF RF

    ON MT.REF_ID = RF.REF_ID

    WHERE RF.REF_ID < 4;

    /* Left outer join "untouched" */

    SELECT

    MT.MT_ID

    ,MT.MT_DATE

    ,MT.REF_ID

    ,RF.REF_VAL

    FROM @MAIN_TABLE MT

    LEFT OUTER JOIN @REF RF

    ON MT.REF_ID = RF.REF_ID

    AND RF.REF_ID < 4;

    Results 1

    MT_ID MT_DATE REF_ID REF_VAL

    ----------- ---------- ----------- ------------

    1 2014-01-01 1 REF 1

    2 2014-02-01 1 REF 1

    3 2014-03-01 2 REF 2

    5 2014-05-01 3 REF 3

    Results 2

    MT_ID MT_DATE REF_ID REF_VAL

    ----------- ---------- ----------- ------------

    1 2014-01-01 1 REF 1

    2 2014-02-01 1 REF 1

    3 2014-03-01 2 REF 2

    4 2014-04-01 NULL NULL

    5 2014-05-01 3 REF 3

    6 2014-06-01 NULL NULL

  • First is the LEFT JOIN as described above, but second, compound primary keys. The JOIN criteria could simply require multiple columns. I've seen lots and lots of uses for multiple sets of JOIN criteria because of these two requirements.

    I've also seen a few edge cases, in queries that were actually quite a mess and very problematic anyway, where moving the filtering criteria from the WHERE clause to the JOIN clause resulted in better execution plans (again, serious edge cases, in most circumstances this wouldn't work at all) for INNER JOINs.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • The best thing you can do is see what is the difference between the query you posted and one where you believe the WHERE clause should begin.

    I don't have the sample database available out here in Afghanistan so you will have try these two queries and report the differences:

    SELECT TOP 1000

    p.[BusinessEntityID],

    [PersonType],

    [NameStyle],

    [Title],

    [FirstName],

    [MiddleName],

    [LastName],

    [Suffix],

    [EmailPromotion],

    [AdditionalContactInfo],

    st.ModifiedDate

    FROM

    [AdventureWorks2008R2].[Person].[Person] p

    INNER JOIN Person.PersonPhone pp

    ON p.BusinessEntityID = pp.BusinessEntityID

    LEFT OUTER JOIN Sales.SalesTerritoryHistory st

    ON pp.BusinessEntityID = st.BusinessEntityID

    --at this point the JOIN ON clause is followed by AND instead of WHERE

    AND st.modifieddate between '2005-06-24 00:00:00.000' and '2006-06-24 00:00:00.000'

    AND persontype = 'sp'

    order by

    modifieddate desc;

    GO

    SELECT TOP 1000

    p.[BusinessEntityID],

    [PersonType],

    [NameStyle],

    [Title],

    [FirstName],

    [MiddleName],

    [LastName],

    [Suffix],

    [EmailPromotion],

    [AdditionalContactInfo],

    st.ModifiedDate

    FROM

    [AdventureWorks2008R2].[Person].[Person] p

    INNER JOIN Person.PersonPhone pp

    ON p.BusinessEntityID = pp.BusinessEntityID

    LEFT OUTER JOIN Sales.SalesTerritoryHistory st

    ON pp.BusinessEntityID = st.BusinessEntityID

    WHERE

    st.modifieddate between '2005-06-24 00:00:00.000' and '2006-06-24 00:00:00.000'

    AND persontype = 'sp'

    order by

    modifieddate desc;

    GO

    I fully expect you to see a difference in the result sets returned by the two queries.

  • in Eirikur's example the LEFT JOIN with the AND returns everything.

    In the Adventureworks example:

    The LEFT JOIN with the WHERE returns 9 records.

    The LEFT JOIN with the AND returns 19973 rows.

    Total possible records 19975.

    I don't know how to find the two records that are not returned with the LEFT JOIN with AND. I exported to excel and concatenated names but there are many duplicate names. Any ideas on how I may figure out the difference between what AND returns and the total result set?

    I don't understand the use of AND <some filter requirement> following a JOIN. It seems a mistake. In the case of my example, all person types are returned, even though 'sp' is specified.

  • KoldCoffee (10/19/2014)


    in Eirikur's example the LEFT JOIN with the AND returns everything.

    In the Adventureworks example:

    The LEFT JOIN with the WHERE returns 9 records.

    The LEFT JOIN with the AND returns 19973 rows.

    Total possible records 19975.

    I don't know how to find the two records that are not returned with the LEFT JOIN with AND. I exported to excel and concatenated names but there are many duplicate names. Any ideas on how I may figure out the difference between what AND returns and the total result set?

    I don't understand the use of AND <some filter requirement> following a JOIN. It seems a mistake. In the case of my example, all person types are returned, even though 'sp' is specified.

    Can't really help as I don't have the database available to me out here in Afghanistan. As for seeing the differences in results sets try using the EXCEPT operator between the two queries. You will need to flip the queries depending on which one you want the records from where they don't exist in the other.

  • Think of it this way, the AND will satisfy all records on the right hand side and return NULL on the left hand side for those records not satisfying the condition, the WHERE will exclude the records on both sides of the join which have NULL on the left hand side.

    😎

  • OK, that's what it looked like but it appears *not useful*. Why use a filter, only to use AND in front of it, and make the usage of a filter misleading?

    ....everything from Right side PLUS all NULLS on left? I fail to see application for this.

  • On the contrary, it is very useful as it respects the left outer join, which in essence return all records from the right hand side and only those on the left hand side that satisfy the condition. Using a where clause will turn the join into a inner join, returning only those record combinations which satisfy the conditions. It is not an "output" filter, it's a join specification.

    😎

  • Eirikur Eiriksson (10/19/2014)


    On the contrary, it is very useful as it respects the left outer join, which in essence return all records from the right hand side and only those on the left hand side that satisfy the condition. Using a where clause will turn the join into a inner join, returning only those record combinations which satisfy the conditions. It is not an "output" filter, it's a join specification.

    😎

    Very well said.

  • (1) the LEFT JOIN to Person.Person table returns everything from Person.Person that satisfies the JOIN condition (p.BusinessEntityID = st.BusinessEntityID) and the filters for persontype and dates.

    (2) the AND usage retrieves all remaining records from Person.Person after satisfying the requirements of the filters?

  • nope, it literally returns all records from the right hand side regardless of join or filter, right?

  • Lynn, when I try the EXCEPT I get error:

    "

    Msg 421, Level 16, State 1, Line 3

    The xml data type cannot be selected as DISTINCT because it is not comparable.

    "

    [/

    SELECT

    p.[BusinessEntityID],

    [PersonType],

    [NameStyle],

    [Title],

    [FirstName],

    [MiddleName],

    [LastName],

    [Suffix],

    [EmailPromotion],

    [AdditionalContactInfo],

    st.ModifiedDate

    FROM

    [AdventureWorks2008R2].[Person].[Person] p

    INNER JOIN Person.PersonPhone pp

    ON p.BusinessEntityID = pp.BusinessEntityID

    LEFT OUTER JOIN Sales.SalesTerritoryHistory st

    ON pp.BusinessEntityID = st.BusinessEntityID

    --at this point the JOIN ON clause is followed by AND instead of WHERE

    AND st.modifieddate between '2005-06-24 00:00:00.000' and '2006-06-24 00:00:00.000'

    AND persontype = 'sp'

    except

    SELECT

    p.[BusinessEntityID],

    [PersonType],

    [NameStyle],

    [Title],

    [FirstName],

    [MiddleName],

    [LastName],

    [Suffix],

    [EmailPromotion],

    [AdditionalContactInfo],

    st.ModifiedDate

    FROM

    [AdventureWorks2008R2].[Person].[Person] p

    INNER JOIN Person.PersonPhone pp

    ON p.BusinessEntityID = pp.BusinessEntityID

    LEFT OUTER JOIN Sales.SalesTerritoryHistory st

    ON pp.BusinessEntityID = st.BusinessEntityID

    code]

  • Hope this helps:

    ...

    FROM

    [AdventureWorks2008R2].[Person].[Person] p

    INNER JOIN Person.PersonPhone pp

    ON p.BusinessEntityID = pp.BusinessEntityID -- << Return all rows from Person.Person and Person.PersonPhone

    -- where EntityID's match

    LEFT OUTER JOIN Sales.SalesTerritoryHistory st

    ON pp.BusinessEntityID = st.BusinessEntityID

    AND st.modifieddate between '2005-06-24 00:00:00.000' and '2006-06-24 00:00:00.000'

    AND persontype = 'sp' -- << First assumpation since I don't have the database available

    -- is that persontype is a column in Sales.SalesTerritoryHistory

    -- This returns all rows from the above INNER JOIN plus the rows

    -- of data from Sales.SalesTerritoryHistory where BusunessEntityIDs match

    -- and modifieddate in Sales.SalesTerritoryHistory is between

    -- '2005-06-24 00:00:00.000' and '2006-06-24 00:00:00.000' and

    -- persontype = 'sp', nulls where there is no match

    ...

  • Hi Lynn, thanks for bearing with me.

    the persontype comes from Person.Person. I aliased. Can you revise the explanation?

    freakin' frustrating.

    This looks simple enough and I understand the sentences and logic, but when I look at output of Adventureworks sample I get confused.

    Think of it this way, the AND will satisfy all records on the right hand side and return NULL on the left hand side for those records not satisfying the condition, the WHERE will exclude the records on both sides of the join which have NULL on the left hand side.

    But this returns everything from the left side

    SELECT

    MT.MT_ID

    ,MT.MT_DATE

    ,MT.REF_ID

    ,RF.REF_VAL

    FROM MAIN_TABLE MT

    LEFT OUTER JOIN REF RF

    ON MT.REF_ID = RF.REF_ID

    AND RF.REF_ID < 4;

    And when I flip the tables I'm left joining on, this also only returns everything from the left side

    SELECT

    MT.MT_ID

    ,MT.MT_DATE

    ,MT.REF_ID

    ,RF.REF_VAL

    FROM REF RF

    LEFT OUTER JOIN MAIN_TABLE MT

    ON MT.REF_ID = RF.REF_ID

    AND RF.REF_ID < 4;

Viewing 15 posts - 1 through 15 (of 29 total)

You must be logged in to reply to this topic. Login to reply