Restricted LEFT JOIN

  • I have a situation where I need to identify all appropriate records in my main table and only appropriate records in my LEFT JOIN table where they match those in a third table without letting the third table affect my primary result set.

    To help explain, here is a simplified code snippit from a health care environment:DECLARE @RptPeriodStart DATETIME, @RptPeriodStop DATETIME;

    CREATE TABLE #Element (ElementId BIGINT, Concept VARCHAR(16), StartDateTime DATETIME, StopDateTime DATETIME, StdTaxonomy VARCHAR(16), StdTerm VARCHAR(50));

    CREATE TABLE #Property (PropertyId BIGINT, ElementId BIGINT, PropertyName VARCHAR(16), StartDateTime DATETIME, StopDateTime DATETIME, StdTaxonomy VARCHAR(16), StdTerm VARCHAR(50));

    CREATE TABLE #ValueSet (ValueSetId BIGINT, ValueSetName VARCHAR(50), Description VARCHAR(256), StdTaxonomy VARCHAR(16), StdTerm VARCHAR(50));

    SELECT e.ElementId, e.StartDateTime, e.StopDateTime

    FROM #Element e

    LEFT JOIN #Property p

    ON p.PropertyName = 'FacilityLocation'

    AND p.ElementId = e.ElementId

    INNER JOIN #ValueSet vs

    ON vs.ValueSetName = 'ED'

    AND vs.StdTaxonomy = p.StdTaxonomy

    AND vs.StdTerm = p.StdTerm

    WHERE e.Concept = 'encounter'

    AND (e.StopDateTime BETWEEN @RptPeriodStart AND @RptPeriodStop

    OR p.StopDateTime BETWEEN @RptPeriodStart AND @RptPeriodStop);

    In this query, I want to identify encounters that end within the reporting period or their associated ED Visit ends within the reporting period. The ED Visit is identified by the property 'FacilityLocation' but I only want those locations identified as ED according to the given value set (ignoring others such as ICU). Therefore, I need the table #ValueSet to restrict the selection of #Property without affecting #Element.

    If the join to #ValueSet is made with a LEFT JOIN, then I do indeed get all records from #Element, but I also get those from #Property whether they're in the value set or not. If I use an INNER JOIN as shown, then I only get records from #Element that have an ED Visit and I don't get those without an ED Visit.

    Due to the data volume in the tables, if I do a LEFT JOIN to a sub-query of #Property which is an INNER JOIN to #ValueSet [basically LEFT JOIN (SELECT FROM #Property INNER JOIN #ValueSet)] then performance suffers tremendously because it forces a look at ALL #Property records and not just the ones for the required #Element records.

    This is really driving me nuts!! Is there a way to do this effectively without using a correlated sub-query?

  • Very difficult for me to follow precisely without some sample data and expected results to work with, but have you tried EXISTS and/or NOT EXISTS?? Sounds like one or both could be helpful here. The great thing about EXISTS is that it will "short circuit" and stop processing once it finds the first "hit". I have used that about a bajillion times at clients to replace counting all records and then verifying count > 0. 😎

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • There are two issues here which might be causing some confusion.

    Firstly, the FROM list of your query, which looks like

    SELECT ...

    FROM parent

    LEFT JOIN child to parent

    INNER JOIN grandchild to child

    SQL Server converts the LEFT JOIN to an INNER JOIN. See this post for more details: http://www.sqlservercentral.com/Forums/FindPost1583604.aspx

    Secondly, if you filter a column of a left-joined table in the WHERE clause without accounting for nulls (where’s no match on the RHS for a row on the LHS), you’re converting the join to an INNER JOIN.

    As Kevin points out, it’s difficult to figure out what you are trying to do without sample data and expected results, but I think it might be something like this:

    SELECT e.ElementId, e.StartDateTime, e.StopDateTime

    FROM #Element e

    LEFT JOIN (

    SELECT p.ElementId

    FROM #Property p

    INNER JOIN #ValueSet vs

    ON vs.ValueSetName = 'ED'

    AND vs.StdTaxonomy = p.StdTaxonomy

    AND vs.StdTerm = p.StdTerm

    WHERE p.PropertyName = 'FacilityLocation'

    AND p.StopDateTime BETWEEN @RptPeriodStart AND @RptPeriodStop

    ) pvs ON pvs.ElementId = e.ElementId

    WHERE e.StopDateTime BETWEEN @RptPeriodStart AND @RptPeriodStop

    OR pvs.ElementId IS NOT NULL

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Aaron N. Cutshall (1/12/2015)


    If the join to #ValueSet is made with a LEFT JOIN, then I do indeed get all records from #Element, but I also get those from #Property whether they're in the value set or not. If I use an INNER JOIN as shown, then I only get records from #Element that have an ED Visit and I don't get those without an ED Visit.

    OK, may be just filter them in WHERE ?DECLARE @RptPeriodStart DATETIME, @RptPeriodStop DATETIME;

    CREATE TABLE #Element (ElementId BIGINT, Concept VARCHAR(16), StartDateTime DATETIME, StopDateTime DATETIME, StdTaxonomy VARCHAR(16), StdTerm VARCHAR(50));

    CREATE TABLE #Property (PropertyId BIGINT, ElementId BIGINT, PropertyName VARCHAR(16), StartDateTime DATETIME, StopDateTime DATETIME, StdTaxonomy VARCHAR(16), StdTerm VARCHAR(50));

    CREATE TABLE #ValueSet (ValueSetId BIGINT, ValueSetName VARCHAR(50), Description VARCHAR(256), StdTaxonomy VARCHAR(16), StdTerm VARCHAR(50));

    SELECT e.ElementId, e.StartDateTime, e.StopDateTime

    FROM #Element e

    LEFT JOIN #Property p

    ON p.PropertyName = 'FacilityLocation'

    AND p.ElementId = e.ElementId

    LEFT JOIN #ValueSet vs

    ON vs.ValueSetName = 'ED'

    AND vs.StdTaxonomy = p.StdTaxonomy

    AND vs.StdTerm = p.StdTerm

    WHERE e.Concept = 'encounter'

    AND (e.StopDateTime BETWEEN @RptPeriodStart AND @RptPeriodStop

    OR p.StopDateTime BETWEEN @RptPeriodStart AND @RptPeriodStop)

    -- no real Property without ValueSet

    AND (p.ID IS NULL OR vs.ID IS NOT NULL);

  • ChrisM@Work (1/13/2015)


    There are two issues here which might be causing some confusion.

    Firstly, the FROM list of your query, which looks like

    SELECT ...

    FROM parent

    LEFT JOIN child to parent

    INNER JOIN grandchild to child

    SQL Server converts the LEFT JOIN to an INNER JOIN. See this post for more details: http://www.sqlservercentral.com/Forums/FindPost1583604.aspx

    Chris,

    I had great difficulty in simplifying the situation and generating data would have let to greater confusion rather than clarification, so I created a different example that is not as complex but I think still serves the purpose:

    SELECT A.Item, B.Descr, C.Color

    FROM (SELECT 1 as Item UNION ALL SELECT 2 as Item UNION ALL SELECT 3 as Item UNION ALL SELECT 4 as Item UNION ALL SELECT 5 as Item) as A

    LEFT JOIN (SELECT 2 as Item, 'Red 2' AS Descr, 'Red' as Color UNION ALL SELECT 3 as Item, 'Green 3' AS Descr, 'Green' as Color UNION ALL SELECT 4 as Item, 'Blue 4' AS Descr, 'Blue' as Color) as B

    ON B.Item = A.Item

    INNER JOIN (SELECT 'Red' as Color UNION ALL SELECT 'White' as Color UNION ALL SELECT 'Blue' as Color) as C

    ON C.Color = B.Color;

    As expected, I only received two lines as the LEFT JOIN was converted to an INNER JOIN:

    ItemDescrColor

    2Red 2Red

    4Blue 4Blue

    Changing the INNER JOIN to a LEFT JOIN did provide all 5 lines but allowed a child record not in the color value set (green):

    ItemDescrColor

    1NULLNULL

    2Red 2Red

    3Green 3NULL

    4Blue 4Blue

    5NULLNULL

    Then I modified the query based upon your posting:

    SELECT A.Item, B.Descr, C.Color

    FROM (SELECT 1 as Item UNION ALL SELECT 2 as Item UNION ALL SELECT 3 as Item UNION ALL SELECT 4 as Item UNION ALL SELECT 5 as Item) as A

    LEFT JOIN (SELECT 2 as Item, 'Red 2' AS Descr, 'Red' as Color UNION ALL SELECT 3 as Item, 'Green 3' AS Descr, 'Green' as Color UNION ALL SELECT 4 as Item, 'Blue 4' AS Descr, 'Blue' as Color) as B

    INNER JOIN (SELECT 'Red' as Color UNION ALL SELECT 'White' as Color UNION ALL SELECT 'Blue' as Color) as C

    ON C.Color = B.Color

    ON B.Item = A.Item;

    And it worked!!

    ItemDescrColor

    1NULLNULL

    2Red 2Red

    3NULLNULL

    4Blue 4Blue

    5NULLNULL

    I have to admit that I don't really understand how it works syntactically. Could you clarify?

    Aaron

  • serg-52 (1/13/2015)


    OK, may be just filter them in WHERE ?

    Unfortunately, that would filter out rows, not columns.

  • Aaron N. Cutshall (1/13/2015)


    serg-52 (1/13/2015)


    OK, may be just filter them in WHERE ?

    Unfortunately, that would filter out rows, not columns.

    Yes, my understanding was you complaint just about excessive rows. If you need just set column to NULL then

    SELECT A.Item

    -- when key of C is NULL ignore B data

    , CASE WHEN C.Color IS NOT NULL THEN B.Descr END Descr

    , C.Color

    FROM (SELECT 1 as Item UNION ALL SELECT 2 as Item UNION ALL SELECT 3 as Item UNION ALL SELECT 4 as Item UNION ALL SELECT 5 as Item) as A

    LEFT JOIN (SELECT 2 as Item, 'Red 2' AS Descr, 'Red' as Color UNION ALL SELECT 3 as Item, 'Green 3' AS Descr, 'Green' as Color UNION ALL SELECT 4 as Item, 'Blue 4' AS Descr, 'Blue' as Color) as B

    ON B.Item = A.Item

    LEFT JOIN (SELECT 'Red' as Color UNION ALL SELECT 'White' as Color UNION ALL SELECT 'Blue' as Color) as C

    ON C.Color = B.Color

    While it gives the same results as the one by Chris it may have different execution plan. If

    Due to the data volume in the tables, if I do a LEFT JOIN to a sub-query of #Property which is an INNER JOIN to #ValueSet [basically LEFT JOIN (SELECT FROM #Property INNER JOIN #ValueSet)] then performance suffers tremendously because it forces a look at ALL #Property records and not just the ones for the required #Element records.

    then i suggest you choose one of the scripts based on their perfomance under your environment.

  • Aaron N. Cutshall (1/13/2015)


    ChrisM@Work (1/13/2015)


    There are two issues here which might be causing some confusion.

    Firstly, the FROM list of your query, which looks like

    SELECT ...

    FROM parent

    LEFT JOIN child to parent

    INNER JOIN grandchild to child

    SQL Server converts the LEFT JOIN to an INNER JOIN. See this post for more details: http://www.sqlservercentral.com/Forums/FindPost1583604.aspx

    Chris,

    I had great difficulty in simplifying the situation and generating data would have let to greater confusion rather than clarification, so I created a different example that is not as complex but I think still serves the purpose:

    SELECT A.Item, B.Descr, C.Color

    FROM (SELECT 1 as Item UNION ALL SELECT 2 as Item UNION ALL SELECT 3 as Item UNION ALL SELECT 4 as Item UNION ALL SELECT 5 as Item) as A

    LEFT JOIN (SELECT 2 as Item, 'Red 2' AS Descr, 'Red' as Color UNION ALL SELECT 3 as Item, 'Green 3' AS Descr, 'Green' as Color UNION ALL SELECT 4 as Item, 'Blue 4' AS Descr, 'Blue' as Color) as B

    ON B.Item = A.Item

    INNER JOIN (SELECT 'Red' as Color UNION ALL SELECT 'White' as Color UNION ALL SELECT 'Blue' as Color) as C

    ON C.Color = B.Color;

    As expected, I only received two lines as the LEFT JOIN was converted to an INNER JOIN:

    ItemDescrColor

    2Red 2Red

    4Blue 4Blue

    Changing the INNER JOIN to a LEFT JOIN did provide all 5 lines but allowed a child record not in the color value set (green):

    ItemDescrColor

    1NULLNULL

    2Red 2Red

    3Green 3NULL

    4Blue 4Blue

    5NULLNULL

    Then I modified the query based upon your posting:

    SELECT A.Item, B.Descr, C.Color

    FROM (SELECT 1 as Item UNION ALL SELECT 2 as Item UNION ALL SELECT 3 as Item UNION ALL SELECT 4 as Item UNION ALL SELECT 5 as Item) as A

    LEFT JOIN (SELECT 2 as Item, 'Red 2' AS Descr, 'Red' as Color UNION ALL SELECT 3 as Item, 'Green 3' AS Descr, 'Green' as Color UNION ALL SELECT 4 as Item, 'Blue 4' AS Descr, 'Blue' as Color) as B

    INNER JOIN (SELECT 'Red' as Color UNION ALL SELECT 'White' as Color UNION ALL SELECT 'Blue' as Color) as C

    ON C.Color = B.Color

    ON B.Item = A.Item;

    And it worked!!

    ItemDescrColor

    1NULLNULL

    2Red 2Red

    3NULLNULL

    4Blue 4Blue

    5NULLNULL

    I have to admit that I don't really understand how it works syntactically. Could you clarify?

    Aaron

    Hi Aaron, I haven't come across an online source which describes and accounts for this behaviour - yet - but you may find some tips in this thread.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • BOL says,

    <joined_table> ::=

    {

    <table_source> <join_type> <table_source> ON <search_condition>

    | <table_source> CROSS JOIN <table_source>

    | left_table_source { CROSS | OUTER } APPLY right_table_source

    | [ ( ] <joined_table> [ ) ]

    }

    <join_type> ::=

    [ { INNER | { { LEFT | RIGHT | FULL } [ OUTER ] } } [ <join_hint> ] ]

    JOIN

    http://msdn.microsoft.com/en-us/library/ms177634.aspx

    Which means the code between JOIN and ON must be a <table_source>.

    It leaves no way to interpret the query but as it is

    SELECT A.Item, B.Descr, C.Color

    FROM

    (SELECT 1 as Item UNION ALL SELECT 2 as Item UNION ALL SELECT 3 as Item UNION ALL SELECT 4 as Item UNION ALL SELECT 5 as Item) as A

    LEFT JOIN

    (

    (SELECT 2 as Item, 'Red 2' AS Descr, 'Red' as Color UNION ALL SELECT 3 as Item, 'Green 3' AS Descr, 'Green' as Color UNION ALL SELECT 4 as Item, 'Blue 4' AS Descr, 'Blue' as Color) as B

    INNER JOIN (SELECT 'Red' as Color UNION ALL SELECT 'White' as Color UNION ALL SELECT 'Blue' as Color) as C

    ON C.Color = B.Color

    )

    ON B.Item = A.Item;

  • Serg,

    You are correct that it essentially transforms the INNER JOIN into a sub-query.

    SELECT A.Item, B.Descr, C.Color

    FROM (SELECT 1 as Item UNION ALL SELECT 2 as Item UNION ALL SELECT 3 as Item UNION ALL SELECT 4 as Item UNION ALL SELECT 5 as Item) as A

    LEFT JOIN (SELECT 2 as Item, 'Red 2' AS Descr, 'Red' as Color UNION ALL SELECT 3 as Item, 'Green 3' AS Descr, 'Green' as Color UNION ALL SELECT 4 as Item, 'Blue 4' AS Descr, 'Blue' as Color) as B

    INNER JOIN (SELECT 'Red' as Color UNION ALL SELECT 'White' as Color UNION ALL SELECT 'Blue' as Color) as C

    ON C.Color = B.Color

    ON B.Item = A.Item;

    SELECT A.Item, B.Descr, B.Color

    FROM (SELECT 1 as Item UNION ALL SELECT 2 as Item UNION ALL SELECT 3 as Item UNION ALL SELECT 4 as Item UNION ALL SELECT 5 as Item) as A

    LEFT JOIN (SELECT B.Item, B.Descr, C.Color

    FROM (SELECT 2 as Item, 'Red 2' AS Descr, 'Red' as Color UNION ALL SELECT 3 as Item, 'Green 3' AS Descr, 'Green' as Color UNION ALL SELECT 4 as Item, 'Blue 4' AS Descr, 'Blue' as Color) as B

    INNER JOIN (SELECT 'Red' as Color UNION ALL SELECT 'White' as Color UNION ALL SELECT 'Blue' as Color) as C

    ON C.Color = B.Color

    ) AS B

    ON B.Item = A.Item;

    When I look at the execution plan for the above two queries, they were nearly identical with the sub-query having only one additional step. I abandoned that approach due to the performance hit I encountered and will have to abandon this solution as well for the same reason.

    Another alternative is to tweak my environment to allow something like this one:

    SELECT A.Item, B.Descr, B.Color

    FROM (SELECT 1 as Item UNION ALL SELECT 2 as Item UNION ALL SELECT 3 as Item UNION ALL SELECT 4 as Item UNION ALL SELECT 5 as Item) as A

    LEFT JOIN (SELECT 2 as Item, 'Red 2' AS Descr, 'Red' as Color UNION ALL SELECT 3 as Item, 'Green 3' AS Descr, 'Green' as Color UNION ALL SELECT 4 as Item, 'Blue 4' AS Descr, 'Blue' as Color) as B

    ON B.Item = A.Item

    AND B.Color IN (SELECT 'Red' as Color UNION ALL SELECT 'White' as Color UNION ALL SELECT 'Blue' as Color);

    That will be a bit difficult as my true scenario has not one value to be validated but a pair of values and it doesn't lend itself easily to this approach.

    Thanks to everyone who contributed and helped me to work out this difficult issue.

  • So, have you actually solved your problem yet with Chris' solution, or is there still an issue with what you are trying to do?

  • Chris' solution works well in environments with low to medium data volume, but in the very high data volume environment I have it's too slow. I'm considering other alternatives.

  • I would not have thought the slowness would be due to the derived table he used to enable your joins. I would think its more likely a question of what indexes you have or of the OR statement in the where clause resulting in the wrong indexes being used.

    Out of curiosity, is the following faster, slower, or about the same (replaces the OR with UNION ALLs)? That is assuming I got the logic right, no data to test against ...

    ;

    WITH pvs AS (

    SELECT p.ElementId

    FROM #Property p

    INNER JOIN #ValueSet vs

    ON vs.ValueSetName = 'ED'

    AND vs.StdTaxonomy = p.StdTaxonomy

    AND vs.StdTerm = p.StdTerm

    WHERE p.PropertyName = 'FacilityLocation'

    AND p.StopDateTime BETWEEN @RptPeriodStart AND @RptPeriodStop

    )

    SELECT e.ElementId, e.StartDateTime, e.StopDateTime

    FROM #Element e

    LEFT JOIN pvs ON pvs.ElementId = e.ElementId

    WHERE e.StopDateTime BETWEEN @RptPeriodStart AND @RptPeriodStop

    UNION ALL

    SELECT e.ElementId, e.StartDateTime, e.StopDateTime

    FROM #Element e

    JOIN pvs ON pvs.ElementId = e.ElementId

    WHERE e.StopDateTime < @RptPeriodStart

    UNION ALL

    SELECT e.ElementId, e.StartDateTime, e.StopDateTime

    FROM #Element e

    JOIN pvs ON pvs.ElementId = e.ElementId

    WHERE e.StopDateTime > @RptPeriodStop

  • Yet, not surprising when you consider that the parent table has 1.85M rows, the child table has 7.5M rows, and the validation table has 54K. The indexes were optimized to access the child table with the parent record known. The sub-query joins the child table with the validation table and then after that joins to the parent table. This forced a complete table scan join between the child and the validation tables without narrowing it down to the correct parent first. Combine that with the number of records needed from the parent table and you could understand why it took some time. It worked for sure, but the time involved was just too long for acceptable performance (at least 10 times longer).

  • 1) Did you try out my rewrite of the query?

    2)

    The sub-query joins the child table with the validation table and then after that joins to the parent table.

    It may be picking that plan, but that may be because the OR makes it tough to pick good indexes.

    With a derived table or CTE, the query optimizer does not restrict itself to calculating the derived table and then joining it to the others.

    So if it could find a good index to the parent table it would be using it. The derived table is just a way of communicating the correct logic for the query.

    An article with an example[/url]

    edit: of course, performance could still be tough because you can only filter so much of the parent table. You don't care what the dates on it are if you have the same dates within the child table.

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply