Complex joins

  • Comments posted to this topic are about the item Complex joins


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • A nice, complex question, but a darn good one.

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • A very nice question. I learned something again.

  • Thanks for a great brain teaser--got my analytical half started this morning. Wonderful 🙂

  • I had never heard of Nested Joins, so learned something today.

    However my method of getting the right answer was to use the execution plans. One execution plan exactly matched the nested join. Once I had the right anser, I was able to work out why. 😀

  • Thanks, all, for your kind words. It;s always good to know that my questions are appreciated.

    Tom Brown (12/6/2011)


    I had never heard of Nested Joins, so learned something today.

    However my method of getting the right answer was to use the execution plans. One execution plan exactly matched the nested join. Once I had the right anser, I was able to work out why. 😀

    Ha! I like the creative thinking!

    I actually used a few rows of sample data (making sure to cover every possibility) to make sure that I had not accidentally messed up the question. But your method is so much easier! (Though you have to beware of checking not just the graphical representation of the plan, but also the "hidden" extra characteristics in the properties of the operators).

    My motto is that you don't have to know everything, as long as you knnow where.how to find out - and you show that there are often even multiple answers to THAT question!


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Enjoyed that one, would've been even better on a Monday morning to get the juices flowing! 😀

    Flabbergasted that so many people are getting it incorrect. . . I assumed that understanding the logic of join operators is integral to the majority of the visitors to this site.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • A good question - I tried to deduce it without running the code. Mmm. I knew there was a reason I never use that nested syntax. Perhaps becuase a) I didn't understand it (I do a little better now) and b) it adds nothing to the art of querying.

    However I do understand that caution is in order if I need to decode such now!

  • Hi!

    I don't understand how the condition AND o.OrderDate > DATEADD(month, -4, CURRENT_TIMESTAMP) in SQL Query #4 works. For me it seems it is to eliminate the employees with no appropriate order attached.

    Have I overlooked something?

    To have 100% same behaviour, I use:

    SELECT e.Name AS SalesRep, c.Name AS Customer, o.OrderDate

    FROM dbo.Customers AS c

    INNER JOIN dbo.Orders AS o

    ON c.CustomerID = o.CustomerID

    AND o.OrderDate > DATEADD(month, -4, CURRENT_TIMESTAMP)

    RIGHT JOIN dbo.Employees AS e

    ON e.EmployeeID = c.SalesRep

    WHERE e.Position = 'SalesRep';



    See, understand, learn, try, use efficient
    © Dr.Plch

  • honza.mf (12/6/2011)


    Hi!

    I don't understand how the condition AND o.OrderDate > DATEADD(month, -4, CURRENT_TIMESTAMP) in SQL Query #4 works. For me it seems it is to eliminate the employees with no appropriate order attached.

    Have I overlooked something?

    To have 100% same behaviour, I use:

    SELECT e.Name AS SalesRep, c.Name AS Customer, o.OrderDate

    FROM dbo.Customers AS c

    INNER JOIN dbo.Orders AS o

    ON c.CustomerID = o.CustomerID

    AND o.OrderDate > DATEADD(month, -4, CURRENT_TIMESTAMP)

    RIGHT JOIN dbo.Employees AS e

    ON e.EmployeeID = c.SalesRep

    WHERE e.Position = 'SalesRep';

    Hi Honza.mf,

    In query #4, the first step is the inner join between orders and customers. So for the right outer join, the left-hand side is a row set consisting of orders with their customers (where both orders without customer and customers without orders are already filtered out). On the right-hand side is the unfiltered table of employees.

    The join will attempt to match each row on the left-hand side with each row on the right-hand side. It will retain combinations where the condition is true (the employee on the right-hand side is the sales rep for the order, and the order is no older than four months), and it will also retain rows from the right-hand side (employees) where the condition is not true for any of the rows of the left-hand side.

    Your query filters out old orders before the outer joins even starts. I compared it to the #4 query, by logical comparison, running against my test data, and comparing execution plans. I did not find any differences, so it is also a correct rewrite of the original query - and one that is easier to comprehend! Kudos!

    That being said, I have run into sitiuations where I needed to choose between the join pattern of the original query (with the nested joins), or that of query #4 (with the weird placement of the ON condition). I am 99% sure that a rewrite such as yours was not possible in that case. My goal was to reconstruct such a case for this question, and I obviously did not completely succeed at that. Fortunately, this does not invalidate the QotD (as the question is to identify the query that is exactly equivalent, not to produce the best rewrite).


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • By the way, for anyone who wants to try how these variations cann affect the result set, here is the set of test data I used when prepaaring this question:

    CREATE TABLE dbo.Employees

    (EmployeeID int NOT NULL PRIMARY KEY,

    Name varchar(50) NOT NULL,

    Position varchar(20) NOT NULL);

    CREATE TABLE dbo.Customers

    (CustomerID int NOT NULL PRIMARY KEY,

    Name varchar(50) NOT NULL,

    SalesRep int NOT NULL REFERENCES Employees(EmployeeID));

    CREATE TABLE dbo.Orders

    (OrderID int NOT NULL PRIMARY KEY,

    OrderDate date NOT NULL,

    CustomerID int NOT NULL REFERENCES Customers(CustomerID),

    Descrip varchar(40) NOT NULL);

    go

    INSERT INTO dbo.Employees (EmployeeID, Name, Position)

    VALUES (1, 'New SalesRep', 'SalesRep'),

    (2, 'Not in sales', 'Engineer'),

    (3, 'Sales rep with active customers', 'SalesRep'),

    (4, 'Sales rep with passive customers', 'SalesRep'),

    (5, 'Former sales rep', 'Retired');

    INSERT INTO dbo.Customers (CustomerID, Name, SalesRep)

    VALUES (1, 'Active customer 1', 3),

    (2, 'Active customer 2', 3),

    (3, 'New customer 1', 3),

    (4, 'New customer 2', 4),

    (5, 'Old customer 1', 3),

    (6, 'Old customer 2', 4),

    (7, 'Customer of retired sales rep', 5);

    INSERT INTO dbo.Orders (OrderID, OrderDate, CustomerID, Descrip)

    VALUES (1, '2011-11-01', 1, 'Recent order'),

    (2, '2011-11-02', 2, 'Recent order'),

    (3, '2010-11-03', 2, 'Old order'),

    (4, '2010-11-04', 5, 'Old order'),

    (5, '2010-11-05', 6, 'Old order'),

    (6, '2011-11-05', 7, 'Recent order');

    go


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hi Hugo,

    Writing a question to demonstrate the equivalence of SQL expressions is very brave, so well done for that. Small point: in the explanation you say, "the only way to rewrite the query is to reverse the table order", but this code expresses the same SQL semantic I think:

    SELECT

    e.Name,

    c.Name,

    o.OrderDate

    FROM dbo.Employees AS e

    LEFT JOIN

    (

    dbo.Orders AS o

    JOIN dbo.Customers AS c ON

    c.CustomerID = o.CustomerID

    ) ON

    e.EmployeeID = o.CustomerID

    AND o.OrderDate > DATEADD(MONTH, -4, CURRENT_TIMESTAMP)

    WHERE

    e.Position = 'SalesRep'

    Personally I find this layout easier to read than 'nested' ON clauses.

  • Hugo Kornelis (12/6/2011)


    honza.mf (12/6/2011)


    Hi!

    I don't understand how the condition AND o.OrderDate > DATEADD(month, -4, CURRENT_TIMESTAMP) in SQL Query #4 works. For me it seems it is to eliminate the employees with no appropriate order attached.

    Have I overlooked something?

    To have 100% same behaviour, I use:

    SELECT e.Name AS SalesRep, c.Name AS Customer, o.OrderDate

    FROM dbo.Customers AS c

    INNER JOIN dbo.Orders AS o

    ON c.CustomerID = o.CustomerID

    AND o.OrderDate > DATEADD(month, -4, CURRENT_TIMESTAMP)

    RIGHT JOIN dbo.Employees AS e

    ON e.EmployeeID = c.SalesRep

    WHERE e.Position = 'SalesRep';

    Hi Honza.mf,

    In query #4, the first step is the inner join between orders and customers. So for the right outer join, the left-hand side is a row set consisting of orders with their customers (where both orders without customer and customers without orders are already filtered out). On the right-hand side is the unfiltered table of employees.

    The join will attempt to match each row on the left-hand side with each row on the right-hand side. It will retain combinations where the condition is true (the employee on the right-hand side is the sales rep for the order, and the order is no older than four months), and it will also retain rows from the right-hand side (employees) where the condition is not true for any of the rows of the left-hand side.

    Your query filters out old orders before the outer joins even starts. I compared it to the #4 query, by logical comparison, running against my test data, and comparing execution plans. I did not find any differences, so it is also a correct rewrite of the original query - and one that is easier to comprehend! Kudos!

    That being said, I have run into sitiuations where I needed to choose between the join pattern of the original query (with the nested joins), or that of query #4 (with the weird placement of the ON condition). I am 99% sure that a rewrite such as yours was not possible in that case. My goal was to reconstruct such a case for this question, and I obviously did not completely succeed at that. Fortunately, this does not invalidate the QotD (as the question is to identify the query that is exactly equivalent, not to produce the best rewrite).

    Thank you very much for your answer. I "overlooked" some strange properties of right join I was never thinking about.

    I just tried a query

    SELECT *

    FROM tab1

    RIGHT JOIN tab2 ON 0=1

    And it returns all (1 in my case) rows from tab2 with nulls for columns for tab1. And it is perfectly logic.

    Probably I will use right joins and nested joins only in case of self-defense. I prefer the readability and I think from left to right.

    With this little talking I must say I learned very much from your question.



    See, understand, learn, try, use efficient
    © Dr.Plch

  • SQL Kiwi (12/6/2011)


    Hi Hugo,

    Writing a question to demonstrate the equivalence of SQL expressions is very brave, so well done for that. Small point: in the explanation you say, "the only way to rewrite the query is to reverse the table order", but this code expresses the same SQL semantic I think:

    SELECT

    e.Name,

    c.Name,

    o.OrderDate

    FROM dbo.Employees AS e

    LEFT JOIN

    (

    dbo.Orders AS o

    JOIN dbo.Customers AS c ON

    c.CustomerID = o.CustomerID

    ) ON

    e.EmployeeID = o.CustomerID

    AND o.OrderDate > DATEADD(MONTH, -4, CURRENT_TIMESTAMP)

    WHERE

    e.Position = 'SalesRep'

    Personally I find this layout easier to read than 'nested' ON clauses.

    Very nice format to visually show the "nestedness" of the joins.

    One minor flaw I'd attribute to speedily writing this query--the above one will only yield the same result as the original one if only employees are the customers and the IDs as customer and employee are equal.

    Changing the bolded ON clause to

    e.EmployeeID = c.SalesRep

    will correct the issue and return the same result (and execution plan) as the original one (to verify that I used the sample data Hugo provided).

    A great big thank you again to Hugo for the question and to Paul for the appealing reformatting of the query.

    -Michael

  • SQL Kiwi (12/6/2011)


    Small point: in the explanation you say, "the only way to rewrite the query is to reverse the table order", but this code expresses the same SQL semantic I think:

    (...)

    Personally I find this layout easier to read than 'nested' ON clauses.

    Paul, I absolutely agree that the parentheses and indentation do make the query much easier to read and comprehend - but it doesn't change the fact that it's still a query that uses nested joins. The parentheses and indentation don't have any functional impact on the query; they merely make the evaluation order, as implied by the order of the JOIN and ON clauses, more explicit and more visible to mere humans. The few timies where I did choose to use nested joins in actual code, I have always used parentheses and indentation for easier reading. In fact, my first draft of this question did use exactly that. I just took it out before submitting the question, because I am mean.:Whistling:

    You are obviously correct that there are more ways to rewrite the query. When I wrote the explanation, I meant to write "the only way to rewrite the query to a form that doesn't use nested joins" - but I somehow forgot to put in that last crucial part. I am sure that, if we all weigh in, we'll be able to find at least a hundred other ways to rewrite the query. 😉

    I stiill stand by what I intended to write - that the only way to rewrite the query to a form that doesn't use nested joins involves reordering the table order and changing the left outer join to the far less cocmmon and harder to grasp right outer join form.

    Thanks for your comments! 😉


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Viewing 15 posts - 1 through 15 (of 43 total)

You must be logged in to reply to this topic. Login to reply