Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Complex joins


Complex joins

Author
Message
Hugo Kornelis
Hugo Kornelis
SSCrazy Eights
SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)

Group: General Forum Members
Points: 8337 Visits: 11580
By the way, for anyone who wants to try how these variations cann affect the result set, here is the set of test data I used when prepaaring this question:

CREATE TABLE dbo.Employees
(EmployeeID int NOT NULL PRIMARY KEY,
Name varchar(50) NOT NULL,
Position varchar(20) NOT NULL);
CREATE TABLE dbo.Customers
(CustomerID int NOT NULL PRIMARY KEY,
Name varchar(50) NOT NULL,
SalesRep int NOT NULL REFERENCES Employees(EmployeeID));
CREATE TABLE dbo.Orders
(OrderID int NOT NULL PRIMARY KEY,
OrderDate date NOT NULL,
CustomerID int NOT NULL REFERENCES Customers(CustomerID),
Descrip varchar(40) NOT NULL);
go
INSERT INTO dbo.Employees (EmployeeID, Name, Position)
VALUES (1, 'New SalesRep', 'SalesRep'),
(2, 'Not in sales', 'Engineer'),
(3, 'Sales rep with active customers', 'SalesRep'),
(4, 'Sales rep with passive customers', 'SalesRep'),
(5, 'Former sales rep', 'Retired');
INSERT INTO dbo.Customers (CustomerID, Name, SalesRep)
VALUES (1, 'Active customer 1', 3),
(2, 'Active customer 2', 3),
(3, 'New customer 1', 3),
(4, 'New customer 2', 4),
(5, 'Old customer 1', 3),
(6, 'Old customer 2', 4),
(7, 'Customer of retired sales rep', 5);
INSERT INTO dbo.Orders (OrderID, OrderDate, CustomerID, Descrip)
VALUES (1, '2011-11-01', 1, 'Recent order'),
(2, '2011-11-02', 2, 'Recent order'),
(3, '2010-11-03', 2, 'Old order'),
(4, '2010-11-04', 5, 'Old order'),
(5, '2010-11-05', 6, 'Old order'),
(6, '2011-11-05', 7, 'Recent order');
go




Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Paul White
Paul White
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10350 Visits: 11350
Hi Hugo,

Writing a question to demonstrate the equivalence of SQL expressions is very brave, so well done for that. Small point: in the explanation you say, "the only way to rewrite the query is to reverse the table order", but this code expresses the same SQL semantic I think:


SELECT
e.Name,
c.Name,
o.OrderDate
FROM dbo.Employees AS e
LEFT JOIN
(
dbo.Orders AS o
JOIN dbo.Customers AS c ON
c.CustomerID = o.CustomerID
) ON
e.EmployeeID = o.CustomerID
AND o.OrderDate > DATEADD(MONTH, -4, CURRENT_TIMESTAMP)
WHERE
e.Position = 'SalesRep'



Personally I find this layout easier to read than 'nested' ON clauses.



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
honza.mf
honza.mf
SSCommitted
SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)

Group: General Forum Members
Points: 1771 Visits: 1323
Hugo Kornelis (12/6/2011)
honza.mf (12/6/2011)
Hi!
I don't understand how the condition
AND  o.OrderDate    >  DATEADD(month, -4, CURRENT_TIMESTAMP)

in SQL Query #4 works. For me it seems it is to eliminate the employees with no appropriate order attached.
Have I overlooked something?

To have 100% same behaviour, I use:

SELECT e.Name AS SalesRep, c.Name AS Customer, o.OrderDate
FROM dbo.Customers AS c
INNER JOIN dbo.Orders AS o
ON c.CustomerID = o.CustomerID
AND o.OrderDate > DATEADD(month, -4, CURRENT_TIMESTAMP)
RIGHT JOIN dbo.Employees AS e
ON e.EmployeeID = c.SalesRep
WHERE e.Position = 'SalesRep';



Hi Honza.mf,

In query #4, the first step is the inner join between orders and customers. So for the right outer join, the left-hand side is a row set consisting of orders with their customers (where both orders without customer and customers without orders are already filtered out). On the right-hand side is the unfiltered table of employees.
The join will attempt to match each row on the left-hand side with each row on the right-hand side. It will retain combinations where the condition is true (the employee on the right-hand side is the sales rep for the order, and the order is no older than four months), and it will also retain rows from the right-hand side (employees) where the condition is not true for any of the rows of the left-hand side.

Your query filters out old orders before the outer joins even starts. I compared it to the #4 query, by logical comparison, running against my test data, and comparing execution plans. I did not find any differences, so it is also a correct rewrite of the original query - and one that is easier to comprehend! Kudos!

That being said, I have run into sitiuations where I needed to choose between the join pattern of the original query (with the nested joins), or that of query #4 (with the weird placement of the ON condition). I am 99% sure that a rewrite such as yours was not possible in that case. My goal was to reconstruct such a case for this question, and I obviously did not completely succeed at that. Fortunately, this does not invalidate the QotD (as the question is to identify the query that is exactly equivalent, not to produce the best rewrite).


Thank you very much for your answer. I "overlooked" some strange properties of right join I was never thinking about.
I just tried a query
SELECT *
FROM tab1
RIGHT JOIN tab2 ON 0=1


And it returns all (1 in my case) rows from tab2 with nulls for columns for tab1. And it is perfectly logic.
Probably I will use right joins and nested joins only in case of self-defense. I prefer the readability and I think from left to right.

With this little talking I must say I learned very much from your question.



See, understand, learn, try, use efficient
© Dr.Plch
michael.kaufmann
michael.kaufmann
Ten Centuries
Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)

Group: General Forum Members
Points: 1293 Visits: 1082
SQL Kiwi (12/6/2011)
Hi Hugo,

Writing a question to demonstrate the equivalence of SQL expressions is very brave, so well done for that. Small point: in the explanation you say, "the only way to rewrite the query is to reverse the table order", but this code expresses the same SQL semantic I think:


SELECT
e.Name,
c.Name,
o.OrderDate
FROM dbo.Employees AS e
LEFT JOIN
(
dbo.Orders AS o
JOIN dbo.Customers AS c ON
c.CustomerID = o.CustomerID
) ON
e.EmployeeID = o.CustomerID
AND o.OrderDate > DATEADD(MONTH, -4, CURRENT_TIMESTAMP)
WHERE
e.Position = 'SalesRep'



Personally I find this layout easier to read than 'nested' ON clauses.


Very nice format to visually show the "nestedness" of the joins.

One minor flaw I'd attribute to speedily writing this query--the above one will only yield the same result as the original one if only employees are the customers and the IDs as customer and employee are equal.

Changing the bolded ON clause to
e.EmployeeID = c.SalesRep


will correct the issue and return the same result (and execution plan) as the original one (to verify that I used the sample data Hugo provided).

A great big thank you again to Hugo for the question and to Paul for the appealing reformatting of the query.
-Michael
Hugo Kornelis
Hugo Kornelis
SSCrazy Eights
SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)

Group: General Forum Members
Points: 8337 Visits: 11580
SQL Kiwi (12/6/2011)
Small point: in the explanation you say, "the only way to rewrite the query is to reverse the table order", but this code expresses the same SQL semantic I think:

(...)

Personally I find this layout easier to read than 'nested' ON clauses.


Paul, I absolutely agree that the parentheses and indentation do make the query much easier to read and comprehend - but it doesn't change the fact that it's still a query that uses nested joins. The parentheses and indentation don't have any functional impact on the query; they merely make the evaluation order, as implied by the order of the JOIN and ON clauses, more explicit and more visible to mere humans. The few timies where I did choose to use nested joins in actual code, I have always used parentheses and indentation for easier reading. In fact, my first draft of this question did use exactly that. I just took it out before submitting the question, because I am mean.Whistling

You are obviously correct that there are more ways to rewrite the query. When I wrote the explanation, I meant to write "the only way to rewrite the query to a form that doesn't use nested joins" - but I somehow forgot to put in that last crucial part. I am sure that, if we all weigh in, we'll be able to find at least a hundred other ways to rewrite the query. Wink
I stiill stand by what I intended to write - that the only way to rewrite the query to a form that doesn't use nested joins involves reordering the table order and changing the left outer join to the far less cocmmon and harder to grasp right outer join form.

Thanks for your comments! Wink


Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Hugo Kornelis
Hugo Kornelis
SSCrazy Eights
SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)

Group: General Forum Members
Points: 8337 Visits: 11580
honza.mf (12/6/2011)
Probably I will use right joins and nested joins only in case of self-defense. I prefer the readability and I think from left to right.

Absolutely true. This question was intended to illustrate how a real-life problem sometimes forces you to choose the lesser of two evils. Both nested joins and right outer joins reduce readability and ease of understanding of a query - so pick your poison. I have been in this position a few times, and whatever I chose, I always felt it was not the optimal choice.

BTW, now that I no longer have to support SQL Server 2000, I can use a better solution using a Common Table Expression:
WITH CustomersWithRecentOrders AS
(SELECT c.Name, o.OrderDate, c.SalesRep
FROM dbo.Customers AS c
INNER JOIN dbo.Orders AS o
ON o.CustomerID = c.CustomerID
AND o.OrderDate > DATEADD(month, -4, CURRENT_TIMESTAMP))
SELECT e.Name AS SalesRep, co.Name AS Customer, co.OrderDate
FROM dbo.Employees AS e
LEFT JOIN CustomersWithRecentOrders AS co
ON co.SalesRep = e.EmployeeID
WHERE e.Position = 'SalesRep';


Note that this query does NOT produce the exact same execution plan as the original, but it DOES produce the exact same results. And of all the options, I think this one is the most readable.


Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
ronmoses
ronmoses
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1095 Visits: 996
SQL Kiwi (12/6/2011)
Hi Hugo,

Writing a question to demonstrate the equivalence of SQL expressions is very brave, so well done for that. Small point: in the explanation you say, "the only way to rewrite the query is to reverse the table order", but this code expresses the same SQL semantic I think:


Would this not work as well?


;WITH cte_Subquery AS
(SELECT c.SalesRep, c.Name, o.OrderDate
FROM dbo.Customers AS c
INNER JOIN dbo.Orders AS o
ON o.CustomerID = c.CustomerID
AND o.OrderDate > DATEADD(month, -4, CURRENT_TIMESTAMP))
SELECT e.Name AS SalesRep, s.Name AS Customer, s.OrderDate
FROM dbo.Employees AS e
LEFT JOIN cte_Subquery AS s
ON e.EmployeeID = s.SalesRep
WHERE e.Position = 'SalesRep';



I always go with CTEs in this situation when possible, but I would love to know if I'm off the mark on this one.

EDIT: Of course I posted this 45 seconds after Hugo beat me to it. At least I know I was thinking along the right lines.

ron

-----
a haiku...

NULL is not zero
NULL is not an empty string
NULL is the unknown

Britt Cluff
Britt Cluff
SSCommitted
SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)

Group: General Forum Members
Points: 1693 Visits: 253
Good hard question, made me think. I missed it but I did learn something from the excellent explanation. Thanks for submitting.

http://brittcluff.blogspot.com/
Hugo Kornelis
Hugo Kornelis
SSCrazy Eights
SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)

Group: General Forum Members
Points: 8337 Visits: 11580
ronmoses (12/6/2011)
Would this not work as well?

Yup! (As you already found out after posting).
The only thing I don't like about your query is the semicolon before WITH. Semicolons belong at the end of each statement, not at the start.


Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Paul White
Paul White
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10350 Visits: 11350
Hugo Kornelis (12/6/2011)
BTW, now that I no longer have to support SQL Server 2000, I can use a better solution using a Common Table Expression:

Even in SQL Server 2000, you can write the equivalent subquery/derived table as expressed by the CTE:


SELECT
SalesRep = e.Name,
Customer = co.Name,
co.OrderDate
FROM dbo.Employees AS e
LEFT JOIN
(
SELECT
c.Name,
o.OrderDate,
c.SalesRep
FROM dbo.Customers AS c
JOIN dbo.Orders AS o ON
o.CustomerID = c.CustomerID
AND o.OrderDate > DATEADD(MONTH, -4, CURRENT_TIMESTAMP)
) AS co ON
co.SalesRep = e.EmployeeID
WHERE
e.Position = 'SalesRep';





Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search