Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12345»»»

Complex joins Expand / Collapse
Author
Message
Posted Tuesday, December 06, 2011 4:03 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 3:49 PM
Points: 5,794, Visits: 8,007
By the way, for anyone who wants to try how these variations cann affect the result set, here is the set of test data I used when prepaaring this question:

CREATE TABLE dbo.Employees
(EmployeeID int NOT NULL PRIMARY KEY,
Name varchar(50) NOT NULL,
Position varchar(20) NOT NULL);
CREATE TABLE dbo.Customers
(CustomerID int NOT NULL PRIMARY KEY,
Name varchar(50) NOT NULL,
SalesRep int NOT NULL REFERENCES Employees(EmployeeID));
CREATE TABLE dbo.Orders
(OrderID int NOT NULL PRIMARY KEY,
OrderDate date NOT NULL,
CustomerID int NOT NULL REFERENCES Customers(CustomerID),
Descrip varchar(40) NOT NULL);
go
INSERT INTO dbo.Employees (EmployeeID, Name, Position)
VALUES (1, 'New SalesRep', 'SalesRep'),
(2, 'Not in sales', 'Engineer'),
(3, 'Sales rep with active customers', 'SalesRep'),
(4, 'Sales rep with passive customers', 'SalesRep'),
(5, 'Former sales rep', 'Retired');
INSERT INTO dbo.Customers (CustomerID, Name, SalesRep)
VALUES (1, 'Active customer 1', 3),
(2, 'Active customer 2', 3),
(3, 'New customer 1', 3),
(4, 'New customer 2', 4),
(5, 'Old customer 1', 3),
(6, 'Old customer 2', 4),
(7, 'Customer of retired sales rep', 5);
INSERT INTO dbo.Orders (OrderID, OrderDate, CustomerID, Descrip)
VALUES (1, '2011-11-01', 1, 'Recent order'),
(2, '2011-11-02', 2, 'Recent order'),
(3, '2010-11-03', 2, 'Old order'),
(4, '2010-11-04', 5, 'Old order'),
(5, '2010-11-05', 6, 'Old order'),
(6, '2011-11-05', 7, 'Recent order');
go




Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #1216883
Posted Tuesday, December 06, 2011 4:44 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 5:59 AM
Points: 11,168, Visits: 10,931
Hi Hugo,

Writing a question to demonstrate the equivalence of SQL expressions is very brave, so well done for that. Small point: in the explanation you say, "the only way to rewrite the query is to reverse the table order", but this code expresses the same SQL semantic I think:

SELECT
e.Name,
c.Name,
o.OrderDate
FROM dbo.Employees AS e
LEFT JOIN
(
dbo.Orders AS o
JOIN dbo.Customers AS c ON
c.CustomerID = o.CustomerID
) ON
e.EmployeeID = o.CustomerID
AND o.OrderDate > DATEADD(MONTH, -4, CURRENT_TIMESTAMP)
WHERE
e.Position = 'SalesRep'

Personally I find this layout easier to read than 'nested' ON clauses.




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1216910
Posted Tuesday, December 06, 2011 4:50 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: 2 days ago @ 11:47 PM
Points: 1,298, Visits: 1,306
Hugo Kornelis (12/6/2011)
honza.mf (12/6/2011)
Hi!
I don't understand how the condition
AND  o.OrderDate    >  DATEADD(month, -4, CURRENT_TIMESTAMP)

in SQL Query #4 works. For me it seems it is to eliminate the employees with no appropriate order attached.
Have I overlooked something?

To have 100% same behaviour, I use:
SELECT     e.Name AS SalesRep, c.Name AS Customer, o.OrderDate
FROM dbo.Customers AS c
INNER JOIN dbo.Orders AS o
ON c.CustomerID = o.CustomerID
AND o.OrderDate > DATEADD(month, -4, CURRENT_TIMESTAMP)
RIGHT JOIN dbo.Employees AS e
ON e.EmployeeID = c.SalesRep
WHERE e.Position = 'SalesRep';



Hi Honza.mf,

In query #4, the first step is the inner join between orders and customers. So for the right outer join, the left-hand side is a row set consisting of orders with their customers (where both orders without customer and customers without orders are already filtered out). On the right-hand side is the unfiltered table of employees.
The join will attempt to match each row on the left-hand side with each row on the right-hand side. It will retain combinations where the condition is true (the employee on the right-hand side is the sales rep for the order, and the order is no older than four months), and it will also retain rows from the right-hand side (employees) where the condition is not true for any of the rows of the left-hand side.

Your query filters out old orders before the outer joins even starts. I compared it to the #4 query, by logical comparison, running against my test data, and comparing execution plans. I did not find any differences, so it is also a correct rewrite of the original query - and one that is easier to comprehend! Kudos!

That being said, I have run into sitiuations where I needed to choose between the join pattern of the original query (with the nested joins), or that of query #4 (with the weird placement of the ON condition). I am 99% sure that a rewrite such as yours was not possible in that case. My goal was to reconstruct such a case for this question, and I obviously did not completely succeed at that. Fortunately, this does not invalidate the QotD (as the question is to identify the query that is exactly equivalent, not to produce the best rewrite).


Thank you very much for your answer. I "overlooked" some strange properties of right join I was never thinking about.
I just tried a query
SELECT *
FROM tab1
RIGHT JOIN tab2 ON 0=1

And it returns all (1 in my case) rows from tab2 with nulls for columns for tab1. And it is perfectly logic.
Probably I will use right joins and nested joins only in case of self-defense. I prefer the readability and I think from left to right.

With this little talking I must say I learned very much from your question.




See, understand, learn, try, use efficient
© Dr.Plch
Post #1216920
Posted Tuesday, December 06, 2011 5:21 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Tuesday, July 09, 2013 11:12 PM
Points: 1,263, Visits: 1,081
SQL Kiwi (12/6/2011)
Hi Hugo,

Writing a question to demonstrate the equivalence of SQL expressions is very brave, so well done for that. Small point: in the explanation you say, "the only way to rewrite the query is to reverse the table order", but this code expresses the same SQL semantic I think:

SELECT
e.Name,
c.Name,
o.OrderDate
FROM dbo.Employees AS e
LEFT JOIN
(
dbo.Orders AS o
JOIN dbo.Customers AS c ON
c.CustomerID = o.CustomerID
) ON
e.EmployeeID = o.CustomerID
AND o.OrderDate > DATEADD(MONTH, -4, CURRENT_TIMESTAMP)
WHERE
e.Position = 'SalesRep'

Personally I find this layout easier to read than 'nested' ON clauses.


Very nice format to visually show the "nestedness" of the joins.

One minor flaw I'd attribute to speedily writing this query--the above one will only yield the same result as the original one if only employees are the customers and the IDs as customer and employee are equal.

Changing the bolded ON clause to
e.EmployeeID = c.SalesRep

will correct the issue and return the same result (and execution plan) as the original one (to verify that I used the sample data Hugo provided).

A great big thank you again to Hugo for the question and to Paul for the appealing reformatting of the query.
-Michael
Post #1216956
Posted Tuesday, December 06, 2011 5:41 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 3:49 PM
Points: 5,794, Visits: 8,007
SQL Kiwi (12/6/2011)
Small point: in the explanation you say, "the only way to rewrite the query is to reverse the table order", but this code expresses the same SQL semantic I think:

(...)

Personally I find this layout easier to read than 'nested' ON clauses.


Paul, I absolutely agree that the parentheses and indentation do make the query much easier to read and comprehend - but it doesn't change the fact that it's still a query that uses nested joins. The parentheses and indentation don't have any functional impact on the query; they merely make the evaluation order, as implied by the order of the JOIN and ON clauses, more explicit and more visible to mere humans. The few timies where I did choose to use nested joins in actual code, I have always used parentheses and indentation for easier reading. In fact, my first draft of this question did use exactly that. I just took it out before submitting the question, because I am mean.

You are obviously correct that there are more ways to rewrite the query. When I wrote the explanation, I meant to write "the only way to rewrite the query to a form that doesn't use nested joins" - but I somehow forgot to put in that last crucial part. I am sure that, if we all weigh in, we'll be able to find at least a hundred other ways to rewrite the query. ;)
I stiill stand by what I intended to write - that the only way to rewrite the query to a form that doesn't use nested joins involves reordering the table order and changing the left outer join to the far less cocmmon and harder to grasp right outer join form.

Thanks for your comments! ;)



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #1216967
Posted Tuesday, December 06, 2011 5:50 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 3:49 PM
Points: 5,794, Visits: 8,007
honza.mf (12/6/2011)
Probably I will use right joins and nested joins only in case of self-defense. I prefer the readability and I think from left to right.

Absolutely true. This question was intended to illustrate how a real-life problem sometimes forces you to choose the lesser of two evils. Both nested joins and right outer joins reduce readability and ease of understanding of a query - so pick your poison. I have been in this position a few times, and whatever I chose, I always felt it was not the optimal choice.

BTW, now that I no longer have to support SQL Server 2000, I can use a better solution using a Common Table Expression:
WITH CustomersWithRecentOrders AS
(SELECT c.Name, o.OrderDate, c.SalesRep
FROM dbo.Customers AS c
INNER JOIN dbo.Orders AS o
ON o.CustomerID = c.CustomerID
AND o.OrderDate > DATEADD(month, -4, CURRENT_TIMESTAMP))
SELECT e.Name AS SalesRep, co.Name AS Customer, co.OrderDate
FROM dbo.Employees AS e
LEFT JOIN CustomersWithRecentOrders AS co
ON co.SalesRep = e.EmployeeID
WHERE e.Position = 'SalesRep';

Note that this query does NOT produce the exact same execution plan as the original, but it DOES produce the exact same results. And of all the options, I think this one is the most readable.



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #1216972
Posted Tuesday, December 06, 2011 5:51 AM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Monday, April 14, 2014 10:58 AM
Points: 819, Visits: 832
SQL Kiwi (12/6/2011)
Hi Hugo,

Writing a question to demonstrate the equivalence of SQL expressions is very brave, so well done for that. Small point: in the explanation you say, "the only way to rewrite the query is to reverse the table order", but this code expresses the same SQL semantic I think:


Would this not work as well?

;WITH cte_Subquery AS
(SELECT c.SalesRep, c.Name, o.OrderDate
FROM dbo.Customers AS c
INNER JOIN dbo.Orders AS o
ON o.CustomerID = c.CustomerID
AND o.OrderDate > DATEADD(month, -4, CURRENT_TIMESTAMP))
SELECT e.Name AS SalesRep, s.Name AS Customer, s.OrderDate
FROM dbo.Employees AS e
LEFT JOIN cte_Subquery AS s
ON e.EmployeeID = s.SalesRep
WHERE e.Position = 'SalesRep';

I always go with CTEs in this situation when possible, but I would love to know if I'm off the mark on this one.

EDIT: Of course I posted this 45 seconds after Hugo beat me to it. At least I know I was thinking along the right lines.

ron


-----
a haiku...

NULL is not zero
NULL is not an empty string
NULL is the unknown
Post #1216974
Posted Tuesday, December 06, 2011 5:52 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Wednesday, October 24, 2012 8:17 PM
Points: 1,588, Visits: 247
Good hard question, made me think. I missed it but I did learn something from the excellent explanation. Thanks for submitting.

http://brittcluff.blogspot.com/
Post #1216975
Posted Tuesday, December 06, 2011 5:58 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 3:49 PM
Points: 5,794, Visits: 8,007
ronmoses (12/6/2011)
Would this not work as well?

Yup! (As you already found out after posting).
The only thing I don't like about your query is the semicolon before WITH. Semicolons belong at the end of each statement, not at the start.



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #1216980
Posted Tuesday, December 06, 2011 6:16 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 5:59 AM
Points: 11,168, Visits: 10,931
Hugo Kornelis (12/6/2011)
BTW, now that I no longer have to support SQL Server 2000, I can use a better solution using a Common Table Expression:

Even in SQL Server 2000, you can write the equivalent subquery/derived table as expressed by the CTE:

SELECT
SalesRep = e.Name,
Customer = co.Name,
co.OrderDate
FROM dbo.Employees AS e
LEFT JOIN
(
SELECT
c.Name,
o.OrderDate,
c.SalesRep
FROM dbo.Customers AS c
JOIN dbo.Orders AS o ON
o.CustomerID = c.CustomerID
AND o.OrderDate > DATEADD(MONTH, -4, CURRENT_TIMESTAMP)
) AS co ON
co.SalesRep = e.EmployeeID
WHERE
e.Position = 'SalesRep';





Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1217001
« Prev Topic | Next Topic »

Add to briefcase ««12345»»»

Permissions Expand / Collapse