Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Simple LEFT JOIN gives surprising result - Why? Expand / Collapse
Author
Message
Posted Wednesday, January 30, 2008 8:48 AM
Right there with Babe

Right there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with Babe

Group: General Forum Members
Last Login: Wednesday, July 16, 2014 8:16 AM
Points: 716, Visits: 459
Programming in SQL is a humbling experience. I thought I knew just what this would return:

create table a (a1 int primary key, a2 int)
create table b (b1 int primary key, a1 int)

insert a values (1, 10)
insert a values (2, 20)
insert b values (2, 2)

select *
from a
left join b
on a.a1 = b.a1
and a.a1 = 1


Here's the results:
a1 a2 b1 a1
1 10 NULL NULL
2 20 NULL NULL

Can someone explain to me why the second row is there? Obviously, moving the line a.a1 = 1 into a WHERE clause will eliminate the row, but I'd like a technical explanation as to why SQL null-extends the second row and includes it in the result set. I'd understand this action if the ON clause referenced a b column and a constant, but I thought that the inner table to constant conditions in an ON clause behaved identically to that condition in a WHERE clause.

My understanding of LEFT JOIN was: return NULLS for all columns of null-extended table (b, in this case) where criteria involving that table is not met. However, here is what BOL says on outer joins:


Inner joins eliminate the rows that do not match with a row from the other table. Outer joins, however, return all rows from at least one of the tables or views mentioned in the FROM clause, as long as those rows meet any WHERE or HAVING search conditions. All rows are retrieved from the left table referenced with a left outer join, and all rows from the right table referenced in a right outer join. All rows from both tables are returned in a full outer join


Wow, that's just not the way I've viewed outer joins for the last 10 years of daily programming. Does this surprise anyone else?

Vince



Post #449462
Posted Wednesday, January 30, 2008 9:27 AM
Right there with Babe

Right there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with Babe

Group: General Forum Members
Last Login: Wednesday, July 16, 2014 8:16 AM
Points: 716, Visits: 459
A follow-up observation. Based on the explanation in BOL for outer joins, any criteria of inner join table column to a constant in an ON clause of an outer join is meaningless.

That being the case, why is there no warning to that effect generated by SQL, similar to the warning generated when an outer-joined table is later inner-joined in the query?

I would go a step further and state that such syntax (in an outer join ON clause, inner-join table column to constant) should generate a compile-time error. If the purpose of the JOIN clause is simply to put two tables together, yet not limit the number of rows put together in the process, then specifying such a statement intended to limit rather than join should be illegal.

But here's where I really get on my soapbox. Let's slightly alter the syntax of the join, replacing the LEFT JOIN with a JOIN (and using 2 as the constant so we get something back):


create table a (a1 int primary key, a2 int)
create table b (b1 int primary key, a1 int)

insert a values (1, 10)
insert a values (2, 20)
insert b values (2, 2)

select *
from a
join b
on a.a1 = b.a1
and a.a1 = 2


Results:
a1 a2 b1 a1
2 20 2 2

In this case, the ON criteria a.a1 = 2 limited results rather than linked rows from two tables, and it works as expected.

Consider the ramifications. A statement exists in a query where, in an INNER JOIN, one of the first table's columns is joined to a constant (or T-SQL @variable) as limiting criteria in the ON clause. Testing reveals situations where rows do not exist in the second table, failing the join, so "INNER JOIN" is replaced by "LEFT JOIN" . Poof! The limiting criteria in the ON clause just became meaningless!

My colleague and I are in shock, wondering how many of our procedures coded over the years have this subtle issue in outer joins.

Are we just dense?

Vince



Post #449498
Posted Wednesday, January 30, 2008 10:40 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Today @ 5:33 AM
Points: 845, Visits: 5,444
I do not see any problem.

What logically happens for a left join is:
1. An inner join is done.
2. Any rows in the left table not in the result are added.
3. The Where clause is evaluated.
etc

SQL has many subtleties - you should always test that you are getting the results you desire.
Post #449548
Posted Wednesday, January 30, 2008 10:45 AM


SSCoach

SSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoach

Group: General Forum Members
Last Login: Today @ 7:32 PM
Points: 15,517, Visits: 27,895
I agree.

Logically you're defining the join criteria in the ON clause. Based on that criteria, looking at the code, your left join worked perfectly. If you wish to filter the result set, that's what the WHERE clause is for. This makes sense.


----------------------------------------------------
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood..." Theodore Roosevelt
The Scary DBA
Author of: SQL Server 2012 Query Performance Tuning
SQL Server 2008 Query Performance Tuning Distilled
and
SQL Server Execution Plans

Product Evangelist for Red Gate Software
Post #449549
Posted Wednesday, January 30, 2008 11:22 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 11:55 AM
Points: 7,115, Visits: 14,983
In my experience - just using the word "join" usually tells SQL server "inner join", whereas "left join" = LEFT OUTER JOIN.

And while the results might look a little strange at first, it goes to highlight the difference between JOINING criteria, and limiting criteria. In an outer join scenario, the joining criteria aren't going to be limiting on the "side" it's defined on.

Ultimately - something not actually defining the relationship between the two tables should never be in the ON clause.

I do feel your pain if you have a lot of this code out there - could rear its ugly little head just about anywhere.


----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Post #449571
Posted Wednesday, January 30, 2008 12:45 PM
Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 4:09 PM
Points: 3,121, Visits: 11,395
It appears that the query is working exactly as described in SQL Server Books Online.

Post #449626
Posted Wednesday, January 30, 2008 1:26 PM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Monday, June 30, 2014 7:34 AM
Points: 50, Visits: 534
Yes, subtle and the reason that I (almost) never use a constant in the ON. By adding one more row to the original query it becomes clearer:

create table a (a1 int primary key, a2 int)
create table b (b1 int primary key, a1 int)

insert a values (1, 10)
insert a values (2, 20)
insert b values (2, 2)
insert b values (20, 1) -- The new row

select *
from a
left join b
on a.a1 = b.a1
and a.a1 = 1

drop table a
drop table b

a1 a2 b1 a1
--- --- --- ---
1 10 20 1
2 20 NULL NULL

This makes it a little easier to see that the ON is controlling which b rows join a, but not filtering the result set. (So, this is positive reinforcement to not do this.)
Post #449658
Posted Thursday, January 31, 2008 9:31 AM
Right there with Babe

Right there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with Babe

Group: General Forum Members
Last Login: Wednesday, July 16, 2014 8:16 AM
Points: 716, Visits: 459
Thanks for the comments. Yes, it works as described in BOL, just not as I would have thought.

How about some comments on the warning suggestion? Since the criteria of a column to a constant works as one would expect in an inner join, yet is worthless in an outer join, shouldn't there be a warning to that effect emitted by SQL?

Bottom line, as suggested, constants go only in the where clause unless required to make the outer join make sense.



Post #450032
Posted Thursday, January 31, 2008 1:03 PM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Monday, June 30, 2014 7:34 AM
Points: 50, Visits: 534
You know that I have high regard for your T-SQL understanding.

But, sorry, but I don't think this needs a warning any more than many other sneaky behaviors of SQL Server. At least this one is syntactically defined and is not an optimizer decision.

The two ways of expressing this query give two differently purposed result sets. Although I agree that it can be surprising, I do not agree that the constant in the ON clause is meaningless.

-- Give me all 'a' rows.
-- If a.a1 = 1 = b.a1
-- also join a 'b' row to my 'a' row
select *
from a
left join b
on a.a1 = b.a1
and a.a1 = 1

a1 a2 b1 a1
--- --- --- ---
1 10 20 1
2 20 NULL NULL

-- Give me all 'a' rows where 'a.a1' = 1.
-- If a.a1 = b.a1 (and thus = 1)
-- also join a 'b' row to my 'a' row
select *
from a
left join b
on a.a1 = b.a1
WHERE b.a1 = 1

a1 a2 b1 a1
--- --- --- ---
1 10 20 1

Then, just for fun use a FULL OUTER JOIN on the first query.

-- Give me all 'a' rows and all 'b' rows
-- If a.a1 = 1 = b.a1 join a 'b' row to my 'a' row
-- Else show the 'a' and 'b' rows independently.
select *
from a
left join b
on a.a1 = b.a1
and a.a1 = 1

a1 a2 b1 a1
--- --- --- ---
1 10 20 1
2 20 NULL NULL
NULL NULL 2 2

But it has been a long time since I thought about this, so thanks for bringing up the problem. It definitely should be in the basic course materials for any T-SQL querying classes and emphasized with a few good examples.
Post #450145
Posted Thursday, January 31, 2008 3:05 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, July 11, 2014 12:43 PM
Points: 169, Visits: 814
Sometimes putting an additional filter, even on a constant, in your join clause is the only way to go. Consider the following:
Your boss asks for a data extract of all sales people who have sales in a certain month.
So you whip up this quick query (all using AdventureWorks)

Select SP.SalesPersonID, Sum(SubTotal)
From
Sales.SalesPerson SP
INNER JOIN Sales.SalesOrderHeader SOH ON SP.SalesPersonID = SOH.SalesPersonID
where SOH.OrderDate Between '1/1/2003' and '1/31/2003' OR SOH.OrderDate is null
Group By SP.SalesPersonID

Your boss comes back to you complaining that she doesn't see all the sales people. You think to yourself "I learned this one in SQL class. I'll just LEFT JOIN, and my problems go away."
So you come back with
Select SP.SalesPersonID, Sum(SubTotal)
From
Sales.SalesPerson SP
LEFT JOIN Sales.SalesOrderHeader SOH ON SP.SalesPersonID = SOH.SalesPersonID
where SOH.OrderDate Between '1/1/2003' and '1/31/2003'
Group By SP.SalesPersonID

What the heck? Still only 12 Sales people. I should be getting 17.
Now you smack your hand on your forehead, and think that your where clause is filtering out NULLs, so you try
Select SP.SalesPersonID, Sum(SubTotal)
From
Sales.SalesPerson SP
LEFT JOIN Sales.SalesOrderHeader SOH ON SP.SalesPersonID = SOH.SalesPersonID
where SOH.OrderDate Between '1/1/2003' and '1/31/2003' OR OrderDate IS NULL
Group By SP.SalesPersonID

Still no luck. Your filter condition on the table being joined to is causing the sales people in the source table to be filtered out altogether, so you finally come up with
Select SP.SalesPersonID, Sum(SubTotal)
From
Sales.SalesPerson SP
LEFT JOIN Sales.SalesOrderHeader SOH ON SP.SalesPersonID = SOH.SalesPersonID
and SOH.OrderDate Between '1/1/2003' and '1/31/2003'
Group By SP.SalesPersonID

Ahhh...That's the ticket!
This can work anytime you are trying to filter on a table being joined to.
If your boss had asked only for Open Orders, or anything that required some sort of filter, but you still need to get all the records so you can at least report a 0 for those sales people that had no records, this technique can be quite useful.
Hope this helps.
Post #450209
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse