Preference of LEFT JOIN <table> WHERE ... IS NULL vs. NOT EXISTS ( ... )

Question

Preference of LEFT JOIN <table> WHERE ... IS NULL vs. NOT EXISTS ( ... )

Andrew P

SSCarpal Tunnel

Points: 4649
More actions
September 24, 2015 at 8:52 pm

#303445

I could use some help understanding the pitfalls of using LEFT JOINs +WHERE <field> is NULL versus using NOT EXISTS sub-queries.
An example: the below query was found here written by serviceaellis, and formatted a little SELECT CompetitionName ,
CompetitionCategory ,
EventType ,
SkaterMbrNo ,
FirstName ,
LastName ,
Email ,
HomeClub ,
PartnerFName ,
PartnerLName ,
PartnerEmail ,
PartnerHomeClub ,
SkaterID
FROM dbo.v004CompetitionRegistrations AS CompCur
WHERE SeasonEndDate >= DateFromParts( YEAR( GETDATE( )) , 6 , 30 )
AND NOT EXISTS(
SELECT CompetitionName ,
CompetitionCategory ,
EventType ,
SkaterMbrNo ,
FirstName ,
LastName ,
Email ,
HomeClub AS [Home Club] ,
PartnerFName ,
PartnerLName ,
PartnerEmail ,
PartnerHomeClub ,
SkaterID
FROM dbo.v004CompetitionRegistrations AS CompHist
WHERE SeasonEndDate <> DateFromParts( YEAR( GETDATE( )) , 6 , 30 )
AND SkaterID = CompCur.SkaterID )
GROUP BY CompetitionName ,
CompetitionCategory ,
EventType ,
SkaterMbrNo ,
FirstName ,
LastName ,
Email ,
HomeClub ,
PartnerFName ,
PartnerLName ,
PartnerEmail ,
PartnerHomeClub ,
SkaterID;
I've seen documentation suggesting the following alternative to the NOT EXISTS ( ... ):
SELECT CompCur.CompetitionName
, CompCur.CompetitionCategory
, CompCur.EventType
, CompCur.SkaterMbrNo
, CompCur.FirstName
, CompCur.LastName
, CompCur.Email
, CompCur.HomeClub
, CompCur.PartnerFName
, CompCur.PartnerLName
, CompCur.PartnerEmail
, CompCur.PartnerHomeClub
, CompCur.SkaterID
FROM dbo.v004CompetitionRegistrations AS CompCur
LEFT JOIN dbo.v004CompetitionRegistrations AS CompEarlier
ON CompEarlier.SeasonEndDate <> cast(YEAR(GETDATE()) as CHAR(4)) + '0630'
AND CompEarlier.SkaterID = CompCur.SkaterID
WHERE CompEarlier.SkaterID IS NULL
GROUP BY CompCur.CompetitionName
, CompCur.CompetitionCategory
, CompCur.EventType
, CompCur.SkaterMbrNo
, CompCur.FirstName
, CompCur.LastName
, CompCur.Email
, CompCur.HomeClub
, CompCur.PartnerFName
, CompCur.PartnerLName
, CompCur.PartnerEmail
, CompCur.PartnerHomeClub
, CompCur.SkaterID
I've taken this as a personal preference as it avoids the sub-query (admittedly the query optimizer would probably optimize both queries' execution plans to exactly the same in this case).
I don't see this LEFT JOIN ... WHERE ... IS NULL approach used much in practice. Is the reason that this obscures the logic - it's not as easy to scan, or is there some other reason it's more standard to use sub-queries in the WHERE clause instead?

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 247228 More actions · Answer 1

This topic has come up many times in the past and usually the NOT EXISTS() version is preferred purely on the basis of its better performance.

But that is not to say that it always performs better: you should test out both forms for yourself & examine both query plans to assure yourself which is better. But I would advise against opting for one or the other on the basis of style.

Andrew P SSCarpal Tunnel Points: 4649 More actions · Answer 2

Ahh, you're right. Thanks Phil - I hadn't expected this to be such a common question - the first page of a Google search is full of answers :blush:.

Kristen-173977 SSCrazy Eights Points: 8687 More actions · Answer 3

(A list of SELECT columns is not needed in the NOT EXISTS - SELECT * or SELECT 1 will do. Originally I think SELECT * in an EXISTS statement was intended to imply that SQL could choose any column / index that was most appropriate, but I think nowadays the optimiser does whatever it needs to regardless ...)

I think that the NOT EXISTS syntax is easier to read, but we usually use an OUTER JOIN.

We have quite a lot of code that UpSerts and I find it easier to reuse the code from one statement to the other using JOIN rather than EXISTS. (i.e. I am looking for less-chance-of-bugs in recoding UPDATE for INSERT). Using MERGE is an alternative to this which may be better at reducing such recoding bugs, I haven't formed a view on that as yet - personally I find the MERGE syntax unweildy, but I expect that is just me)

UPDATE D

SET D.Col1 = S.Col1,

D.Col2 = S.Col2,

...

FROM MySourceTable AS S

[highlight="#ffff11"]INNER[/highlight] JOIN MyDestinationTable AS D

ON D.ID = S.ID

[highlight="#ffff11"]--[/highlight] WHERE D.ID IS NULL

INSERT INTO MyDestinationTable

SELECT S.Col1, S.Col2, ...

FROM MySourceTable AS S

[highlight="#ffff11"]LEFT OUTER[/highlight] JOIN MyDestinationTable AS D

ON D.ID = S.ID

[highlight="#ffff11"]WHERE D.ID IS NULL[/highlight]

We tend not to have deleted in our UpSert, but if you do then you have to swap tables over etc. so it has less code-reuse benefit that Insert/Update

DELETE D

FROM [highlight="#ffff11"]MyDestinationTable AS D[/highlight]

LEFT OUTER JOIN [highlight="#ffff11"]MySourceTable AS S[/highlight]

ON D.ID = S.ID

WHERE [highlight="#ffff11"]S[/highlight].ID IS NULL

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 4

http://sqlinthewild.co.za/index.php/2010/04/27/in-exists-and-join-a-roundup/

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass