Preference of LEFT JOIN <table> WHERE ... IS NULL vs. NOT EXISTS ( ... )

  • I could use some help understanding the pitfalls of using LEFT JOINs +WHERE <field> is NULL versus using NOT EXISTS sub-queries.

    An example: the below query was found here written by serviceaellis, and formatted a little SELECT CompetitionName ,

    CompetitionCategory ,

    EventType ,

    SkaterMbrNo ,

    FirstName ,

    LastName ,

    Email ,

    HomeClub ,

    PartnerFName ,

    PartnerLName ,

    PartnerEmail ,

    PartnerHomeClub ,

    SkaterID

    FROM dbo.v004CompetitionRegistrations AS CompCur

    WHERE SeasonEndDate >= DateFromParts( YEAR( GETDATE( )) , 6 , 30 )

    AND NOT EXISTS(

    SELECT CompetitionName ,

    CompetitionCategory ,

    EventType ,

    SkaterMbrNo ,

    FirstName ,

    LastName ,

    Email ,

    HomeClub AS [Home Club] ,

    PartnerFName ,

    PartnerLName ,

    PartnerEmail ,

    PartnerHomeClub ,

    SkaterID

    FROM dbo.v004CompetitionRegistrations AS CompHist

    WHERE SeasonEndDate <> DateFromParts( YEAR( GETDATE( )) , 6 , 30 )

    AND SkaterID = CompCur.SkaterID )

    GROUP BY CompetitionName ,

    CompetitionCategory ,

    EventType ,

    SkaterMbrNo ,

    FirstName ,

    LastName ,

    Email ,

    HomeClub ,

    PartnerFName ,

    PartnerLName ,

    PartnerEmail ,

    PartnerHomeClub ,

    SkaterID;

    I've seen documentation suggesting the following alternative to the NOT EXISTS ( ... ):

    SELECT CompCur.CompetitionName

    , CompCur.CompetitionCategory

    , CompCur.EventType

    , CompCur.SkaterMbrNo

    , CompCur.FirstName

    , CompCur.LastName

    , CompCur.Email

    , CompCur.HomeClub

    , CompCur.PartnerFName

    , CompCur.PartnerLName

    , CompCur.PartnerEmail

    , CompCur.PartnerHomeClub

    , CompCur.SkaterID

    FROM dbo.v004CompetitionRegistrations AS CompCur

    LEFT JOIN dbo.v004CompetitionRegistrations AS CompEarlier

    ON CompEarlier.SeasonEndDate <> cast(YEAR(GETDATE()) as CHAR(4)) + '0630'

    AND CompEarlier.SkaterID = CompCur.SkaterID

    WHERE CompEarlier.SkaterID IS NULL

    GROUP BY CompCur.CompetitionName

    , CompCur.CompetitionCategory

    , CompCur.EventType

    , CompCur.SkaterMbrNo

    , CompCur.FirstName

    , CompCur.LastName

    , CompCur.Email

    , CompCur.HomeClub

    , CompCur.PartnerFName

    , CompCur.PartnerLName

    , CompCur.PartnerEmail

    , CompCur.PartnerHomeClub

    , CompCur.SkaterID

    I've taken this as a personal preference as it avoids the sub-query (admittedly the query optimizer would probably optimize both queries' execution plans to exactly the same in this case).

    I don't see this LEFT JOIN ... WHERE ... IS NULL approach used much in practice. Is the reason that this obscures the logic - it's not as easy to scan, or is there some other reason it's more standard to use sub-queries in the WHERE clause instead?

  • This topic has come up many times in the past and usually the NOT EXISTS() version is preferred purely on the basis of its better performance.

    But that is not to say that it always performs better: you should test out both forms for yourself & examine both query plans to assure yourself which is better. But I would advise against opting for one or the other on the basis of style.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Ahh, you're right. Thanks Phil - I hadn't expected this to be such a common question - the first page of a Google search is full of answers :blush:.

  • (A list of SELECT columns is not needed in the NOT EXISTS - SELECT * or SELECT 1 will do. Originally I think SELECT * in an EXISTS statement was intended to imply that SQL could choose any column / index that was most appropriate, but I think nowadays the optimiser does whatever it needs to regardless ...)

    I think that the NOT EXISTS syntax is easier to read, but we usually use an OUTER JOIN.

    We have quite a lot of code that UpSerts and I find it easier to reuse the code from one statement to the other using JOIN rather than EXISTS. (i.e. I am looking for less-chance-of-bugs in recoding UPDATE for INSERT). Using MERGE is an alternative to this which may be better at reducing such recoding bugs, I haven't formed a view on that as yet - personally I find the MERGE syntax unweildy, but I expect that is just me)

    UPDATE D

    SET D.Col1 = S.Col1,

    D.Col2 = S.Col2,

    ...

    FROM MySourceTable AS S

    [highlight="#ffff11"]INNER[/highlight] JOIN MyDestinationTable AS D

    ON D.ID = S.ID

    [highlight="#ffff11"]--[/highlight] WHERE D.ID IS NULL

    INSERT INTO MyDestinationTable

    SELECT S.Col1, S.Col2, ...

    FROM MySourceTable AS S

    [highlight="#ffff11"]LEFT OUTER[/highlight] JOIN MyDestinationTable AS D

    ON D.ID = S.ID

    [highlight="#ffff11"]WHERE D.ID IS NULL[/highlight]

    We tend not to have deleted in our UpSert, but if you do then you have to swap tables over etc. so it has less code-reuse benefit that Insert/Update

    DELETE D

    FROM [highlight="#ffff11"]MyDestinationTable AS D[/highlight]

    LEFT OUTER JOIN [highlight="#ffff11"]MySourceTable AS S[/highlight]

    ON D.ID = S.ID

    WHERE [highlight="#ffff11"]S[/highlight].ID IS NULL

  • http://sqlinthewild.co.za/index.php/2010/04/27/in-exists-and-join-a-roundup/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply