Except vs. Left Join vs. Not Exists

  • I'm hoping someone can confirm my findings here, because I understood EXCEPT to be a higher optimized method of finding Anti-semi Joins.

    My benchmark code is below. What I'm finding relatively consistently is that a not exists seems to be performing the best on my testbed here, and I'm hoping someone can poke some holes in why. The only time I can get EXCEPT to outperform the other methods is when secondary data is not-included. This makes sense, as EXCEPT is going to compare each field in the listing. But even using sub-query selects on only the key fields still runs slower. Though again, this makes sense in context, as we're now doing a 3 way join in essence.

    Consistently, the NOT EXISTS seems to run the fatest. LEFT JOIN - IS NULL second fastest, though The 'EASY VERSION EXCEPT', a non-subqueried version, will run faster when no fields secondary to the join mechanic are involved. The Subqueried EXCEPT generally runs faster then standard two field EXCEPT, even with the extra work. The easy except seems to suicide when I drop indexes on the #tmps, for some reason. I haven't looked into execution plans on that one yet to figure out why, but just be aware that it's not just you if you see that too. 🙂

    So, opinions, conversations, options, and general abuse of my methodology? 😉

    SET NOCOUNT ON

    /*

    USE Model

    GO

    SELECT TOP 1000000 --equates to more than 30 years of dates

    IDENTITY(INT,1,1) AS N INTO dbo.Tally

    FROM Master.dbo.SysColumns sc1,

    Master.dbo.SysColumns sc2

    --===== Add a Primary Key to maximize performance

    ALTER TABLE dbo.Tally ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

    --===== Let the public use it

    GRANT SELECT, REFERENCES ON dbo.Tally TO PUBLIC

    GO

    USE tempdb

    GO

    -- REPEAT above select.

    */

    IF OBJECT_ID( 'tempdb..#blah') IS NOT NULL

    DROP TABLE #blah

    IF OBJECT_ID( 'tempdb..#tmp') IS NOT NULL

    DROP TABLE #tmp

    IF OBJECT_ID( 'tempdb..#tmp2') IS NOT NULL

    DROP TABLE #tmp2

    IF OBJECT_ID( 'tempdb..#tmp3') IS NOT NULL

    DROP TABLE #tmp3

    IF OBJECT_ID( 'tempdb..#tmp4') IS NOT NULL

    DROP TABLE #tmp4

    CREATE TABLE #blah

    (GroupVal INT NOT NULL,

    FabricatedIDINT IDENTITY( 1,1) NOT NULL,

    SomeValueVARCHAR(10))

    ALTER TABLE #blah ADD PRIMARY KEY CLUSTERED (FabricatedID)

    CREATE TABLE #tmp

    (FabricatedID INT, SomeValue VARCHAR(10))

    CREATE TABLE #tmp2

    (FabricatedID INT, SomeValue VARCHAR(10))

    CREATE TABLE #tmp3

    (FabricatedID INT, SomeValue VARCHAR(10))

    CREATE TABLE #tmp4

    (FabricatedID INT, SomeValue VARCHAR(10))

    INSERT INTO #blah

    (GroupVal, SomeValue)

    SELECT TOP 1000000 -- 1 mill test

    t1.n%15 AS GroupVal, -- 15 groups

    REPLICATE( CHAR(t1.n), 10 ) AS SomeValue

    FROM

    tempdb..Tally AS t1 -- Whatever you use for tally here.

    INSERT INTO #tmp

    SELECT FabricatedID, SomeValue

    FROM#blah

    WHEREGroupVal <> 14

    INSERT INTO #tmp2

    SELECT FabricatedID, SomeValue

    FROM#blah

    WHEREGroupVal <> 14

    INSERT INTO #tmp3

    SELECT FabricatedID, SomeValue

    FROM#blah

    WHEREGroupVal <> 14

    INSERT INTO #tmp4

    SELECT FabricatedID, SomeValue

    FROM#blah

    WHEREGroupVal <> 14

    -- Real testing starts here.

    -- Paralleled ran slower across the board on my desktop server. Left here in case

    -- someone wants to test.

    /*

    SET STATISTICS IO, TIME ON

    PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'Old-skool LEFT JOIN - IS NULL (Parallelled)'

    INSERT INTO #tmp2

    SELECT

    b.FabricatedID, b.SomeValue

    FROM

    #blah AS b

    LEFT JOIN

    #tmp2 AS t

    ONb.FabricatedID = t.FabricatedID

    WHERE

    t.FabricatedID IS NULL

    PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'NOT EXISTS (Parallelled)'

    INSERT INTO #tmp3

    SELECT

    b.FabricatedID, b.SomeValue

    FROM

    #blah AS b

    WHERE

    NOT EXISTS ( SELECT 1 FROM #tmp3 AS t WHERE b.FabricatedID = t.FabricatedID)

    PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'EXCEPT (Parallelled)'

    INSERT INTO #tmp

    SELECT

    FabricatedID, SomeValue

    FROM

    #blah

    EXCEPT

    SELECT

    FabricatedID, SomeValue

    FROM

    #tmp

    SET STATISTICS IO, TIME OFF

    */

    -- Real testing starts here.

    SET STATISTICS IO, TIME ON

    PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'Old-skool LEFT JOIN - IS NULL (MAXDOP 1)'

    INSERT INTO #tmp2

    SELECT

    b.FabricatedID, b.SomeValue

    FROM

    #blah AS b

    LEFT JOIN

    #tmp2 AS t

    ONb.FabricatedID = t.FabricatedID

    WHERE

    t.FabricatedID IS NULL

    OPTION (MAXDOP 1)

    PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'NOT EXISTS (MAXDOP 1)'

    INSERT INTO #tmp3

    SELECT

    b.FabricatedID, b.SomeValue

    FROM

    #blah AS b

    WHERE

    NOT EXISTS ( SELECT 1 FROM #tmp3 AS t WHERE b.FabricatedID = t.FabricatedID)

    OPTION (MAXDOP 1)

    PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'EASY VERSION EXCEPT (MAXDOP 1)'

    INSERT INTO #tmp

    SELECT

    FabricatedID, SomeValue

    FROM

    #blah

    EXCEPT

    SELECT

    FabricatedID, SomeValue

    FROM

    #tmp

    OPTION (MAXDOP 1)

    PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'SUBQUERY EXCEPT (MAXDOP 1)'

    INSERT INTO #tmp4

    SELECT

    b.FabricatedID, b.SomeValue

    FROM

    #blah AS b

    JOIN

    (SELECT

    FabricatedID

    FROM

    #blah

    EXCEPT

    SELECT

    FabricatedID

    FROM

    #tmp4

    ) AS drv

    ONb.FabricatedID = drv.FabricatedID

    OPTION (MAXDOP 1)

    -- Force Order doesn't make a difference here, really.

    -- The execution plan seems strange to me, but that's another story.

    SET STATISTICS IO, TIME OFF

    -- Time to see if indexing on the #tmps makes a difference here.

    DELETE t FROM #tmp AS t JOIN #blah AS b ON t.FabricatedID = b.FabricatedID WHERE b.GroupVal = 14

    DELETE t FROM #tmp2 AS t JOIN #blah AS b ON t.FabricatedID = b.FabricatedID WHERE b.GroupVal = 14

    DELETE t FROM #tmp3 AS t JOIN #blah AS b ON t.FabricatedID = b.FabricatedID WHERE b.GroupVal = 14

    DELETE t FROM #tmp4 AS t JOIN #blah AS b ON t.FabricatedID = b.FabricatedID WHERE b.GroupVal = 14

    CREATE INDEX idx_c ON #tmp (FabricatedID)

    CREATE INDEX idx_c ON #tmp2 (FabricatedID)

    CREATE INDEX idx_c ON #tmp3 (FabricatedID)

    CREATE INDEX idx_c ON #tmp4 (FabricatedID)

    PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'AFTER INDEX GENERATION ON #tmps'

    SET STATISTICS IO, TIME ON

    PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'Old-skool LEFT JOIN - IS NULL (MAXDOP 1)'

    INSERT INTO #tmp2

    SELECT

    b.FabricatedID, b.SomeValue

    FROM

    #blah AS b

    LEFT JOIN

    #tmp2 AS t

    ONb.FabricatedID = t.FabricatedID

    WHERE

    t.FabricatedID IS NULL

    OPTION (MAXDOP 1)

    PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'NOT EXISTS (MAXDOP 1)'

    INSERT INTO #tmp3

    SELECT

    b.FabricatedID, b.SomeValue

    FROM

    #blah AS b

    WHERE

    NOT EXISTS ( SELECT 1 FROM #tmp3 AS t WHERE b.FabricatedID = t.FabricatedID)

    OPTION (MAXDOP 1)

    PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'EASY VERSION EXCEPT (MAXDOP 1)'

    INSERT INTO #tmp

    SELECT

    FabricatedID, SomeValue

    FROM

    #blah

    EXCEPT

    SELECT

    FabricatedID, SomeValue

    FROM

    #tmp

    OPTION (MAXDOP 1)

    PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'SUBQUERY EXCEPT (MAXDOP 1)'

    INSERT INTO #tmp4

    SELECT

    b.FabricatedID, b.SomeValue

    FROM

    #blah AS b

    JOIN

    (SELECT

    FabricatedID

    FROM

    #blah

    EXCEPT

    SELECT

    FabricatedID

    FROM

    #tmp4

    ) AS drv

    ONb.FabricatedID = drv.FabricatedID

    OPTION (MAXDOP 1)

    -- Execution plan after the index is in play seems to behave better, though the non-subquery

    -- version now outperforms it.

    SET STATISTICS IO, TIME OFF

    -- Alright, one last test cycle... EXCEPT without SomeValue affecting the problem.

    ALTER TABLE #tmp DROP COLUMN SomeValue

    ALTER TABLE #tmp2 DROP COLUMN SomeValue

    ALTER TABLE #tmp3 DROP COLUMN SomeValue

    ALTER TABLE #tmp4 DROP COLUMN SomeValue

    DELETE t FROM #tmp AS t JOIN #blah AS b ON t.FabricatedID = b.FabricatedID WHERE b.GroupVal = 14

    DELETE t FROM #tmp2 AS t JOIN #blah AS b ON t.FabricatedID = b.FabricatedID WHERE b.GroupVal = 14

    DELETE t FROM #tmp3 AS t JOIN #blah AS b ON t.FabricatedID = b.FabricatedID WHERE b.GroupVal = 14

    DELETE t FROM #tmp4 AS t JOIN #blah AS b ON t.FabricatedID = b.FabricatedID WHERE b.GroupVal = 14

    DROP INDEX idx_c ON #tmp

    DROP INDEX idx_c ON #tmp2

    DROP INDEX idx_c ON #tmp3

    DROP INDEX idx_c ON #tmp4

    PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'INDEXES DROPPED on #tmps, SomeValue column removed'

    SET STATISTICS IO, TIME ON

    PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'Old-skool LEFT JOIN - IS NULL (MAXDOP 1)'

    INSERT INTO #tmp2

    SELECT

    b.FabricatedID

    FROM

    #blah AS b

    LEFT JOIN

    #tmp2 AS t

    ONb.FabricatedID = t.FabricatedID

    WHERE

    t.FabricatedID IS NULL

    OPTION (MAXDOP 1)

    PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'NOT EXISTS (MAXDOP 1)'

    INSERT INTO #tmp3

    SELECT

    b.FabricatedID

    FROM

    #blah AS b

    WHERE

    NOT EXISTS ( SELECT 1 FROM #tmp3 AS t WHERE b.FabricatedID = t.FabricatedID)

    OPTION (MAXDOP 1)

    PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'EASY VERSION EXCEPT (MAXDOP 1)'

    INSERT INTO #tmp

    SELECT

    FabricatedID

    FROM

    #blah

    EXCEPT

    SELECT

    FabricatedID

    FROM

    #tmp

    OPTION (MAXDOP 1)

    PRINT CHAR(10) + REPLICATE( '-', 30) + CHAR(10) + 'SUBQUERY EXCEPT (MAXDOP 1)'

    INSERT INTO #tmp4

    SELECT

    b.FabricatedID

    FROM

    #blah AS b

    JOIN

    (SELECT

    FabricatedID

    FROM

    #blah

    EXCEPT

    SELECT

    FabricatedID

    FROM

    #tmp4

    ) AS drv

    ONb.FabricatedID = drv.FabricatedID

    OPTION (MAXDOP 1)

    -- Execution plan after the index is in play seems to behave better, though the non-subquery

    -- version now outperforms it.

    SET STATISTICS IO, TIME OFF

    -- Need this otherwise it whines on reruns about differing table schemas.

    IF OBJECT_ID( 'tempdb..#blah') IS NOT NULL

    DROP TABLE #blah

    IF OBJECT_ID( 'tempdb..#tmp') IS NOT NULL

    DROP TABLE #tmp

    IF OBJECT_ID( 'tempdb..#tmp2') IS NOT NULL

    DROP TABLE #tmp2

    IF OBJECT_ID( 'tempdb..#tmp3') IS NOT NULL

    DROP TABLE #tmp3

    IF OBJECT_ID( 'tempdb..#tmp4') IS NOT NULL

    DROP TABLE #tmp4

    A sample of my results:

    ------------------------------

    Old-skool LEFT JOIN - IS NULL (MAXDOP 1)

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#blah_______________________________________________________________________________________________________________000000000047'. Scan count 1, logical reads 2107, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#tmp2_______________________________________________________________________________________________________________000000000049'. Scan count 1, logical reads 1502, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1076 ms, elapsed time = 1076 ms.

    ------------------------------

    NOT EXISTS (MAXDOP 1)

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#blah_______________________________________________________________________________________________________________000000000047'. Scan count 1, logical reads 2107, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#tmp3_______________________________________________________________________________________________________________00000000004A'. Scan count 1, logical reads 1502, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 936 ms, elapsed time = 932 ms.

    ------------------------------

    EASY VERSION EXCEPT (MAXDOP 1)

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#blah_______________________________________________________________________________________________________________000000000047'. Scan count 1, logical reads 2107, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#tmp________________________________________________________________________________________________________________000000000048'. Scan count 1, logical reads 1502, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1279 ms, elapsed time = 1349 ms.

    ------------------------------

    SUBQUERY EXCEPT (MAXDOP 1)

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#blah_______________________________________________________________________________________________________________000000000047'. Scan count 2, logical reads 4216, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#tmp4_______________________________________________________________________________________________________________00000000004B'. Scan count 1, logical reads 1502, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1170 ms, elapsed time = 1188 ms.

    ------------------------------

    AFTER INDEX GENERATION ON #tmps

    ------------------------------

    Old-skool LEFT JOIN - IS NULL (MAXDOP 1)

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Table '#tmp2_______________________________________________________________________________________________________________000000000049'. Scan count 1, logical reads 293900, physical reads 0, read-ahead reads 1393, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 1, logical reads 134372, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#blah_______________________________________________________________________________________________________________000000000047'. Scan count 1, logical reads 2108, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1186 ms, elapsed time = 2274 ms.

    ------------------------------

    NOT EXISTS (MAXDOP 1)

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Table '#tmp3_______________________________________________________________________________________________________________00000000004A'. Scan count 1, logical reads 293900, physical reads 0, read-ahead reads 1658, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 1, logical reads 134372, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#blah_______________________________________________________________________________________________________________000000000047'. Scan count 1, logical reads 2108, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1092 ms, elapsed time = 1986 ms.

    ------------------------------

    EASY VERSION EXCEPT (MAXDOP 1)

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Table '#tmp________________________________________________________________________________________________________________000000000048'. Scan count 1, logical reads 293313, physical reads 0, read-ahead reads 1652, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#blah_______________________________________________________________________________________________________________000000000047'. Scan count 1, logical reads 2107, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1950 ms, elapsed time = 3170 ms.

    ------------------------------

    SUBQUERY EXCEPT (MAXDOP 1)

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Table '#tmp4_______________________________________________________________________________________________________________00000000004B'. Scan count 1, logical reads 293900, physical reads 0, read-ahead reads 1021, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 1, logical reads 134372, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#blah_______________________________________________________________________________________________________________000000000047'. Scan count 2, logical reads 4216, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1217 ms, elapsed time = 1953 ms.

    ------------------------------

    INDEXES DROPPED on #tmps, SomeValue column removed

    ------------------------------

    Old-skool LEFT JOIN - IS NULL (MAXDOP 1)

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#blah_______________________________________________________________________________________________________________000000000047'. Scan count 1, logical reads 2107, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#tmp2_______________________________________________________________________________________________________________000000000049'. Scan count 1, logical reads 1501, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1060 ms, elapsed time = 1063 ms.

    ------------------------------

    NOT EXISTS (MAXDOP 1)

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#blah_______________________________________________________________________________________________________________000000000047'. Scan count 1, logical reads 2107, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#tmp3_______________________________________________________________________________________________________________00000000004A'. Scan count 1, logical reads 1501, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 920 ms, elapsed time = 926 ms.

    ------------------------------

    EASY VERSION EXCEPT (MAXDOP 1)

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#blah_______________________________________________________________________________________________________________000000000047'. Scan count 1, logical reads 2107, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#tmp________________________________________________________________________________________________________________000000000048'. Scan count 1, logical reads 1501, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 920 ms, elapsed time = 929 ms.

    ------------------------------

    SUBQUERY EXCEPT (MAXDOP 1)

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#blah_______________________________________________________________________________________________________________000000000047'. Scan count 2, logical reads 4216, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#tmp4_______________________________________________________________________________________________________________00000000004B'. Scan count 1, logical reads 1501, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1139 ms, elapsed time = 1170 ms.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • I haven't used Except yet, and I seldom use Not Exists. But now, I'm going to try and see what I come up with.

    You're probably crazy, but that doesn't mean you're wrong. @=) I'll let you know what I find.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • http://sqlinthewild.co.za/index.php/2010/03/23/left-outer-join-vs-not-exists/

    I'll ping in Gail to see if she wants to blog more about this.

  • Are you always using MAXDOP 1 in your queries?

    You don't trust MS at all are you? 😀

    It used to be that LEFT JOIN with check for NULL was always the fasterst option over NOT EXISTS, however recently i've also found it's not the case. Now, sometime, they perform with the same. But I wouldn't advice to introduce the "iron rule" to always use NOT EXISTS.

    I had the case recently where replacing it with LEFT JOIN reduced the query execution time significantly.

    As most things in SQL are: It all depends...

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene Elutin (7/27/2011)


    Are you always using MAXDOP 1 in your queries?

    You don't trust MS at all are you? 😀

    Not for benchmark tests, no. Besides, if you'll see my notes, I did try the paralleled versions and in this test case they ran slower generically.

    It used to be that LEFT JOIN with check for NULL was always the fasterst option over NOT EXISTS, however recently i've also found it's not the case. Now, sometime, they perform with the same. But I wouldn't advice to introduce the "iron rule" to always use NOT EXISTS.

    I had the case recently where replacing it with LEFT JOIN reduced the query execution time significantly.

    As most things in SQL are: It all depends...

    Of course, it was more about the EXCEPT being... well... slower overall. It's the newfangled optimal method, afterall.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Craig Farrell (7/26/2011)


    I'm hoping someone can confirm my findings here, because I understood EXCEPT to be a higher optimized method of finding Anti-semi Joins.

    There's nothing special about EXCEPT or INTERSECT. EXCEPT is TSQL shorthand for a DISTINCT on a NOT EXISTS, and INTERSECT is shorthand for DISTINCT on an EXISTS.

    TSQL is not a programming language, for the most part. There are extensions for control-of-flow and things like that, but the core of SQL is a declarative, or specification language. You specify what you want, and the query processor writes a program to meet that specification.

    In principle, whether you write semantically-equivalent query specifications using JOIN, EXISTS, or EXCEPT should not matter. In practice, not every possible transformation between semantically identical constructions is considered, so the written form of the query can have an effect on the final program (query plan).

    The general recommendation is to start off by writing a query specification using the most natural syntax. If you want to join rows, use (OUTER) JOIN. If you want to check that a match exists (or not) in another table, use (NOT) EXISTS. If you want distinct items from one set that do (or do not) appear in another set, use EXCEPT or INTERSECT.

    If the program SQL Server comes up with (some of which you can see in the graphical or XML show plan output) is not optimal, ensure you have good estimates (statistics),appropriate indexing, and a relational design. If the plan is suboptimal because of limitations in SQL Server's model, or its ability to reason about alternatives, consider alternative forms of query syntax. Ultimately, you can also encourage the engine to come up with a better plan by the use of hints or other advanced techniques.

    It is quite wrong to compare the 'performance' of JOIN versus EXISTS and so on - the only thing you can meaningfully compare is (what you can see of) the program written by SQL Server to meet the specification you gave.

    As an aside, the queries in your test rig have different semantics - they will produce different results in some cases. For example, the OUTER JOIN has different behaviour regarding duplicates in the preserved side compared with EXISTS. In general, different query syntax tends to have (often subtle) differences in meaning, particularly with respect to NULLs and cardinality, and some logical constructions (like EXISTS) can logically stop searching as soon as a match is found. Likewise, a NOT EXISTS can stop searching as soon as it does find a match.

    I don't really expect anything I say to prevent people debating whether JOINs are inherently 'faster' than EXISTS or not (sigh).

    Last thing - please be careful to specify NOT NULL or NULL on your columns. I have no idea if you intended SomeValue to be NULL in the #blah table for FabricatedIDs higher than 255, but they are there. On a SQL Server with a different effective setting for the ANSI NULL default options, your script would produce an error.

  • Paul,

    Thank you for that post. It was very informative.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • "If the program SQL Server comes up with (some of which you can see in the graphical or XML show plan output) is not optimal, ensure you have good estimates (statistics),appropriate indexing, and a relational design. If the plan is suboptimal because of limitations in SQL Server's model, or its ability to reason about alternatives, consider alternative forms of query syntax. Ultimately, you can also encourage the engine to come up with a better plan by the use of hints or other advanced techniques."

    Well stated; this one paragraph pretty much says it all, no only about this particular subject but most other query performance issues period.

    The probability of survival is inversely proportional to the angle of arrival.

  • Paul,

    I found your explanation quite informative. I hadn't actually thought INTERSECT and EXCEPT through to the DISTINCT part. When you look at it that way, then it is likely to not perform quite as well as (NOT) EXISTS.

    I don't use EXCEPT much except in update triggers to determine if something actually changed or not. Something on the order of:

    SELECT ItemNO, Price

    INTO #SomeTemp

    FROM inserted

    EXCEPT

    SELECT ItemNo, Price

    FROM deleted

    I haven't noticed any performance issues with this, but the Item table updates usually involve only a few records.

    I'll have to give this a think for future use in case I have to deal with a huge number of updates at once.

    Todd Fifield

  • tfifield (7/29/2011)


    I hadn't actually thought INTERSECT and EXCEPT through to the DISTINCT part. When you look at it that way, then it is likely to not perform quite as well as (NOT) EXISTS.

    Hi Todd,

    It depends whether the logical DISTINCT can be optimized away, for example if there is a uniqueness constraint. In your example, perhaps ItemNo is enforced to be unique, so there would likely be no sort distinct or grouping aggregate needed in the query plan. In any case, if performance is good with the natural EXCEPT syntax, it's all good!

  • SQLkiwi (7/29/2011)


    tfifield (7/29/2011)


    I hadn't actually thought INTERSECT and EXCEPT through to the DISTINCT part. When you look at it that way, then it is likely to not perform quite as well as (NOT) EXISTS.

    Hi Todd,

    It depends whether the logical DISTINCT can be optimized away, for example if there is a uniqueness constraint. In your example, perhaps ItemNo is enforced to be unique, so there would likely be no sort distinct or grouping aggregate needed in the query plan. In any case, if performance is good with the natural EXCEPT syntax, it's all good!

    Paul,

    Another good point on unique constraints. I do think about them when designing tables in order to get better query plans. I really have to think these things through a bit better when choosing which query operator to use.

    Todd Fifield

  • There's another thing to consider. IIRC, both INTERSECT and EXCEPT essentially do a NULL = NULL comparison without any special considerations. If such a thing is required by the nature of the data, INTERSECT and EXCEPT could end beating things like INNER JOIN with NULL handling.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/30/2011)


    There's another thing to consider. IIRC, both INTERSECT and EXCEPT essentially do a NULL = NULL comparison without any special considerations. If such a thing is required by the nature of the data, INTERSECT and EXCEPT could end beating things like INNER JOIN with NULL handling.

    Yep.

    http://sqlblog.com/blogs/paul_white/archive/2011/06/22/undocumented-query-plans-equality-comparisons.aspx

  • I wanted to thank everyone (particularly Paul) for helping me find, or at least learn what to go research, in regards to my confusion. I'm sorry I haven't been particularly active but I did want to thank you.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply