Right Anti Semi Join Vs Left outer join

  • I'm looking at a poor performing stored proc and I see a "where not exists in (select...)", So I rewrite it as an outer join.

    But there is no change in performance.

    Is that expected behaviour?

    It is not what I was expecting, but I'm often wrong and that's why I test ☺

    here is the code I'm using

    prep the tables

    if exists (select Name from sys.tables where object_id = object_id('MyTableOne'))

    begin

    drop table MyTableOne

    end

    ;

    ;WITH

    E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), -- 1*10^1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 1*10^2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), -- 1*10^4 or 10,000 rows

    E8(N) AS (SELECT 1 FROM E4 a, E4 b) -- 1*10^8 or 100,000,000 rows

    SELECT TOP (1000000)

    RowID = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    ,RandomID = newid()

    ,RandomDate =getdate() + checksum(NEWID())%365

    ,RandomLetter = CHAR(65 + abs(checksum(NEWID()))%26)

    into MyTableOne

    FROM E8

    go

    alter table MyTableOne

    alter column RowID int not null

    go

    ALTER TABLE [dbo].MyTableOne ADD CONSTRAINT [PK_MyTableOne] PRIMARY KEY CLUSTERED

    (

    RowID ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]

    GO

    CREATE NONCLUSTERED INDEX [IDX_MyTableOne] ON [dbo].[MyTableOne]

    (

    RowID ASC, RandomLetter ASC

    )

    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]

    GO

    -- **********************

    if exists (select Name from sys.tables where object_id = object_id('MyTableTwo'))

    begin

    drop table MyTableTwo

    end

    ;

    ;WITH

    E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), -- 1*10^1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 1*10^2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), -- 1*10^4 or 10,000 rows

    E8(N) AS (SELECT 1 FROM E4 a, E4 b) -- 1*10^8 or 100,000,000 rows

    SELECT TOP (1000000)

    RowID = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    ,RandomID = newid()

    ,RandomDate =getdate() + checksum(NEWID())%365

    ,RandomLetter = CHAR(65 + abs(checksum(NEWID()))%26)

    into MyTableTwo

    FROM E8

    go

    alter table MyTableTwo

    alter column RowID int not null

    go

    ALTER TABLE [dbo].MyTableTwo ADD CONSTRAINT [PK_MyTableTwo] PRIMARY KEY CLUSTERED

    (

    RowID ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]

    GO

    CREATE NONCLUSTERED INDEX [IDX_MyTableTwo] ON [dbo].[MyTableTwo]

    (

    RowID ASC, RandomLetter ASC

    )

    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]

    GO

    Here is the test Please note I flush the proc cache, so carefull where you run this!

    set language british

    go

    -- Flush the procedure cache for one database only

    DECLARE @intDBID INT;

    SET @intDBID = (SELECT [dbid]

    FROM master.dbo.sysdatabases

    WHERE name = DB_NAME());

    DBCC FLUSHPROCINDB (@intDBID);

    set statistics time on;

    set statistics io on;

    select randomdate,RandomLetter

    from MyTableOne T1

    WHERE NOT EXISTS ( SELECT *

    FROM MyTableTwo T2

    where T1.RowID = T2.RowID

    and T1.RandomLetter = T2.RandomLetter )

    -- Flush the procedure cache for one database only

    DBCC FLUSHPROCINDB (@intDBID);

    select T1.randomdate,T1.RandomLetter

    from MyTableOne T1

    left outer join MyTableTwo T2

    on T1.RowID = T2.RowID

    and T1.RandomLetter = T2.RandomLetter

    where T2.RowID is null

    set statistics time off;

    set statistics io off;

    and my results are

    Changed language setting to British.

    (1 row(s) affected)

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    (961577 row(s) affected)

    Table 'MyTableTwo'. Scan count 17, logical reads 1580, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'MyTableOne'. Scan count 17, logical reads 5295, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 2370 ms, elapsed time = 5653 ms.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 4 ms.

    (961577 row(s) affected)

    Table 'MyTableTwo'. Scan count 17, logical reads 1580, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'MyTableOne'. Scan count 17, logical reads 5295, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 2952 ms, elapsed time = 5325 ms.

    Thank you for the time to read this,

    The cross joins I used to populate the tables is based on a script by Itzik Ben-Gan.

  • Ian_McCann (12/9/2014)


    I'm looking at a poor performing stored proc and I see a "where not exists in (select...)", So I rewrite it as an outer join.

    But there is no change in performance.

    Is that expected behaviour?

    No. The outer joins are usually very slightly slower than NOT EXISTS. Not much, just a bit.

    http://sqlinthewild.co.za/index.php/2010/03/23/left-outer-join-vs-not-exists/

    http://sqlinthewild.co.za/index.php/2010/04/27/in-exists-and-join-a-roundup/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Just to clarify, I thought the sub select would perform badly because it references the outer select and some kind of recursion would happen.

  • Except if you look at the execution plan, there's no 'recursion' anywhere and correlated subqueries are not automatically horrible things.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thank you very much for your answer Gail.

    ( I've got no idea what recursion looks like in an exection plan, I 'll go away and build one :-))

  • btw, you say there's no change in performance in your tests, but there is.

    Ian_McCann (12/9/2014)


    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    (961577 row(s) affected)

    Table 'MyTableTwo'. Scan count 17, logical reads 1580, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'MyTableOne'. Scan count 17, logical reads 5295, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 2370 ms, elapsed time = 5653 ms.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 4 ms.

    (961577 row(s) affected)

    Table 'MyTableTwo'. Scan count 17, logical reads 1580, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'MyTableOne'. Scan count 17, logical reads 5295, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 2952 ms, elapsed time = 5325 ms.

    2.3 seconds CPU time for the NOT EXISTS, 2.9 seconds CPU time for the join.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • You are absolutly correct Gail.

    In the example I've posted that is a significant difference.

    In the proc that I'm tuning the duration is 50 minutes and the difference is less signficant.

    On the plus side, that did give me 50 minutes to read through your links

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply