LEFT JOIN vs EXCEPT

  • Christopher Stobbs (8/1/2008)


    Jeff,

    How would you best two queries to see which is performing better?

    thanks

    Chris

    One way is already demonstrated in the code I posted previously in this thread. Another way is to set up Profiler for the SPID I'm using to test through and have each code snippet separated from the other using GO.

    Of course, you can't test something for performance unless you have lot's of data, so I use a "standard" test table for certain things. The code to generate the "standard" test table can be easily modified to suit a wide variety of requirements. The code generator is based on the same principle as the code I use to very quickly generate a Tally table. Here it is...

    DROP TABLE JBMTest

    GO

    --===== Create and populate a 1,000,000 row test table.

    -- Column "RowNum" has a range of 1 to 1,000,000 unique numbers

    -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers

    -- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings

    -- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers

    -- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times

    -- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'

    -- for all rows.

    -- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)

    -- Jeff Moden

    SELECT TOP 1000000

    RowNum = IDENTITY(INT,1,1),

    SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,

    SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65),

    SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),

    SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),

    SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),

    SomeHex12 = RIGHT(NEWID(),12)

    INTO dbo.JBMTest

    FROM Master.dbo.SysColumns t1,

    Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN

    --===== A table is not properly formed unless a Primary Key has been assigned

    -- Takes about 1 second to execute.

    ALTER TABLE dbo.JBMTest

    ADD PRIMARY KEY CLUSTERED (RowNum)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (8/1/2008)


    THAT was a lot of fun! Barry is a heck of a good sport!

    Likewise, Jeff!

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • It is interesting that the result of EXCEPT may be different than a NOT EXISTS.

    EXCEPT considers NULLS to equal but NOT EXISTS considers NULLS to be unequal. Here is a reproduction:

    After populating JBMTest per the provide SQL, add some additional rows with null.

    INSERT INTO dbo.JBMTest

    (SomeLetters2)

    SELECT TOP 1000

    CHAR(ABS(CHECKSUM(NEWID()))%26+65) + CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    FROM Master.dbo.SysColumns t1,

    Master.dbo.SysColumns t2

    EXCEPT returns zero rows:

    select *

    from dbo.JBMTest

    except

    select *

    from dbo.JBMTest

    NOT EXISTS returns all of the rows with NULLs

    select *

    from dbo.JBMTest as T1

    where NOT EXISTS

    (select 1

    fromdbo.JBMTest as T2

    whereT2.RowNum= T1.RowNum

    andT2.SomeInt= T1.SomeInt

    andT2.SomeCSV= T1.SomeCSV

    andT2.SomeMoney= T1.SomeMoney

    andT2.SomeDate= T1.SomeDate

    andT2.SomeHex12= T1.SomeHex12

    )

    SQL = Scarcely Qualifies as a Language

  • Good point, Carl. Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I'd like to add something here... I think a few people have touched on it, but it's pretty important.

    If you are pulling out a very small amount of records, say, just one or two, then using Except would probably be the better way to go, if you can filter to this level in the except part of your clause.

    If you are, however, pulling out a rather large set of data, the left outer joins will be the better choice.

    At least, that's what I generally find with nested queries rather than joined queries.

    If you can filter down to only a few records early in the game, then nesting is good. Otherwise, it makes your DB take too much of a time hit to query all the records twice, rather than things like hash matches.

  • Regarding using LEFT OUTER JOIN instead of EXCEPT or NOT EXISTS, how do you write such a SQL Statement ? Please use the AdventureWorks demo database to answer the following question: What Vendors do NOT supply Products whose color is Blue or Grey?

    EXCEPT SQL

    select*

    fromPurchasing.Vendor

    JOIN(SELECT Purchasing.Vendor.VendorID

    FROMPurchasing.Vendor

    EXCEPT

    (Select Purchasing.ProductVendor.VendorID

    FROMProduction.Product

    JOINPurchasing.ProductVendor

    ONProduction.Product.ProductID= Purchasing.ProductVendor.ProductID

    WHEREProduction.Product.color in ( 'Blue','Grey')

    )

    ) as VendorNonColor

    on VendorNonColor.VendorID = Purchasing.Vendor.VendorID

    ;

    NOT EXISTS SQL:

    select *

    fromPurchasing.Vendor

    whereNOT EXISTS

    (Select 1

    FROMProduction.Product

    JOINPurchasing.ProductVendor

    ONProduction.Product.ProductID= Purchasing.ProductVendor.ProductID

    WHEREProduction.Product.colorin ( 'Blue','Grey')

    ANDPurchasing.ProductVendor.VendorID = Purchasing.Vendor.VendorID

    )

    The schema:

    CREATE TABLE [Purchasing].[Vendor]

    ([VendorID] [int] IDENTITY(1,1) NOT NULL

    ,[Name] varchar(255) NOT NULL

    , CONSTRAINT [PK_Vendor_VendorID] PRIMARY KEY CLUSTERED

    ([VendorID] ASC)

    )

    CREATE TABLE [Production].[Product]

    ([ProductID] [int] IDENTITY(1,1) NOT NULL

    ,[Color] [nvarchar](15) NULL

    ,CONSTRAINT [PK_Product_ProductID] PRIMARY KEY CLUSTERED ([ProductID] ASC)

    )

    CREATE TABLE [Purchasing].[ProductVendor]

    ([ProductID] [int] NOT NULL

    ,[VendorID] [int] NOT NULL

    ,CONSTRAINT [PK_ProductVendor_ProductID_VendorID] PRIMARY KEY CLUSTERED

    ([ProductID] ASC,[VendorID] ASC)

    ,CONSTRAINT [FK_ProductVendor_Product_ProductID] FOREIGN KEY([ProductID])

    REFERENCES [Production].[Product] ([ProductID])

    ,CONSTRAINT [FK_ProductVendor_Vendor_VendorID] FOREIGN KEY([VendorID])

    REFERENCES [Purchasing].[Vendor] ([VendorID])

    )

    GO

    CREATE NONCLUSTERED INDEX [IX_ProductVendor_VendorID] ON [Purchasing].[ProductVendor] ([VendorID] ASC)

    go

    SQL = Scarcely Qualifies as a Language

  • Can you use EXCEPT ALL like you can with UNION ALL to display even the non-distinct rows?

  • J Frizzle (12/21/2012)


    Can you use EXCEPT ALL like you can with UNION ALL to display even the non-distinct rows?

    I'm pretty sure you cannot. Neither can use use INTERSECT ALL.

    INTERSECT I'm pretty sure also returns only DISTINCT values.

    Thus UNION, INTERSECT and EXCEPT all operate in a similar fashion, returning only DISTINCT values.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 8 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply