Joins versus subqueries

  • Hi all!  We are once again battling the age old question of performance: the use of Joins versus subqueries.  I have tried to look online for an answer, and rather than purchasing yet another T-SQL How To book, I thought I would throw this one out to actual users.  We are in a debate over whether or not to use a subquery in the WHERE clause or create a JOIN.  Any random thoughts (as it pertains to the question would be most helpful!

    Thanks again! -Marti

     

  • I like the derived table/join approach... it gives you room to add as many columns as you need down the line, and it's more often than not faster than the subquery (never seen a subquery faster than the join, only equal).

  • While there are exceptions, the general rule would be to use a join against either a table or a derived table. It's almost always going to be faster becuase of how the query optimizer works. That said, your mileage may vary, caveat emptor, test, test, test.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Joins ARE almost always faster to execute than subqueries.

    Why use subqueries? If your code will be read or modified in the future, it is often easier to decipher, and understand the logic, of a subquery.

  • And as I said it's easier to maintain .

  • I'm still very much using the where clause over joins.

    The only reason is that there's still no proof that joins are faster than where clause.

    It all comes down to how you write your query.

    It would be great if someone can provide two queries, one using joins and the other using where clause and we examine together?

     

    Personally I still find where/subqueries are much readable than joins.

    Unless it's proven otherwise I just stick to the where clause

     

  • Remi, you always got solutions in hands. Well I'm trying to catche to you.

    here you go:

    http://geekswithblogs.net/johnsperfblog/archive/2005/05/26/40646.aspx

  • It's always more of the same... test all the versions you think may work and see which is fastest.

  • I  have created two queries on Northwind database, one using Join and the other using subquery.

    QUERY A:

    select customers.*

    from customers

    left join orders on  orders.customerid = customers.customerid

    where orders.customerid is NULL

    QUERY B:

    select c.*

    from customers c

    where not exists (select 1 from orders where customerid = c.customerid)

     

    ---------------TYPICAL SHOW PLAN RESULTS-----------

    QUERY A:

      |--Filter(WHERE[Orders].[CustomerID]=NULL))

           |--Nested Loops(Left Outer Join, OUTER REFERENCES[Customers].[CustomerID]) WITH PREFETCH)

                |--Clustered Index Scan(OBJECT[Northwind].[dbo].[Customers].[PK_Customers]))

                |--Index Seek(OBJECT[Northwind].[dbo].[Orders].[CustomerID]), SEEK[Orders].[CustomerID]=[Customers].[CustomerID]) ORDERED FORWARD)

    QUERY B:

      |--Nested Loops(Left Anti Semi Join, OUTER REFERENCES[c].[CustomerID]) WITH PREFETCH)

           |--Clustered Index Scan(OBJECT[Northwind].[dbo].[Customers].[PK_Customers] AS [c]))

           |--Index Seek(OBJECT[Northwind].[dbo].[Orders].[CustomerID]), SEEK[Orders].[CustomerID]=[c].[CustomerID]) ORDERED FORWARD)

     

     

    IF QUERY A and B are run in one batch:

    QUERY A - query cost = 52.06%

    QUERY B - query cost = 47.94%

     

    In this case can you still say that joins are faster than subqueries???

     

     

  • What happens if you again run query A and B in one batch, but this time query B before query A?

  • You compared exists to left join.

    I compared inner join to in. In and exists are not the same type of operators so they can't really be compared. Try the second query like this :

    select c.*

    from dbo.customers c

    where customerid not in (select customerid from dbo.orders)

  • I remember reading somewhere that "IN" subqueries were very close to joins in speed, UNLESS the IN operater references a list of comma separated values.  If you are running tests, I think you need to compare value lists to subqueries to joins.

    Regardless, I find subqueries or value lists to be the most readable, so I prefer them if query efficiency is not an issue. 

    Sometimes, its a real headache to work around value lists (with temp tables and all), so I will use them as dynamic  SQL [slap, slap] until I have time to recode the beast.

  • For 2 :

    IF Object_id('fnSplit_Set') > 0

    DROP FUNCTION dbo.fnSplit_Set

    GO

    IF Object_id('Numbers') > 0

    DROP TABLE dbo.Numbers

    GO

    CREATE TABLE dbo.Numbers (PkNumber int identity(1,1), dude bit null, CONSTRAINT Pk_Number PRIMARY KEY CLUSTERED (PkNumber))

    GO

    INSERT INTO dbo.Numbers (dude)

    SELECT NULL FROM (SELECT TOP 100 NULL AS A FROM master.dbo.spt_values) dt100 cross join (SELECT TOP 80 null AS A FROM master.dbo.spt_values) dt80

    GO

    ALTER TABLE dbo.Numbers

    DROP COLUMN dude

    GO

    --Select min(PkNumber) as MinA, max(PkNumber) as MaxA from dbo.Numbers

    --1, 8000

    GO

    CREATE FUNCTION [dbo].[fnSplit_Set] (@IDS as varchar(8000), @vcDelimiter varchar(3))

    RETURNS TABLE

    WITH SCHEMABINDING

    AS

    Return

    Select dtSplitted.EachID, dtSplitted.Rank from (

    SELECT SUBSTRING(@vcDelimiter + @IDs + @vcDelimiter, N.PkNumber + len(@vcDelimiter),

    CHARINDEX(@vcDelimiter, @vcDelimiter + @IDs + @vcDelimiter, N.PkNumber + len(@vcDelimiter)) - N.PkNumber - len(@vcDelimiter)) as EachID

    , (LEN(SUBSTRING(@IDs, 1, N.PkNumber)) - LEN (REPLACE (SUBSTRING(@IDs, 1, N.PkNumber), ',', ''))) + 1 AS Rank

    FROM dbo.Numbers N

    WHERE SUBSTRING(@vcDelimiter + @IDs + @vcDelimiter, N.PkNumber, len(@vcDelimiter)) = @vcDelimiter

    AND PkNumber 0

    GO

    DECLARE @Ids as varchar(8000)

    SET @IDS = ''

    Select @IDS = @IDS + CAST(id as varchar(10)) + ',' from (Select TOP 10 id from dbo.SysObjects order by NEWID()) dtIds

    SET @IDS = left(@IDS, ABS(len(@IDS) - 1))

    PRINT @IDS

    Select O.id, dtIDS.EachId, O.name, dtIDS.Rank from dbo.SysObjects O inner join (Select CAST(EachID as INT) AS EachID, Rank from dbo.fnSplit_Set (@IDS, ',')) dtIDS on O.id = dtIDS.EachID order by O.Name

    Select O.id, dtIDS.EachId, O.name, dtIDS.Rank from dbo.SysObjects O inner join (Select CAST(EachID as INT) AS EachID, Rank from dbo.fnSplit_Set (@IDS, ',')) dtIDS on O.id = dtIDS.EachID order by dtIDS.Rank

    --Don't ever forget to cast the eachid column to the same datatype as in the join to avoid table scans.

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply