Derived tables vs Normal table joins

  • Hello,

    I have been having a discussion with a colleague regarding different approaches to writing SQL. My colleague argues that it is more efficient to use derived tables rather than, say, joining two tables and filtering with a WHERE clause. For example, to query a table of orders based on column values in linked order-line rows, I would normally use this format:

    select top 100 o.*
    from EcsPurOrder o
    join ecspurextlink ol on o.id = ol.to_order
    where o.id in (5465, 6855)
    order by o.id desc

    However, using derived tables we could use this format:

    select top 100 o.*
    from EcsPurOrder o
    join (select ol.to_order from ecspurextlink ol where ol.to_order in (5465, 6855)) ol
      on o.id = ol.to_order
    order by o.id desc

    I would always argue in favour of the first method, if only in terms of readability, and assumed there couldn't be much difference. However, when checking the stats, the first method results in two scans:

    Stats from normal join and where clause

    Table 'ext_link'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'order_'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    vs

    Stats using derived table sub query

    Table 'order_'. Scan count 0, logical reads 69, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'ext_link'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    So as I read this, it looks like the derived table generates one less table scan.
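
    (For reference, stats in that format come from SET STATISTICS IO; a minimal sketch of capturing them, assuming both queries are run in the same session:)

    set statistics io on;

    -- run the plain-join query and the derived-table query from above here,
    -- then compare the "Scan count" and "logical reads" figures they report

    set statistics io off;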

    However, digging deeper, the execution plans look pretty much the same:

    Plan using normal join table

    https://drive.google.com/file/d/0B_g8mOkyPeZfRE83Nk1BbXFWajg/edit?usp=sharing

    Plan using derived table

    https://drive.google.com/file/d/0B_g8mOkyPeZfWjhtNVBobzNNU2c/edit?usp=sharing

    Confused :crazy:

    ... My colleague argues that it is more efficient to use derived tables rather than, say, joining two tables and filtering with a WHERE clause. ...

    You can argue to the death. In reality it is not possible to generalise on this subject. Sometimes you will see better performance with the first approach, sometimes with the second. It depends on many more factors than just whether a sub-query is used or not.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help

  • wilks (7/1/2014)


    I would always argue in favour of the first method, if only in terms of readability, and assumed there couldn't be much difference. However, when checking the stats, the first method results in two scans:

    Stats from normal join and where clause

    Table 'ext_link'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'order_'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    vs

    Stats using derived table sub query

    Table 'order_'. Scan count 0, logical reads 69, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'ext_link'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    So as I read this, it looks like the derived table generates one less table scan.

    Did you clear the buffer cache between the two queries? The second one has a lot more logical reads, which means the data is in cache.

    The execution plans indicate that the SQL Server engine is processing both queries the same way, and the statistics indicate that the data is being read from different places (disk vs memory).
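
    If you want to compare from a cold cache, one way (a sketch; dev or test server only, since it flushes the whole buffer pool) is:

    CHECKPOINT;              -- write dirty pages to disk so they become clean
    DBCC DROPCLEANBUFFERS;   -- drop clean pages from the buffer pool

    -- now rerun the query; the IO stats should show physical reads instead of
    -- purely cached logical reads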

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
Scan count != number of table scans. Rather ignore the scan count; it's not consistent.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
    The second query might have fewer scans, but it also has more reads, 75 vs. 12, so the first query appears to be doing less IO. Both are doing only logical reads, so neither is actually going to disk in this case.

    The queries aren't exactly the same, because the predicate is against to_order in the second query and against o.id in the first. This caused the optimizer to change the access order in the plan. In query 1, the Clustered Index Seek against EcsPurOrder is done first, then those results are passed to the Nested Loops join to get the information it needs from ecspurextlink. Since the clustered index is also the PK, and thus unique, it gets 2 rows, so ecspurextlink is accessed 2 times. In the second query, the index seek on ecspurextlink is done first, which means more pages have to be read, because the loop is executed once for each of the rows returned when seeking ecspurextlink. Since that isn't a unique index, it could be thousands of rows.
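
    If you want to confirm that access order and how many times each table is touched, a sketch of one way to capture the actual plan (assuming SSMS or a similar client):

    SET STATISTICS XML ON;   -- returns the actual execution plan as XML with each result set

    -- run the query under test here, then check the actual number of executions
    -- reported on the seek against ecspurextlink

    SET STATISTICS XML OFF;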

    I'd write the query you have this way because you aren't returning any data from ecspurextlink.

    SELECT TOP 100 o.*
    FROM EcsPurOrder o
    WHERE EXISTS (SELECT 1
                  FROM ecspurextlink ol
                  WHERE o.id = ol.to_order)
      AND o.id IN (5465, 6855)
    ORDER BY o.id DESC

    Logically you are only trying to return orders that have details and aren't returning any detail information. I bet it might perform even better.

    Lastly, I would want my predicate to be against the index that is unique, so I'd never do what your colleague did. If I were using a derived table, I'd do it either like this:

    SELECT TOP 100 o.*
    FROM EcsPurOrder o
        JOIN (SELECT ol.to_order
              FROM ecspurextlink ol) ol
            ON o.id = ol.to_order
    WHERE o.id IN (5465, 6855)
    ORDER BY o.id DESC

    or using cross apply like this:

    SELECT TOP 100 o.*
    FROM EcsPurOrder o
        CROSS APPLY (SELECT ol.to_order
                     FROM ecspurextlink ol
                     WHERE o.id = ol.to_order) ol
    WHERE o.id IN (5465, 6855)
    ORDER BY o.id DESC

    both of which will likely produce the same execution plan as your join operation.

    In the OP's case, there are many ways to write the query to return the same results, e.g. using temp tables for intermediate results...

    Again, it's impossible to generalise about the best way based on such limited details. So many factors are involved, e.g. indexes, table sizes, data types and so on...

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help

    Thank you all for those excellent replies. I had misunderstood the scan count from IO statistics; I see now it can refer to an index or table scan.

    I agree that it would make much more sense to do the seek using the PK on the Order table, but I was trying to make the DB do more work.

    Crucially, as Jack pointed out, my two queries were using different fields in the WHERE clause, so I was comparing apples and pears. Having fixed that, both queries perform equally. The other suggestion by Jack using CROSS APPLY seems to perform best, although it scores less on the readability front in my opinion; that may just be familiarity.

    DBCC DROPCLEANBUFFERS

    select o.*
    from EcsPurOrder o
    join ecspurextlink ol on o.id = ol.to_order
    where ol.to_order in (5465, 6855)
    order by o.id desc

    (23 row(s) affected)

    Table 'order_'. Scan count 0, logical reads 69, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'ext_link'. Scan count 2, logical reads 6, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    select o.*
    from EcsPurOrder o
    join (select ol.to_order from ecspurextlink ol where ol.to_order in (5465, 6855)) ol
      on o.id = ol.to_order
    order by o.id desc

    (23 row(s) affected)

    Table 'order_'. Scan count 0, logical reads 69, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'ext_link'. Scan count 2, logical reads 6, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    -- using sub-query suggested by Jack - note returns different number of results

    SELECT o.*
    FROM EcsPurOrder o
    WHERE EXISTS (SELECT 1
                  FROM ecspurextlink ol
                  WHERE o.id = ol.to_order)
      AND o.id IN (5465, 6855)
    ORDER BY o.id DESC

    (2 row(s) affected)

    Table 'ext_link'. Scan count 2, logical reads 6, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'order_'. Scan count 2, logical reads 6, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    -- using cross apply suggested by Jack

    SELECT TOP 100 o.*
    FROM EcsPurOrder o
        CROSS APPLY (SELECT ol.to_order
                     FROM ecspurextlink ol
                     WHERE o.id = ol.to_order) ol
    WHERE o.id IN (5465, 6855)
    ORDER BY o.id DESC

    (23 row(s) affected)

    Table 'ext_link'. Scan count 2, logical reads 6, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'order_'. Scan count 2, logical reads 6, physical reads 4, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Wilks

  • wilks (7/2/2014)


    I had misunderstood the scan count from IO statistics; I see now it can refer to an index or table scan.

    No, it doesn't. It is not a count of the number of scans done. Rather just ignore that value and concentrate on the logical reads.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (7/2/2014)


    wilks (7/2/2014)


    I had misunderstood the scan count from IO statistics; I see now it can refer to an index or table scan.

    No, it doesn't. It is not a count of the number of scans done. Rather just ignore that value and concentrate on the logical reads.

    And average execution time. While execution time varies wildly due to resource contention, you still want to get an idea of how long it runs. Just measuring reads doesn't always tell you which query is running faster.

    (and yeah, I know you know this Gail, just adding to your answer)
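
    A sketch of one way to capture timing alongside the reads (run the query under test several times and average, since single runs vary):

    SET STATISTICS TIME ON;
    SET STATISTICS IO ON;

    -- run the query here a few times and average the CPU time and elapsed time
    -- values reported for each run

    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;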

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Back to your basic argument, I'm with you. Using the language's standard mechanism by default is usually the better approach. There are any number of valid reasons to use derived tables, but using them in place of a perfectly straightforward JOIN is not the way to go. Now, if you really want to mess with your co-worker, ask them why not use CROSS APPLY?

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

    It makes sense that the EXISTS query would return fewer rows. When using JOIN or CROSS APPLY you are going to get the order information for each order detail row. When using EXISTS you will only get the distinct order information. I'd bet that if you did SELECT DISTINCT with the JOIN and CROSS APPLY queries you'd only get 2 rows as well.
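
    For example, a sketch of the join version with DISTINCT added (untested, using the table and column names from the thread):

    SELECT DISTINCT TOP 100 o.*
    FROM EcsPurOrder o
        JOIN ecspurextlink ol ON o.id = ol.to_order
    WHERE o.id IN (5465, 6855)
    ORDER BY o.id DESC;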

    I'd argue that the way the example query is written you really only want/need the 2 rows returned anyway.

  • Yep, looks like a good answer to me.

    Thanks!

    Wilks:-D
