SQL - Derived Tables

  • Comments posted to this topic are about the item SQL - Derived Tables

    Thanks,
    Prashant

  • I try to use derived tables whenever possible, but I recently improved performance by changing a derived table to a temp table. It was large and joined on a non-index column so I found that a temp table with an index on the join fields.

  • Great article.

    We have been using the derived table as a great complex query design tool:

    Break the complex query into derived tables, generally by the specific table's restrictions.

    Then JOIN the individual derived table queries into the complex query.

    BTW here id the fix for the original query:

    SELECT C.CustomerID, C.CompanyName

    , COUNT(O.OrderID) AS TotalOrders

    FROM Customers C

    LEFT OUTER JOIN Orders O ON C.CustomerID = O.CustomerID

    AND O.OrderDate >= '19960101'

    AND O.OrderDate < '19970101'

    GROUP BY C.CustomerID, C.CompanyName

    The last step is to evaluate for optimization.

    A suggestion is no try and not use functions in the where clause as that can cause any available index to be skipped.

    Andy

  • "Theta Joins" also can be used.

    For the given example in the article, the results should match for this query:

    select c.*, o.*

    from customers as c

    left join orders as o

    on c.customerid = o.customerid and year(o.orderdate) = 1996

    --where

    -- ...

  • ...and everything happens in memory instead of a combination of memory and disk.

    ...while a temporary table exists until the server is brought down and also it uses lot of disk space than a derived table in the temdb database.

    You wrote a very good introduction to the use of Derived Tables... but you really need to check your resources on the two statements above....

    First, Derived Tables are NOT memory only constructs... just like any query, if they run out of room in memory, they WILL make a thing called a "Working" table and you can frequently see them in the text version of execution plans. Guess where those bad boys live... you guessed it... memory if they fit and if they don't, TEMPDB!

    Also, your information about Temp Tables living only on disk is dead wrong. Please read the following Microsoft provided URL...

    http://support.microsoft.com/default.aspx?scid=kb;en-us;305977&Product=sql2k

    ... and pay particular attention to where it states "If memory is available, both table variables and temporary tables are created and processed while in memory (data cache). "

    You also need to check Books Online for what the persistance of a Temp Table actually is... you DON'T need to bring the server down to get rid of a Temp Table. 😉

    Don't let this dampen your writing spirit... you provided a really good intro to derived tables... I just had to correct your statements about temp tables. In fact, there are some occasions where a Temp Table will blow the doors off of derived tables... usually in a multi-table-joined update when aggragates are necessary. They work pretty well as a substitute for CTE's, too!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Derived tables are a great performance optimisation tool but always check because the derived table approach is not always the best and temporary tables are sometimes a necessary approach.

    The derived table itself is just expanded into the main query (just like a view) this means that there are no statistics available for the derived table.

    Statistics are held for the temporary table; you can also create indexes on temporary tables.

    I use derived tables a lot, well, one hell of a lot to be honest and they are a good structure but also look at CTE's which is what I'm using more - they can be referenced more like a virtual table in your queries.

    Tony.

  • David A. Long (1/16/2008)


    Great article.

    BTW here id the fix for the original query:

    SELECT C.CustomerID, C.CompanyName

    , COUNT(O.OrderID) AS TotalOrders

    FROM Customers C

    LEFT OUTER JOIN Orders O ON C.CustomerID = O.CustomerID

    AND O.OrderDate >= '19960101'

    AND O.OrderDate < '19970101'

    GROUP BY C.CustomerID, C.CompanyName

    Andy

    I think it also do the job:

    SELECT C.CustomerID, C.CompanyName

    , COUNT(O.OrderID) AS TotalOrders

    FROM Customers C

    LEFT OUTER JOIN Orders O ON C.CustomerID = O.CustomerID AND

    YEAR(O.OrderDate) = 1996

    --AND O.OrderDate >= '19960101'

    --AND O.OrderDate < '19970101'

    GROUP BY C.CustomerID, C.CompanyName

    OR:

    SELECT C.CustomerID, C.CompanyName, COUNT(O.OrderID) AS TotalOrders

    FROM Customers C LEFT OUTER JOIN Orders O ON

    C.CustomerID = O.CustomerID

    WHERE YEAR(O.OrderDate) = 1996

    GROUP BY ALL C.CustomerID, C.CompanyName

  • I would tend to put some of the criteria in the FROM clause in a lot of cases rather than move straight to derived tables. Although in complex situations derived tables can help simplify things. But if the derived table is of particular value and reusable i will often put it in a view anyway.

    pain_killer , the point about NOT using the Year() function is that it will make better use of any index.

  • I'm afraid I'm with Jeff Moden and Tony Rogerson on this.

    I think this is a good example of needing to be careful and understand the implications of what you are saying, and to have covered your research well. There is plenty of info out there on this area from 'good' sources to allow you to draw a conclusion(s) that you can back up (right or wrong, subjectively of course 😉 )

    That said though, I would like to mirror Jeff's comments of not letting the replys dampen your spirit. You have put yourself forward for peer review and from my own experiences its a great way to gain experience and knowledge, and to improve, though it can seem quite painful along the route :crying: .

  • And why not using this sentence:

    SELECT C.CustomerID, C.CompanyName,

    COUNT(O.OrderID) AS TotalOrders

    FROM Customers C LEFT OUTER JOIN Orders O ON

    C.CustomerID = O.CustomerID AND (YEAR(O.OrderDate) = 1996 OR O.OrderDate IS NULL)

    GROUP BY C.CustomerID, C.CompanyName

  • daninmanchester (1/16/2008)


    pain_killer , the point about NOT using the Year() function is that it will make better use of any index.

    Sorry, my english is very poor, and sometimes I don't understand what is the point of the problem, so please be patient to me:). In this case I had compare IO, TIME statistics and execution plans. YEAR() function and explicit date comparing in this case looks identical. So the question is: why use YEAR() function? I think that this looks clearer to me, and it uses only one parameter (2006 in this example).

  • Using columns in functions means SQL Server can no longer do a seek; instead it must do a scan.

    If there was a unique index on CustomerID, orddate.

    Say you had 1 million rows per year between 2000 and 2007 for customerid = 1234

    Say you wanted the counts for customerid 1234 for year 2007

    Using the function Year( orddate ) = 2007 means that all the rows from 2000 for that customerid in the index need to be searched because SQL Server does not know the result until it has read the data and performed the function on the column.

    Using >= '20070101' and < '20080101' allows SQL Server to just seek straight to the start 20070101 and read from there.

  • As others have said, this is generally a good article and the technique is very valuable, but you have many errors. The characterization of the locking in the temp table example is simply wrong. It implies that access to tempdb is single threaded. It fails to mention that there are locks in the database providing the data (shared, but still locks). "Everything happens in memory" only if the data being referenced is already there.

  • Hi

    You could update on driven table but in other way,

    You were trying :

    Update T SET Id=0 FROM (SELECT * FROM tt1) AS T

    but you could use this as follow , will work fine :

    update t set id =0

    from tt1 as T where ...

    You could add you where clause too, you could do that with complex driven tables too.

    Mnaouar Ben Khelifa

    http://www.mnaouar.bestilan.com

  • Hey Prashant...

    You remember I said "don't let it dampen your writing spirit"? Here's why... like I said, you have some good information in your article and people have already cited your article as an example in at least one of the posts... see the following URL for what I'm talking about...

    http://www.sqlservercentral.com/Forums/Topic441707-5-1.aspx#bm443531

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 34 total)

You must be logged in to reply to this topic. Login to reply