Introduction to Common Table Expressions

  • Comments posted to this topic are about the item Introduction to Common Table Expressions

    Aunt Kathi Data Platform MVP
    Author of Expert T-SQL Window Functions
    Simple-Talk Editor

  • CTEs have their place as long as you are not dealing with a large volume of data. One of the problems with CTEs is performance tuning. Unless I'm missing something, there is no way to set primary keys on or index a CTE. I've used CTEs in applications where I'm dealing with 1k-5k rows of data. Beyond that I'll use a temp table.

    Thanks for the information.

    Kurt

    Kurt W. Zimmerman
    SR DBA
    Lefrak Organization
    New York, NY

    http://www.linkedin.com/in/kurtwzimmerman

  • Actually, the CTE uses the table indexes quite nicely. Just review the query plan and you will see it pulls the indexes from the table. (Most of the time) 😀

    I do agree, however, that it is sometimes hard to figure out where a large CTE is slow. Usually the best bet, though also the biggest time sink, is to break the query apart and run each piece, building one on the next, until you find the slow part. I just went through that process for a weighted search and found I was missing a few indexes on some key lookup columns, which should have been found earlier when I was using the tuning wizard.


    Over 12yrs in IT and 10yrs happily stuck with SQL.
    - SQL 2008/R2/2012/2014/2016/2017
    - Oracle 8/9/10/11
    - MySQL 4/5 and MariaDB

  • Good article. Well articulated.

    I would just like to state that perhaps the following point about CTE usage should have been mentioned as well since that is something which, if a person is not aware of, can lead to unnecessary headaches!!

    ** The query using the CTE must be the first query appearing after the CTE.

    For example, based on your query in Listing 1, we couldn't do the following:

    WITH emp AS (
        SELECT EmployeeID, FirstName, LastName, E.Title, ManagerID
        FROM HumanResources.Employee AS E
        INNER JOIN Person.Contact AS C ON E.ContactID = C.ContactID
    )
    -- First query after the CTE, not using the CTE
    SELECT * FROM HumanResources.Employee

    -- Second query after the CTE, using the CTE: this fails because emp is no longer in scope
    -- (SQL Server reports an "Invalid object name" error)
    SELECT A.EmployeeID, A.FirstName, A.LastName, A.Title,
        A.ManagerID, B.FirstName AS MgrFirstName,
        B.LastName AS MgrLastName, B.Title AS MgrTitle
    FROM emp AS A INNER JOIN emp AS B ON A.ManagerID = B.EmployeeID;
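
    A minimal sketch of the working form, assuming the same AdventureWorks tables as Listing 1: the statement that uses emp follows the CTE definition directly, and any unrelated query goes after that statement has ended. The column qualifiers (E., C.) are my assumption about which table each column comes from.

    ```sql
    -- Assumes the AdventureWorks tables used in the query above.
    WITH emp AS (
        SELECT E.EmployeeID, C.FirstName, C.LastName, E.Title, E.ManagerID
        FROM HumanResources.Employee AS E
        INNER JOIN Person.Contact AS C ON E.ContactID = C.ContactID
    )
    -- The very next statement uses the CTE, so emp is in scope here.
    SELECT A.EmployeeID, A.FirstName, A.LastName, A.Title,
           A.ManagerID, B.FirstName AS MgrFirstName,
           B.LastName AS MgrLastName, B.Title AS MgrTitle
    FROM emp AS A
    INNER JOIN emp AS B ON A.ManagerID = B.EmployeeID;

    -- Unrelated queries go after the CTE statement has ended.
    SELECT * FROM HumanResources.Employee;
    ```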

  • Adam Seniuk (12/8/2009)


    Actually, the CTE uses the table indexes quite nicely. Just review the query plan and you will see it pulls the indexes from the table. (Most of the time) 😀

    I do agree, however, that it is sometimes hard to figure out where a large CTE is slow. Usually the best bet, though also the biggest time sink, is to break the query apart and run each piece, building one on the next, until you find the slow part. I just went through that process for a weighted search and found I was missing a few indexes on some key lookup columns, which should have been found earlier when I was using the tuning wizard.

    I had a stored procedure that performed "global search" that queried 3 tables which were rather large in size. I had gone ahead and tuned each separate query. The stored procedure really stressed the system. I simply removed the CTEs and utilized indexed temp tables. This made the performance almost 100% better.

    Just my observations.

    Kurt

    Kurt W. Zimmerman
    SR DBA
    Lefrak Organization
    New York, NY

    http://www.linkedin.com/in/kurtwzimmerman

  • Kurt W. Zimmerman (12/8/2009)


    I had a stored procedure that performed "global search" that queried 3 tables which were rather large in size. I had gone ahead and tuned each separate query. The stored procedure really stressed the system. I simply removed the CTEs and utilized indexed temp tables. This made the performance almost 100% better.

    Just my observations.

    Kurt

    When not used recursively, a CTE is essentially a different syntax for a subquery. They will perform precisely the same as a subquery and generate the same execution plan.

    Now of course a recursive CTE is a different animal performance-wise, and you can often get better performance with a non-CTE, non-recursive answer. I touched on this briefly at http://www.sqlservercentral.com/articles/Common+Table+Expression+(CTE)/62404/ and Peter He discussed this in depth at http://www.sqlservercentral.com/articles/T-SQL/2926/

    Now, there are certainly times that an indexed temp table in a stored procedure can perform better than using subqueries, especially if they are complicated and nested. But in that case it would perform better than any formulation of the subquery, not better than a CTE in particular. As Adam pointed out, the CTE is capable of using the available indexes as well as any other query or subquery.
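
    For comparison, the indexed temp table approach discussed above might look roughly like this. It is only a sketch: the temp table name #CustomerSales and the index name are made up for illustration, and the AdventureWorks Sales.SalesOrderHeader table stands in for whatever tables Kurt's procedure actually used.

    ```sql
    -- Materialize the intermediate result once (hypothetical temp table name).
    SELECT CustomerID, COUNT(*) AS CountOfSales, AVG(TotalDue) AS AvgSale
    INTO #CustomerSales
    FROM Sales.SalesOrderHeader
    GROUP BY CustomerID;

    -- Unlike a CTE or subquery, the temp table can be indexed directly.
    CREATE UNIQUE CLUSTERED INDEX IX_CustomerSales
        ON #CustomerSales (CustomerID);

    -- Later queries read the stored rows instead of re-running the aggregate.
    SELECT S.SalesOrderID, S.CustomerID, CS.CountOfSales, CS.AvgSale
    FROM Sales.SalesOrderHeader AS S
    INNER JOIN #CustomerSales AS CS ON CS.CustomerID = S.CustomerID;

    DROP TABLE #CustomerSales;
    ```

    Whether this beats the CTE or subquery form depends on the data and the plan, so it is worth measuring both.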

    ---
    Timothy A Wiseman
    SQL Blog: http://timothyawiseman.wordpress.com/

  • Great article. You might want to include a few more things, though. I use a CTE to get me out of awkward JOIN situations. If I want to count the number of Customers assigned to a Worker and print the name from the table [Worker], I have to add every stinking column to the GROUP BY. But with a CTE I can take this preaggregate:

    SELECT WorkerId, COUNT(CustomerId)
    FROM WorkerCustomer
    GROUP BY WorkerId

    and then JOIN to it, since the GROUP BY only applies to the subquery.

    Also, the syntax allows you to specify the column names coming out, so I don't have to put the AS after the COUNT(). It looks like this: WITH WC (WorkerId, CustomerCount) AS ( ... )

    But then I needed two CTEs in the same query. That syntax is a bit on the awkward side (IMHO). You specify the first CTE then a comma and the second CTE without the WITH. The whole thing is below.

    WITH WC (WorkerId, CustomerCount)
    AS (
        SELECT WorkerId, COUNT(CustomerId)
        FROM WorkerCustomer
        GROUP BY WorkerId
    )
    , WO (WorkerId, OrderCount)
    AS (
        SELECT WorkerId, COUNT(OrderNumber)
        FROM [SalesOrder]
        GROUP BY WorkerId
    )
    SELECT
        [Worker].Name
        ,[Worker].Supervisor
        ,COALESCE(WC.CustomerCount,0) AS [Customer Count]
        ,COALESCE(WO.OrderCount,0) AS [Order Count]
    FROM [Worker]
    LEFT OUTER JOIN WC ON WC.WorkerId = [Worker].ID
    LEFT OUTER JOIN WO ON WO.WorkerId = [Worker].ID
    ORDER BY WC.CustomerCount DESC, WO.OrderCount DESC

    ATB
    Charles Kincaid

  • Lovely concise, easy to understand introduction. Thank you!

    Nicole Bowman

    Nothing is forever.

  • Aren't most of your examples just syntactic sugar that could be done with a subquery?

    I would prefer to use simpler constructs if possible for ease of future maintenance.

    From what I can see, there are only two situations where a Common Table Expression is really useful:

    1) Where you need to join a subquery to itself, so you would otherwise need to either duplicate the subquery or use a temp table.

    2) Where you want to use recursion.

    Can anyone think of any others?
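
    To illustrate the recursion case in point 2, here is a minimal sketch of a recursive CTE. It assumes HumanResources.Employee has EmployeeID and ManagerID columns, as in the query quoted earlier in this thread; the CTE name OrgChart and the OrgLevel column are made up for the example.

    ```sql
    -- Walk the manager hierarchy from the top down.
    WITH OrgChart (EmployeeID, ManagerID, OrgLevel) AS (
        -- Anchor member: employees with no manager (top of the hierarchy).
        SELECT EmployeeID, ManagerID, 0
        FROM HumanResources.Employee
        WHERE ManagerID IS NULL

        UNION ALL

        -- Recursive member: direct reports of the rows found so far.
        SELECT E.EmployeeID, E.ManagerID, O.OrgLevel + 1
        FROM HumanResources.Employee AS E
        INNER JOIN OrgChart AS O ON E.ManagerID = O.EmployeeID
    )
    SELECT EmployeeID, ManagerID, OrgLevel
    FROM OrgChart
    ORDER BY OrgLevel, EmployeeID;
    ```

    A plain subquery cannot refer to itself, which is why this case has no direct subquery equivalent.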

  • Nice article.

    I know that this is not the place to ask questions, but this one is related to CTE execution, so here goes.

    I have a CTE that joins around 4-5 tables, and outside the CTE I join 4 more tables to it (around 9 tables in total).

    What are the intermediate steps of execution when a CTE is involved?

    As we all know, in a normal SELECT the FROM clause is processed first, then WHERE, then GROUP BY, and so on.

    Can someone help me with this?

    Thanks in advance.

    e.g.

    ;WITH CTE
    AS
    (
        SELECT Table1.ID AS ID, Table1.Code AS Code, Table1.Name AS Name, Table5.Description AS Description
        FROM Table1
        INNER JOIN Table2 ON Table1.Sub_ID = Table2.ID
        INNER JOIN Table3 ON Table2.Sub_ID = Table3.ID
        INNER JOIN Table4 ON Table3.Sub_ID = Table4.ID
        INNER JOIN Table5 ON Table4.Sub_ID = Table5.ID
    )
    SELECT .....
    FROM CTE
    INNER JOIN Table6 ON CTE.ID = Table6.ID
    INNER JOIN Table7 ON Table6.Sub_ID = Table7.ID
    INNER JOIN Table8 ON Table7.Sub_ID = Table8.ID
    WHERE <<some condition....>>

  • p456 (12/9/2009)


    Aren't most of your examples just syntactic sugar that could be done with a subquery?

    I would prefer to use simpler constructs if possible for ease of future maintenance.

    From what I can see, there are only two situations where a Common Table Expression is really useful:

    1) Where you need to join a subquery to itself, so you would otherwise need to either duplicate the subquery or use a temp table.

    2) Where you want to use recursion.

    Can anyone think of any others?

    Well you can JOIN to the same CTE multiple times in the same statement. Like I said it simplifies the GROUP BY thing. Here is the secret, at least to thinking about this. The CTE generates a very temporary table. It only lives for the length of the single statement. Could you do more with actual temp tables? Yes. Could you do as well with table valued functions? Maybe. How about stored procedures that return a table? Give it a shot. Do rhetorical questions bother the heck out of you? Sure they do! Like with everything else in SQL it's there if you want it.

    ATB
    Charles Kincaid

  • Charles Kincaid (12/9/2009)


    p456 (12/9/2009)


    Aren't most of your examples just syntactic sugar that could be done with a subquery?

    I would prefer to use simpler constructs if possible for ease of future maintenance.

    From what I can see, there are only two situations where a Common Table Expression is really useful:

    1) Where you need to join a subquery to itself, so you would otherwise need to either duplicate the subquery or use a temp table.

    2) Where you want to use recursion.

    Can anyone think of any others?

    Well you can JOIN to the same CTE multiple times in the same statement.

    That's what I mean by point 1.

    Like I said it simplifies the GROUP BY thing.

    Couldn't you just do that with a standard subquery?

    SELECT SalesOrderID, S.CustomerID, CountOfSales, AvgSale, LowestSale, HighestSale
    FROM (
        SELECT COUNT(*) AS CountOfSales, AVG(TotalDue) AS AvgSale,
            MIN(TotalDue) AS LowestSale, MAX(TotalDue) AS HighestSale,
            CustomerID
        FROM Sales.SalesOrderHeader
        GROUP BY CustomerID
    ) csales
    INNER JOIN Sales.SalesOrderHeader AS S
        ON S.CustomerID = csales.CustomerID;

    Here is the secret, at least to thinking about this. The CTE generates a very temporary table. It only lives for the length of the single statement. Could you do more with actual temp tables? Yes. Could you do as well with table valued functions? Maybe. How about stored procedures that return a table? Give it a shot. Do rhetorical questions bother the heck out of you? Sure they do! Like with everything else in SQL it's there if you want it.

    Yes, but you can also think of a standard subquery as generating a very temporary "table", in exactly the same way.

    I can't see that the CTE gives you anything extra other than the two situations I listed.

  • Charles Kincaid (12/9/2009)


    ...

    Here is the secret, at least to thinking about this. The CTE generates a very temporary table. It only lives for the length of the single statement.

    ...

    Totally wrong! You think that it is building an intermediate result set (a temp table) that is referenced over and over, when in fact it just takes the entire SQL statement that makes up the CTE and "plugs" it into the main query wherever the CTE is referenced. For proof, just look at the query plan, as noted by Tony Rogerson. See: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2008/05/17/non-recursive-common-table-expressions-performance-sucks-1-cte-self-join-cte-sub-query-inline-expansion.aspx

    So multiple joins to a CTE would be one of the WORST things you could do.
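
    One way to check this for yourself, sketched here against the AdventureWorks Sales.SalesOrderHeader table used earlier in the thread (the CTE name csales and the self-join are only for illustration): turn on I/O statistics, run a query that references the CTE twice, and compare the reads with a version that references it once. If the CTE is expanded inline as described, you would expect the base table to be read once per reference.

    ```sql
    -- Report logical reads per table in the Messages output.
    SET STATISTICS IO ON;

    WITH csales AS (
        SELECT CustomerID, COUNT(*) AS CountOfSales
        FROM Sales.SalesOrderHeader
        GROUP BY CustomerID
    )
    -- The CTE is referenced twice here; compare the reported reads
    -- with those of a query that references csales only once.
    SELECT A.CustomerID, A.CountOfSales, B.CountOfSales AS SameCount
    FROM csales AS A
    INNER JOIN csales AS B ON A.CustomerID = B.CustomerID;

    SET STATISTICS IO OFF;
    ```

    The actual execution plan for the same query shows where each reference to the CTE ends up.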


    (PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.

  • Thanks for all the comments.

    It's been a couple of months since I wrote the article, so I don't remember all the examples that I used. I know that there have been times for me that using a temp table has improved performance, but if you write an article saying that, a bunch of people will chime in saying don't use temp tables. I think "it depends" on the situation and one tool doesn't solve every problem.

    At least for me, I have really liked using CTEs because it makes the query easier to read and I have seen performance improvements depending on the situation.

    Aunt Kathi Data Platform MVP
    Author of Expert T-SQL Window Functions
    Simple-Talk Editor

  • Kathi Kellenberger (12/9/2009)


    At least for me, I have really liked using CTEs because it makes the query easier to read and I have seen performance improvements depending on the situation.

    Yes, I agree. In some places I have also seen performance improvements.
