Common Table Expressions

  • Comments posted to this topic are about the item Common Table Expressions

  • A very simple and good article. Well explained and very constructive article.

  • As you can see CTEs can be very useful. I have found them particularly handy when cleaning up demoralized tables

    What are "demoralized tables"? I didn't know they can have such character. 😉

  • I see the value of CTEs, but I'm not sure I agree with the example... wouldn't it be better written (more efficient?) using a single aggregate query and a TOP clause?

    SELECT TOP 1 dept_id, n

    FROM (

    SELECT dept_id, count(*) AS n

    FROM employee

    GROUP BY dept_id

    ) AS a

    ORDER BY n DESC

    I guess the question for me is: while CTEs provide a nice syntax for repeated and recursive expressions, is the SQL optimizer actually able to use them as efficiently as a statement designed to avoid repetitive expressions in the first place?

    I tried this on a very simple dataset (comparing all three statements), and found that the total cost of the queries, in all three cases, was the same. The query plans were slightly more complicated for the original statement and CTE statement, and slightly simpler for the statement above (single Sort/TopN Sort instead of Sort -> Filter -> Top) - unfortunately I don't have a large dataset to test on!

    My instinct would be: Try to rewrite your statement to use joins and derived tables rather than subqueries, before moving to CTEs as a way of simplifying your subqueries.

    Does anyone know better one way or the other?

    http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.

  • There was no explanation about why the CTE was a better approach than any of the others

  • The query provided by Tao will cost less in a large database then CTE no doubt........

  • was there any source code or at least a method to create the database and table(s) referenced?

    It would be helpful in testing running the provided code.

    Good article otherwise

    I also liked the demoralized term.

  • Doing my morning article "skimming" and saw this one. In the "Conclusion" it's stated "This way I avoid using a GROUP BY "... and the example right above it has two CTE's... and each has a GROUP BY in it. I admit that's it's just a "skim" so far (I'll read it in depth tonight), but just exactly what do you mean by you avoided the use of GROUP BY?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The query provided by Tao is fine and was the first solution i thought of when i read the example problem. However, the CTE example will effortlessly resolve the issue of two or more departments having the same number of employees:

    If two departments have n employees and are tied for having the most in the company, both should be displayed. Tao's use of the TOP 1 clause in this case will choose one of the departments arbitrarily for display and produce a less-than desirable result.

    Still, i'm not all that happy with the example. The issue can be solved many other other ways besides using CTE's though it does present the CTE concept very cleanly.

  • Tao Klerks (3/18/2008)


    ...My instinct would be: Try to rewrite your statement to use joins and derived tables rather than subqueries, before moving to CTEs as a way of simplifying your subqueries.

    Does anyone know better one way or the other?

    CTEs work pretty much the same way as derived tables. They have the advantage of "build once, reference many", but in cases where you just use the derived table once, a CTE and a derived table are essentially the same, and will resolve the same way in the query engine.

    In those cases, I currently use CTEs instead of derived tables, because I find they make for more readable code. The final select statement isn't as cluttered. No performance reason, just easier to read.

    Also, since I am using CTEs in the places where they have significant advantages over derived tables (self-reference, multi-reference, query-of-query), using them in other places is more consistent, and that helps me to set and maintain a standard.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Tao Klerks (3/18/2008)


    SELECT TOP 1 dept_id, n

    FROM (

    SELECT dept_id, count(*) AS n

    FROM employee

    GROUP BY dept_id

    ) AS a

    ORDER BY n DESC

    This is essentially the same as:

    ;with Depts (ID, Employees) as

    (select dept_id, count(*)

    from dbo.employee

    group by dept_id)

    select top 1 ID

    from depts

    order by employees desc

    Either one is better than what was presented in the article. The two examples above will have the same query plan (in my tests and use).

    It's not a question of performance of derived tables vs performance of CTEs. Both work the same way.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • paulsasik (3/18/2008)


    The query provided by Tao is fine and was the first solution i thought of when i read the example problem. However, the CTE example will effortlessly resolve the issue of two or more departments having the same number of employees:

    If two departments have n employees and are tied for having the most in the company, both should be displayed. Tao's use of the TOP 1 clause in this case will choose one of the departments arbitrarily for display and produce a less-than desirable result. ...

    Easy enough to add a "With Ties" to "Select Top".

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • hi, hi don't see the advantage of CTE to temp Tables like in the example below. Is there any advantage?

    WITH

    CountEmployees(dept_id, n) AS

    ( SELECT dept_id, count(*) AS n

    FROM employee GROUP BY dept_id ),

    DeptPayroll( dept_id, amt ) AS

    ( SELECT dept_id, sum(salary) AS amt

    FROM employee GROUP BY dept_id )

    select v.* into #CountEmployees from

    (SELECT dept_id, count(*) AS n

    FROM employee GROUP BY dept_id)v

    select v.* into #DeptPayroll from

    (SELECT dept_id, sum(salary) AS amt

    FROM employee GROUP BY dept_id)v

  • Thanks GSquared - this makes sense to me.

    I don't think I agree about the clarity of using CTEs for derived tables, but I guess that might be because I still live in a SQL Server 2000 world (with very strict formatting standards).

    I think I'd rather have:

    SELECT Column1

    FROM (

    SELECT Column1

    FROM (

    SELECT Column1, Column2

    FROM SomeTable

    WHERE Column2 = 3

    ) AS Derived1

    WHERE Column1 = 2

    ) AS Derived2

    WHERE 1 = 1

    than

    WITH Derived1 AS (

    SELECT Column1, Column2

    FROM SomeTable

    WHERE Column2 = 4

    ),

    Derived2 AS (

    SELECT Column1

    FROM Derived1

    WHERE Column1 = 3

    )

    SELECT Column1

    FROM Derived2

    WHERE 1 = 1

    but if I understand correctly that is really a matter of taste / formatting rather than any logical difference (and I just checked the query plan for good measure :))

    http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.

  • To me, CTE's have the advantage of Derived Tables because you can reference them many times in the same query at different levels in the query whereas a Derived table can only be referenced one level up in the same query. It's not a huge advantage until you actually need it. CTE's also (to me, anyway) present themselves nicer for readability and documentation purposes. They can also call themselves (recursion) like a function might.

    Temporary tables have the advantage over both because they persist for longer than just one query. I used (and still use) temp tables in place of CTEs ("sub-query refactoring" in Oracle) long before they were available in MSSQL. In fact, if I know a result set must be used across many queries in a sproc, I'll still use TempTables instead of CTE's. I like "local" temp tables a lot... I wish Oracle understood the concept as well. Their "global" temp tables don't work out quite so well.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 41 total)

You must be logged in to reply to this topic. Login to reply