Common Table Expressions

  • Steve Jones - SSC Editor

    SSC Guru

    Points: 714668

    I think the biggest benefit for many people is that CTEs allow them to write complex SQL in an easier way. Moving to derived tables just seems harder for many people writing queries.

    A CTE can be written first, meaning write part of your SQL that you need, then once you enclose it in the CTE formatting, you add it like any other table or view, without having to create those objects.

    However as someone that's been writing T-SQL for over a dozen years, I'm not sure it's that much easier for me. It is slightly clearer in some cases, but for the most part I find relatively few queries where I'd use it. I think that's an experience thing

  • TimothyAWiseman

    SSCrazy Eights

    Points: 8819

    This was a simple, clear, and well written article. The subject matter is a bit basic for someone who is familiar with the new features in 2005, but it is an excellent introduction to CTEs.

    ---
    Timothy A Wiseman
    SQL Blog: http://timothyawiseman.wordpress.com/

  • srienstr

    SSCrazy

    Points: 2230

    69.5% of the total batch cost for the derived table method according to the execution plan, but only 31.8% of the runtime (according to the getdate() queries I inserted at various steps)... I'm confused. I'd assumed that cost and runtime were directly correlated unless there were competing processes, but it seems to occur consistently, regardless of which query is first in the batch.

    Edit: There is a competing load going on, which is probably what threw off how the cost compares to the execution time due to the competition for CPU, RAM and I/O.


    Puto me cogitare, ergo puto me esse.
    I think that I think, therefore I think that I am.

  • jdoconsulting

    Grasshopper

    Points: 11

    Does anyone have a good example of using a CTE to rollup assembly costs in a nested Bill of Materials? It seems to me that this should be done from the bottom of the assembly upwards. Most examples I've found deal with rolling up salaries for employees in an organization. The difference with a BOM is that an assembly can be part of multiple assemblies so there isn't just one path in the hierarchy like there is with an employee in an organization.

  • GSquared

    SSC Guru

    Points: 260824

    jdoconsulting (3/18/2008)


    Does anyone have a good example of using a CTE to rollup assembly costs in a nested Bill of Materials? It seems to me that this should be done from the bottom of the assembly upwards. Most examples I've found deal with rolling up salaries for employees in an organization. The difference with a BOM is that an assembly can be part of multiple assemblies so there isn't just one path in the hierarchy like there is with an employee in an organization.

    Not sure how you would do that. If you have 2-inch #10 bolts as one of you assembly parts, how do you roll that up? It might be used in hundreds of different end products.

    On BoM hierarchies, I usually roll them down. Start at the final product, or the sub-assembly that you want data for, and go down from there.

    If you really want to go from the bottom up, you'd just reverse the join from the Books Online example. Instead of joining the Parent ID in the table to the ID in the CTE, join the ID in the table to the Parent ID in the CTE. That gives you a bottom-up hierarchy.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared

    SSC Guru

    Points: 260824

    srienstr (3/18/2008)


    69.5% of the total batch cost for the derived table method according to the execution plan, but only 31.8% of the runtime (according to the getdate() queries I inserted at various steps)... I'm confused. I'd assumed that cost and runtime were directly correlated unless there were competing processes, but it seems to occur consistently, regardless of which query is first in the batch.

    Edit: There is a competing load going on, which is probably what threw off how the cost compares to the execution time due to the competition for CPU, RAM and I/O.

    Estimated Cost does not neccessarily equal Percentage of Runtime

    It sometimes does, and sometimes is close, but not always. Cost isn't just an estimate of how long something will take. It's also an estimate of how much CPU time it will take, how many I/O cycles, how many mathematical computations, how many string functions, etc. It's meant to be a numeric representation of how much effort the server is going through to accomplish something. If it were meant to correlate directly to time, it would be an estimated execution time field, not an estimated cost.

    As an aside, you're better of using "set statistics time on", than adding "select getdate()" clauses to your query. It's more accurate since the select getdate() query can, itself, add time to the runtime of the query.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Jeff Moden

    SSC Guru

    Points: 993897

    And then you better make sure you run the code more than once, even on a quiet system...

    DECLARE @Year INT

    SET @Year = 2008

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    ;WITH cteDates AS

    (

    SELECT DATEADD(yy,@Year-1900,0)+Number AS TheDate

    FROM Master.dbo.spt_Values

    WHERE Type = 'P'

    AND Number <= 366

    )

    SELECT cd.TheDate, DATEPART(qq,TheDate) AS TheQuarter

    FROM cteDates cd

    WHERE YEAR(TheDate) = @Year

    SET STATISTICS TIME OFF

    PRINT REPLICATE('=',100)

    SET STATISTICS TIME ON

    SELECT DATEADD(yy,@Year-1900,0)+Number AS TheDate,

    DATEPART(qq,DATEADD(yy,@Year-1900,0)+Number) AS TheQuarter

    FROM Master.dbo.spt_Values

    WHERE Type = 'P'

    AND Number < 366

    AND YEAR(DATEADD(yy,@Year-1900,0)+Number) = @Year

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems

  • srienstr

    SSCrazy

    Points: 2230

    Jeff Moden (3/18/2008)


    And then you better make sure you run the code more than once, even on a quiet system...

    I did three runs in one order, and two runs with the order switched, the % cost matched, though the % time did vary slightly. The issue that threw me off was that an I/O intensive load was running at the same time, so the I/O portion of the costs was actually taking much more time relative to the CPU and RAM portions.

    Then again, it may just be that the cost formula is not properly balanced for our server for some reason that isn't apparent to me.


    Puto me cogitare, ergo puto me esse.
    I think that I think, therefore I think that I am.

  • Jeff Moden

    SSC Guru

    Points: 993897

    Did you look at the "Message" window? Sometimes one or the other runs in half the time as the other. Some folks look at something like that on just one run and try to take it to the bank thinking (like writing company SQL standards) that one or the other is better... bad mistake a lot of times.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems

  • GSquared

    SSC Guru

    Points: 260824

    A script I posted on another thread in this forum illustrates why I like the layout and readability of CTEs.

    The thread is at http://www.sqlservercentral.com/Forums/Topic470379-338-1.aspx

    The script is:

    ;with

    CTE1 (ID, Position, Val) as

    (select id, numbers.number, substring(reverse(rtrim(clean)), numbers.number, 1)

    from dbo.numbercleanb

    inner join dbo.numbers

    on numbers.number between 1 and len(clean)),

    CTE2 (ID, Pos, Val) as

    (select id, row_number() over (partition by id order by position), val

    from cte1

    where val like '[0-9]'),

    CTE3 (ID, Number) as

    (select id, sum(power(cast(10 as bigint), cast(pos-1 as bigint)) * cast(val as bigint))

    from cte2

    group by id)

    select count(*)

    from cte3

    inner join dbo.numbercleanb

    on cte3.id = numbercleanb.id

    and cte3.number != numbercleanb.number

    A "derived tables" version might look like:

    select count(*)

    from

    (select id,

    sum(power(cast(10 as bigint), cast(pos-1 as bigint)) * cast(val as bigint)) as number

    from

    (select id, row_number() over (partition by id order by position) as pos, val

    from

    (select id, numbers.number as position,

    substring(reverse(rtrim(clean)), numbers.number, 1) as val

    from dbo.numbercleanb

    inner join dbo.numbers

    on numbers.number between 1 and len(clean)) Sub1

    where val like '[0-9]') Sub2

    group by id) Sub3

    inner join dbo.numbercleanb

    on sub3.id = numbercleanb.id

    and sub3.number != numbercleanb.number

    The reasons I find the CTEs more readable are:

    A) I don't have to keep increasing the indention as I increase the number of levels of subquery. This means I'm less likely to have to scroll left and right on the screen.

    B) The column names in the derived tables are less visible than those in the CTEs.

    C) If I need to debug or modify part of the query, the CTEs are self-contained and can be modified more modularly.

    D) Each query, including the final/outer query, is all in one place. In the derived tables version, part of each subquery is above and part below, the inner subs. This is forced because I have to Select, From, Where for each subquery, and From comes before Where. In the CTEs, this isn't neccessary and it keeps each in one place.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Lynn Pettis

    SSC Guru

    Points: 442116

    GSquared,

    Good example of how CTE's can help make some code easier to read and/or maintain.

    😎

  • Julie Breutzmann

    Hall of Fame

    Points: 3587

    Thanks, Kirk for a well-written explanation of CTEs.

    Great discussion, too. Thanks!

Viewing 12 posts - 31 through 42 (of 42 total)

You must be logged in to reply to this topic. Login to reply