CTE's are useless

  • ...if you do it right, one follows the other.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I've used both temp tables, table variables and CTE's in stored procedures, in an environment where the volumn of data is too small to reflect performance differences (most of the time). CTE's have the disadvantege that they need to be used in the statement immediately after they are declared after which they disappear. Temp tables create a result set which can be used again until it is DROPped or the procedure ends.

  • CTE's have the disadvantage that they need to be used in the statement immediately after they are declared after which they disappear.

    This is how they are designed to be used. If you need the result of a CTE beyond the next statement (where it is used) then you need to either create a view or a temp table. A CTE is a cleaner way of using derived tables. Also, if you need the same derived table multiple times in the same query, the CTE allows you to define it once and use it as a table in the query.

    😎

  • Jeff, what I'm saying is that, while I can get a major speed boost in a database by doing things like allowing read uncommited, or dropping locks and transactions by using table variables, I would definitely avoid that "speed at all cost" methodology.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Jeff's explanation is right on. I'm using CTEs more and more in new coding. If anything, for the self-documenting syntax. But with everything in the SQL language, regardless of vendor, one must always evaluate and use "the proper tool for the job".

    As for recursive CTEs performing hierarchical tree walks, I still haven't found them to be a winner (in our application). The TVF (Table-Valued Function) that I wrote under SQL Server 2000 (as CTEs didn't exist) still outperforms a CTE (in our application). Note that a key requirement in our application is to order (sort) the intermediate nodes by a user-friendly string (name) value and not the node identifier.

    Sadly, I'm still looking for the equivalent to the powerful Oracle CONNECT BY syntax in SQL Server. But I have to work with what I've been dealt.

    See my earlier postings on the subject:

    http://www.sqlservercentral.com/Forums/FindPost366402.aspx

    http://www.sqlservercentral.com/Forums/FindPost443059.aspx


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

  • John, I'd be interested in seeing that function, if you can post it. I've done a lot of tests on hierarchical data, and in my tests, the CTE out-performs TVFs. If you have something better, I'm very interested.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • John, I'd be interested in seeing that function, if you can post it.

    The function (attached) is based upon the MS KB article 248915 http://support.microsoft.com/kb/248915 with some minor tweaks. The "stack" logic is the same stuff that I coded in COBOL back in the 70s for manufacturing BOM against some IBM mainframe database systems.

    Some notes:

    1. I tried writing an equivalent function (i.e., apples-to-apples) using a CTE against SP1 a year ago and found that it was about 3-5 times slower. So I didn't pursue it further. I just re-tested against SP2, and it is still slower, but a bit improved.

    2. We use GUIDs for surrogate keys

    3. The example function is ordering on a numeric DISPLAY_ORDER field. However, we also have a similar hierarchy where the ordering of the nodes, and intermediate nodes, will be done on a "name" NVARCHAR(256) column. This is where a CTE would get a bit dicey.

    4. We also need, often, but not always, the "row number" of the resulting result set. Thus the IDENTITY column in the return table.

    A good test case of this would be to take the output of a DOS "TREE /F" command and load it into a table using GUIDs for the node identifiers. Then be able to return the folders and files of any node sorted by name.


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

  • I modified your function to use Int instead of UniqueID, since my hierarchy table uses that for the node ID. Otherwise, ran it as-is.

    Fastest run time was 5434 milliseconds. Average was 5600 milliseconds. The CTE I use has an average of 320 milliseconds and a fastest of under 300, on the exact same data. That's a 7,000 node hierarchy. So, at least on my data, this function is approximately 18 times slower than the CTE.

    Try this, on your table:

    ;WITH Hierarchy (Lvl, Node_ID, Parent_ID, DisplayOrder) as

    (SELECT 1, tt.node_id, tt.parent_id, tt.display_order

    FROM

    test_table tt

    WHERE tt.node_id = @v_current_id;

    UNION ALL

    SELECT Lvl + 1, tt2.node_id, tt2.parent_id, tt2.display_order

    FROM

    test_table tt2

    CROSS JOIN Hierarchy

    ON tt2.parent_id = hierarchy.Node_ID)

    SELECT Row_Number() OVER (order by DisplayOrder) as [Row_Number],

    Node_ID, Parent_ID, DisplayOrder

    FROM Hierarchy

    ORDER BY DisplayOrder

    As far as adding a name field to it, that can be done either in the CTE, or by joining the CTE to a parent table. I do that all the time. If you give me the column name and how it relates to "test_table" (is it part of that table or a join to it), I can add it to this script.

    Another structure I've used is adding, at each level, to a varchar(max) column in the CTE, and then using that to force a very specific sequence.

    I'm getting the run-time on these by using "set statistics io on" and "set statistics time on". I'm assuming you're doing the same.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Nice, but a few things:

    1. I measure performance at the caller (application) level based upon elapsed time. For me, it is the only true measure of performance (for the application) regardless of what SQL Server is doing and/or reporting.

    2. I believe that the ordering in your solution is wrong. If you look at the example data you will see that the intermediate nodes need to be ordered within their respective parents.

    As far as adding a name field to it, that can be done either in the CTE, or by joining the CTE to a parent table.

    I realize that I can pick up the name via a join back to the main table or including it as part of the CTE. I think you missed my point. You will see in hierarchical CTE examples, that a desired "sort field" must be constructed by concatenating the field to be sorted as the nodes are traversed. Naturally, all of the examples use simple integers vs. something more common such as UNICODE strings. I noted this in my earlier post to an article.

    http://www.sqlservercentral.com/Forums/FindPost455066.aspx


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

  • Try this, on your table:

    Suggested query will not run due to syntax errors:

    1. semicolon after @v_current_id;

    2. an interesting error with the CROSS JOIN:

    Incorrect syntax near the keyword 'ON'

    SQL Server 2005 version is Developer Edition SP2 (9.00.3054.00)

    What version of SQL Server 2005 did you run against?


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

  • If all we're comparing is the effectiveness of two SQL commands, measure them on the server. Why would network latency, network traffic load, web server traffic load, etc., matter in this case?

    Let's say the network adds 1 second to the time it takes for the command to finish, and the application adds 200 milliseconds, and the load on the web server (assuming this is a web application) or the load on the local PC (assuming this is a local application) add another 500 milliseconds:

    Function 1 takes 5 seconds = 5000 + 1000 + 200 + 500 = 6700 = 6.7 seconds

    Function 2 takes 300 milliseconds = 300 + 1000 + 200 + 500 = 2000 = 2 seconds

    6700 - 2000 = 5000 - 300 = 4700

    In other words, all other factors in the speed are constants, so there's no reason to include them in the measure. Go ahead and measure however you like, but the only factor that matters in this particular test is which function processes the fastest on the server.

    As far as the sequence goes, it can be added to the CTE like so:

    ;WITH Hierarchy (Lvl, Node_ID, Parent_ID, DisplayOrder) as

    (SELECT 1, tt.node_id, tt.parent_id, cast('0' as varchar(max))

    FROM

    test_table tt

    WHERE tt.node_id = @v_current_id;

    UNION ALL

    SELECT Lvl + 1, tt2.node_id, tt2.parent_id,

    DisplayOrder + '.' + cast(tt2.parent_id as varchar(max))

    FROM

    test_table tt2

    CROSS JOIN Hierarchy

    ON tt2.parent_id = hierarchy.Node_ID)

    SELECT Row_Number() OVER (order by DisplayOrder) as [Row_Number],

    Node_ID, Parent_ID, DisplayOrder

    FROM Hierarchy

    ORDER BY DisplayOrder

    That's what I meant by building a sequence string. Since you're using unique IDs, instead of numbers, that string will get quite long. With integer IDs, I use "right('0000000000' + cast(NodeID as varchar(10)), 10)".

    More complex sequencing can be built in the same manner.

    With the integers version, you end up with:

    0000000000

    0000000000.0000000001

    0000000000.0000000001.0000000002

    0000000000.0000000003

    0000000000.0000000003.0000000004

    0000000000.0000000003.0000000005

    0000000000.0000000006

    0000000000.0000000007

    For IDs 0-7, assuming ID 0 is the root; 1, 3, 6, and 7 are level 2; 2 is under 1; 4 and 5 are under 3.

    When you Order By that field, you can force whatever sequence you want, because you can build the string however you like. Each level adds to the end of the string.

    Adding this kind of sequencing string does add a small amount to the processing time. When I run this on my data:

    ;with Hierarchy (Lvl, ID, PID, Seq, Name) as

    (select 1, nodeid, parentid, cast('0000000000' as varchar(max)), name

    from dbo.hierarchiesnodes

    where nodeid = 267

    union all

    select lvl + 1, hn2.nodeid, hn2.parentid,

    seq + '.' + right('0000000000' + cast(nodeid as varchar(max)), 10),

    hn2.name

    from dbo.hierarchiesnodes hn2

    inner join hierarchy

    on hn2.parentid = hierarchy.id)

    select *

    from hierarchy

    order by seq

    The run-time goes up to 1.3 seconds, as opposed to .3 seconds. The TVF took over 5 seconds on the exact same data, so it's still approximately 5 times faster.

    Will that get you the kind of sequencing you need?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Will that get you the kind of sequencing you need?

    You correctly build a "sort field" for the "display order" in this case. However, the GUIDs are not the items being sorted when ordering on "name" is desired. The "name" string (UNICODE varying 256) would have to be concatenated. And you cannot concatenate varying length strings to come up with the proper sort field. They would have to be casted as fixed-length strings before concatenation. So it starts to get quite ugly as longer and longer strings are created.

    As I stated earlier, an interesting project would be to take the output of a "tree /f" command against your C: drive into a table using GUIDs for the surrogate keys (node IDs). Then be able to pull out any node and its children properly sorted by name using a CTE (or other means).

    In the interim, I'm keeping what works (for me). Although I'll revist this "experiment" when time and resources, or the next version of SQL Server, is available.


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

  • GSquared (4/22/2008)


    I would definitely avoid that "speed at all cost" methodology.

    Heh... then why have you posted so much on this thread about speed? 😉 The real answer is that speed does matter, always, and if you always consider writing for "speed" and scalability, you will never run into problems. Yes, yes... the code must always be correct and stabile... that's a given...

    But, let me ask you this... have you ever used a Tally or Numbers table? A While loop also does the job and produces the correct and stabile answer so why do YOU use a Tally table? Speed. Have you ever intentionally avoided writing a UDF and written inline code instead? Why? Speed. Have you ever written an indexed view? Why? Speed. Have you ever looked at an execution plan? Why? Speed.

    You can avoid the "speed at all cost" methodology if you want... but I'd suggest that you haven't and you won't. Considering all that you've written about speed testing on this very thread and the fact that you said "The only time to compromise on performance is when integrity would be threatened by it", I'd have to say your comment about my writing for speed is a bit contrary to what you practice and a bit out of line, as well.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • You can avoid the "speed at all cost" methodology if you want... but I'd suggest that you haven't and you won't. Considering all that you've written about speed testing on this very thread and the fact that you said "The only time to compromise on performance is when integrity would be threatened by it", I'd have to say your comment about my writing for speed is a bit contrary to what you practice and a bit out of line, as well.

    This thread is about coding for speed, so I imagine that's why he's posted so much on that subject here. He rightly pointed out that speed is important, but not as important as integrity, and I can't see anywhere he's contradicted himself.

    John

  • John Mitchell (4/23/2008)


    You can avoid the "speed at all cost" methodology if you want... but I'd suggest that you haven't and you won't. Considering all that you've written about speed testing on this very thread and the fact that you said "The only time to compromise on performance is when integrity would be threatened by it", I'd have to say your comment about my writing for speed is a bit contrary to what you practice and a bit out of line, as well.

    This thread is about coding for speed, so I imagine that's why he's posted so much on that subject here. He rightly pointed out that speed is important, but not as important as integrity, and I can't see anywhere he's contradicted himself.

    John

    He's not... he's contradicting me. 😉

    My point is that, done correctly, high performance code will always have the correct integrity. The two go hand in hand and that's why I took exception to Gus laying out the gaunlet by telling me "I would definitely avoid that "speed at all cost" methodology."

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 16 through 30 (of 39 total)

You must be logged in to reply to this topic. Login to reply