CTE's are useless

Question

CTE's are useless

Viewing 15 posts - 16 through 30 (of 39 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1003871 More actions · Answer 1

...if you do it right, one follows the other.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Seggerman-675349 Hall of Fame Points: 3563 More actions · Answer 2

I've used both temp tables, table variables and CTE's in stored procedures, in an environment where the volumn of data is too small to reflect performance differences (most of the time). CTE's have the disadvantege that they need to be used in the statement immediately after they are declared after which they disappear. Temp tables create a result set which can be used again until it is DROPped or the procedure ends.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 3

CTE's have the disadvantage that they need to be used in the statement immediately after they are declared after which they disappear.

This is how they are designed to be used. If you need the result of a CTE beyond the next statement (where it is used) then you need to either create a view or a temp table. A CTE is a cleaner way of using derived tables. Also, if you need the same derived table multiple times in the same query, the CTE allows you to define it once and use it as a table in the query.

😎

GSquared SSC Guru Points: 260824 More actions · Answer 4

Jeff, what I'm saying is that, while I can get a major speed boost in a database by doing things like allowing read uncommited, or dropping locks and transactions by using table variables, I would definitely avoid that "speed at all cost" methodology.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Mauve SSChampion Points: 11316 More actions · Answer 5

Jeff's explanation is right on. I'm using CTEs more and more in new coding. If anything, for the self-documenting syntax. But with everything in the SQL language, regardless of vendor, one must always evaluate and use "the proper tool for the job".

As for recursive CTEs performing hierarchical tree walks, I still haven't found them to be a winner (in our application). The TVF (Table-Valued Function) that I wrote under SQL Server 2000 (as CTEs didn't exist) still outperforms a CTE (in our application). Note that a key requirement in our application is to order (sort) the intermediate nodes by a user-friendly string (name) value and not the node identifier.

Sadly, I'm still looking for the equivalent to the powerful Oracle CONNECT BY syntax in SQL Server. But I have to work with what I've been dealt.

See my earlier postings on the subject:

http://www.sqlservercentral.com/Forums/FindPost366402.aspx

http://www.sqlservercentral.com/Forums/FindPost443059.aspx

[font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

GSquared SSC Guru Points: 260824 More actions · Answer 6

John, I'd be interested in seeing that function, if you can post it. I've done a lot of tests on hierarchical data, and in my tests, the CTE out-performs TVFs. If you have something better, I'm very interested.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Mauve SSChampion Points: 11316 More actions · Answer 7

John, I'd be interested in seeing that function, if you can post it.

The function (attached) is based upon the MS KB article 248915 http://support.microsoft.com/kb/248915 with some minor tweaks. The "stack" logic is the same stuff that I coded in COBOL back in the 70s for manufacturing BOM against some IBM mainframe database systems.

Some notes:

1. I tried writing an equivalent function (i.e., apples-to-apples) using a CTE against SP1 a year ago and found that it was about 3-5 times slower. So I didn't pursue it further. I just re-tested against SP2, and it is still slower, but a bit improved.

2. We use GUIDs for surrogate keys

3. The example function is ordering on a numeric DISPLAY_ORDER field. However, we also have a similar hierarchy where the ordering of the nodes, and intermediate nodes, will be done on a "name" NVARCHAR(256) column. This is where a CTE would get a bit dicey.

4. We also need, often, but not always, the "row number" of the resulting result set. Thus the IDENTITY column in the return table.

A good test case of this would be to take the output of a DOS "TREE /F" command and load it into a table using GUIDs for the node identifiers. Then be able to return the folders and files of any node sorted by name.

[font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

GSquared SSC Guru Points: 260824 More actions · Answer 8

I modified your function to use Int instead of UniqueID, since my hierarchy table uses that for the node ID. Otherwise, ran it as-is.

Fastest run time was 5434 milliseconds. Average was 5600 milliseconds. The CTE I use has an average of 320 milliseconds and a fastest of under 300, on the exact same data. That's a 7,000 node hierarchy. So, at least on my data, this function is approximately 18 times slower than the CTE.

Try this, on your table:

;WITH Hierarchy (Lvl, Node_ID, Parent_ID, DisplayOrder) as

(SELECT 1, tt.node_id, tt.parent_id, tt.display_order

FROM

test_table tt

WHERE tt.node_id = @v_current_id;

UNION ALL

SELECT Lvl + 1, tt2.node_id, tt2.parent_id, tt2.display_order

FROM

test_table tt2

CROSS JOIN Hierarchy

ON tt2.parent_id = hierarchy.Node_ID)

SELECT Row_Number() OVER (order by DisplayOrder) as [Row_Number],

Node_ID, Parent_ID, DisplayOrder

FROM Hierarchy

ORDER BY DisplayOrder

As far as adding a name field to it, that can be done either in the CTE, or by joining the CTE to a parent table. I do that all the time. If you give me the column name and how it relates to "test_table" (is it part of that table or a join to it), I can add it to this script.

Another structure I've used is adding, at each level, to a varchar(max) column in the CTE, and then using that to force a very specific sequence.

I'm getting the run-time on these by using "set statistics io on" and "set statistics time on". I'm assuming you're doing the same.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Mauve SSChampion Points: 11316 More actions · Answer 9

Nice, but a few things:

1. I measure performance at the caller (application) level based upon elapsed time. For me, it is the only true measure of performance (for the application) regardless of what SQL Server is doing and/or reporting.

2. I believe that the ordering in your solution is wrong. If you look at the example data you will see that the intermediate nodes need to be ordered within their respective parents.

As far as adding a name field to it, that can be done either in the CTE, or by joining the CTE to a parent table.

I realize that I can pick up the name via a join back to the main table or including it as part of the CTE. I think you missed my point. You will see in hierarchical CTE examples, that a desired "sort field" must be constructed by concatenating the field to be sorted as the nodes are traversed. Naturally, all of the examples use simple integers vs. something more common such as UNICODE strings. I noted this in my earlier post to an article.

http://www.sqlservercentral.com/Forums/FindPost455066.aspx

[font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

Mauve SSChampion Points: 11316 More actions · Answer 10

Try this, on your table:

Suggested query will not run due to syntax errors:

1. semicolon after @v_current_id;

2. an interesting error with the CROSS JOIN:

Incorrect syntax near the keyword 'ON'

SQL Server 2005 version is Developer Edition SP2 (9.00.3054.00)

What version of SQL Server 2005 did you run against?

[font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

GSquared SSC Guru Points: 260824 More actions · Answer 11

If all we're comparing is the effectiveness of two SQL commands, measure them on the server. Why would network latency, network traffic load, web server traffic load, etc., matter in this case?

Let's say the network adds 1 second to the time it takes for the command to finish, and the application adds 200 milliseconds, and the load on the web server (assuming this is a web application) or the load on the local PC (assuming this is a local application) add another 500 milliseconds:

Function 1 takes 5 seconds = 5000 + 1000 + 200 + 500 = 6700 = 6.7 seconds

Function 2 takes 300 milliseconds = 300 + 1000 + 200 + 500 = 2000 = 2 seconds

6700 - 2000 = 5000 - 300 = 4700

In other words, all other factors in the speed are constants, so there's no reason to include them in the measure. Go ahead and measure however you like, but the only factor that matters in this particular test is which function processes the fastest on the server.

As far as the sequence goes, it can be added to the CTE like so:

;WITH Hierarchy (Lvl, Node_ID, Parent_ID, DisplayOrder) as

(SELECT 1, tt.node_id, tt.parent_id, cast('0' as varchar(max))

FROM

test_table tt

WHERE tt.node_id = @v_current_id;

UNION ALL

SELECT Lvl + 1, tt2.node_id, tt2.parent_id,

DisplayOrder + '.' + cast(tt2.parent_id as varchar(max))

FROM

test_table tt2

CROSS JOIN Hierarchy

ON tt2.parent_id = hierarchy.Node_ID)

SELECT Row_Number() OVER (order by DisplayOrder) as [Row_Number],

Node_ID, Parent_ID, DisplayOrder

FROM Hierarchy

ORDER BY DisplayOrder

That's what I meant by building a sequence string. Since you're using unique IDs, instead of numbers, that string will get quite long. With integer IDs, I use "right('0000000000' + cast(NodeID as varchar(10)), 10)".

More complex sequencing can be built in the same manner.

With the integers version, you end up with:

0000000000

0000000000.0000000001

0000000000.0000000001.0000000002

0000000000.0000000003

0000000000.0000000003.0000000004

0000000000.0000000003.0000000005

0000000000.0000000006

0000000000.0000000007

For IDs 0-7, assuming ID 0 is the root; 1, 3, 6, and 7 are level 2; 2 is under 1; 4 and 5 are under 3.

When you Order By that field, you can force whatever sequence you want, because you can build the string however you like. Each level adds to the end of the string.

Adding this kind of sequencing string does add a small amount to the processing time. When I run this on my data:

;with Hierarchy (Lvl, ID, PID, Seq, Name) as

(select 1, nodeid, parentid, cast('0000000000' as varchar(max)), name

from dbo.hierarchiesnodes

where nodeid = 267

union all

select lvl + 1, hn2.nodeid, hn2.parentid,

seq + '.' + right('0000000000' + cast(nodeid as varchar(max)), 10),

hn2.name

from dbo.hierarchiesnodes hn2

inner join hierarchy

on hn2.parentid = hierarchy.id)

select *

from hierarchy

order by seq

The run-time goes up to 1.3 seconds, as opposed to .3 seconds. The TVF took over 5 seconds on the exact same data, so it's still approximately 5 times faster.

Will that get you the kind of sequencing you need?

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Mauve SSChampion Points: 11316 More actions · Answer 12

Will that get you the kind of sequencing you need?

You correctly build a "sort field" for the "display order" in this case. However, the GUIDs are not the items being sorted when ordering on "name" is desired. The "name" string (UNICODE varying 256) would have to be concatenated. And you cannot concatenate varying length strings to come up with the proper sort field. They would have to be casted as fixed-length strings before concatenation. So it starts to get quite ugly as longer and longer strings are created.

As I stated earlier, an interesting project would be to take the output of a "tree /f" command against your C: drive into a table using GUIDs for the surrogate keys (node IDs). Then be able to pull out any node and its children properly sorted by name using a CTE (or other means).

In the interim, I'm keeping what works (for me). Although I'll revist this "experiment" when time and resources, or the next version of SQL Server, is available.

[font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

Jeff Moden SSC Guru Points: 1003871 More actions · Answer 13

GSquared (4/22/2008)
I would definitely avoid that "speed at all cost" methodology.

Heh... then why have you posted so much on this thread about speed? 😉 The real answer is that speed does matter, always, and if you always consider writing for "speed" and scalability, you will never run into problems. Yes, yes... the code must always be correct and stabile... that's a given...

But, let me ask you this... have you ever used a Tally or Numbers table? A While loop also does the job and produces the correct and stabile answer so why do YOU use a Tally table? Speed. Have you ever intentionally avoided writing a UDF and written inline code instead? Why? Speed. Have you ever written an indexed view? Why? Speed. Have you ever looked at an execution plan? Why? Speed.

You can avoid the "speed at all cost" methodology if you want... but I'd suggest that you haven't and you won't. Considering all that you've written about speed testing on this very thread and the fact that you said "The only time to compromise on performance is when integrity would be threatened by it", I'd have to say your comment about my writing for speed is a bit contrary to what you practice and a bit out of line, as well.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 14

You can avoid the "speed at all cost" methodology if you want... but I'd suggest that you haven't and you won't. Considering all that you've written about speed testing on this very thread and the fact that you said "The only time to compromise on performance is when integrity would be threatened by it", I'd have to say your comment about my writing for speed is a bit contrary to what you practice and a bit out of line, as well.

This thread is about coding for speed, so I imagine that's why he's posted so much on that subject here. He rightly pointed out that speed is important, but not as important as integrity, and I can't see anywhere he's contradicted himself.

John

Jeff Moden SSC Guru Points: 1003871 More actions · Answer 15

John Mitchell (4/23/2008)
You can avoid the "speed at all cost" methodology if you want... but I'd suggest that you haven't and you won't. Considering all that you've written about speed testing on this very thread and the fact that you said "The only time to compromise on performance is when integrity would be threatened by it", I'd have to say your comment about my writing for speed is a bit contrary to what you practice and a bit out of line, as well.
This thread is about coding for speed, so I imagine that's why he's posted so much on that subject here. He rightly pointed out that speed is important, but not as important as integrity, and I can't see anywhere he's contradicted himself.
John

He's not... he's contradicting me. 😉

My point is that, done correctly, high performance code will always have the correct integrity. The two go hand in hand and that's why I took exception to Gus laying out the gaunlet by telling me "I would definitely avoid that "speed at all cost" methodology."

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)