CTE's are useless

Question

CTE's are useless

robinp

SSC Veteran

Points: 277
More actions
April 18, 2008 at 5:24 am

#186806

Firstly, Forgive me.
From my understanding CTE's replace subqueries. But when working with large data sets, is it not more effecient to use temp tables and index those temporary tables? Will this not increase performace?
I guess my question is: Why can you not index a CTE?
I will be interested to hear your arguments.
Rob

Viewing 15 posts - 1 through 15 (of 39 total)

You must be logged in to reply to this topic. Login to reply

ChiragNS One Orange Chip Points: 26137 More actions · Answer 1

Hi

CTE's are maily used for recursive programming although they can be used instead of sub-queries. While its true that they have performance implications they do have some advantages. The thing is we should know when we can use them and we cannot. BTW i think the data sets must be really huge to cause performance drawback.

"Keep Trying"

Anipaul SSC-Insane Points: 24681 More actions · Answer 2

Read this one:

Recursive Queries in SQL Server 2005

By Srinivas Sampath, 2005/03/03

http://www.sqlservercentral.com/articles/Development/recursivequeriesinsqlserver2005/1760/

Andras Belokosztolszki SSC-Insane Points: 21971 More actions · Answer 3

robinp (4/18/2008)
Firstly, Forgive me.
From my understanding CTE's replace subqueries. But when working with large data sets, is it not more effecient to use temp tables and index those temporary tables? Will this not increase performace?
I guess my question is: Why can you not index a CTE?
I will be interested to hear your arguments.
Rob

Note that CTEs are NOT materialized. So if you refer to the same CTE ten times in a query, it will be executed ten times. It is an easy way to set up a "temporary view", and as others have pointed out, is very useful for recursive queries. Temptables can many times perform better, but require you to create the temptable, ... .

Regards,

Andras

Andras Belokosztolszki, MCPD, PhD
GoldenGate Software

Jeff Moden SSC Guru Points: 1003853 More actions · Answer 4

As you probably already know (just making sure), CTE's are nothing more than another way of creating a derived table except you do it up front instead of in the FROM clause. Another way to think of them is an "inline view" and like a standard view (ie. non indexed), the only index they will use is those of the underlying tables.

Like derived tables/inline views, if they're too big, they WILL materialize as a "work" table in TempDB. That's important to know because a lot of DBA's say "NO" to explicit temp table usage without that understanding. Even the small derived tables/inline views make use of TempDB but they "live in memory" because they are small enough to fit. All the same for CTE's.

There are some "sexy" uses of CTE's such as "recursive CTE's"... however, they're performance isn't much better than a "firehose" Cursor (fast forward, read only). Their draw is that they're pretty simple to write and most people think they're written set-based code when they use them. They're actually a major form of RBAR. Still, for really small sets, they do allow for some pretty fast application development. Problem is, people never go back and replace them with set based methods to improve the performance.

So far as I'm concerned, CTE's provide 3 improvements over derived tables/inline views...

1. "Top Down" programming. Instead of having to develop what goes in a FROM clause first, you develop queries at the top and work down from there. If you do it right, many of the CTE's can be executed individually for development and troubleshooting purposes.

2. Readability. Again, if done correctly, they present both easier/better readability and more "room" to document what's going on.

3. Code Consolidation. There's not always the need to reuse a derived table, but when there is, the CTE can simply be used in the same FROM clause with a different alias as can any table or view. In other words, you can easily self join a CTE where you'd have to duplicate code in the FROM clause to do the same with a derived table.

So far as using temp tables goes, I absolutely agree that they're a great "divide and conquer" performance tool and should be allowed to be (properly) used... The "lifetime" of a CTE is the individual query it lives in. If you need to use what is produced by the CTE in more than one query in a proc, then you'd have to add the CTE to the other queries where it's needed. It would recalculate as if in a standalone query. The use of a proper temp table would usurp that particular problem very nicely allowing for much better performance. All the advantages of having a properly indexed table would come into play and performance can improve dramatically (as you've already surmised).

Just a note... we've got some awesome DBA's that understand things like this. Our main "billing" server contains about a tera-byte of data and is growing every day. Since we don't have SQL Server 2005, we don't use CTE's... but we do use the "Divide and Conquer" methods that the correct use of temp tables provide especially on "large/complicated" code especially where more than one instance may need to run at a time. While lot's of DBA's are attempting to keep TempDB as small as possible, ours have made a 12GByte TempDB across 12 "spindles" on boot up. They "get it"... they understand how to allow the developers to beat RBAR... give 'em some "elbow room".

So, just to summarize... for the reasons stated, I don't think CTE's are useless... they're just misunderstood by a lot of folks that use them. To the system, they're really no different than a derived table and an improperly written or recurrsive CTE can do just as much damage to performance as any other improperly written or RBAR code. Except when they're multi-aliased in one query, they offer no performance advantage over derived tables... they're just easier to code (sometimes), easier to read, and easier to document.

Heh... but that just an opinion 😛

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Steve Jones - SSC Editor SSC Guru Points: 734418 More actions · Answer 5

Jeff has a great explanation, and to me, the readability and the simplification of code are reason enough to use them.

GSquared SSC Guru Points: 260824 More actions · Answer 6

I agree with most of what Jeff wrote. The one point I'd like to clarify is that I've tested recursive CTEs in hierarchy (adjacency) resolution, and they are faster than any cursor method I've been able to come up with. Local Fast_Forward goes a long ways towards making a cursor fast, but here's some hard data:

CREATE TABLE [dbo].[HierarchiesNodes](

[HierarchyID] [int] NOT NULL,

[NodeID] [int] IDENTITY(1,1) NOT NULL,

[ParentID] [int] NULL,

[Name] [varchar](100) NOT NULL,

[SetStart] [int] NULL,

[SetEnd] [int] NULL,

[OldID] [int] NULL,

CONSTRAINT [PK_Hierarchy] PRIMARY KEY CLUSTERED

(

[HierarchyID] ASC,

[NodeID] ASC

))

GO

CREATE NONCLUSTERED INDEX [IDX_HierarchiesNodes_Nodes] ON [dbo].[HierarchiesNodes]

(

[NodeID] ASC,

[ParentID] ASC,

[SetStart] ASC,

[SetEnd] ASC

)

INCLUDE ( [Name])

GO

CREATE NONCLUSTERED INDEX [IDX_Hierarchy_Adjacency] ON [dbo].[HierarchiesNodes]

(

[ParentID] ASC

)

INCLUDE ( [Name])

GO

CREATE NONCLUSTERED INDEX [IDX_Hierarchy_SetRange] ON [dbo].[HierarchiesNodes]

(

[SetStart] ASC,

[SetEnd] ASC,

[NodeID] ASC

)

INCLUDE ( [Name])

GO

ALTER TABLE [dbo].[HierarchiesNodes] WITH CHECK ADD FOREIGN KEY([HierarchyID])

REFERENCES [dbo].[Hierarchies] ([HierarchyID])

GO

ALTER TABLE [dbo].[HierarchiesNodes] WITH CHECK ADD CONSTRAINT [FK_Hierarchy]

FOREIGN KEY([HierarchyID], [ParentID])

REFERENCES [dbo].[HierarchiesNodes] ([HierarchyID], [NodeID])

GO

ALTER TABLE [dbo].[HierarchiesNodes] CHECK CONSTRAINT [FK_Hierarchy]

GO

ALTER TABLE [dbo].[HierarchiesNodes] WITH CHECK ADD CONSTRAINT [CK_SetRange]

CHECK (([setend]>=[setstart] OR [setend] IS NULL AND [setstart] IS NULL))

GO

ALTER TABLE [dbo].[HierarchiesNodes] CHECK CONSTRAINT [CK_SetRange]

I added 29,000 rows of data, including 1 hierarchy with 7,000 nodes and 50 levels, and a number of other hierarchies, mostly with between 100 and 300 nodes and 2-5 levels. (This is based of actual data in a production database and I don't have a script for it that I can give you.)

The SetStart and SetEnd columns are for a Nested Sets hierarchy model, while NodeID and ParentID are for an Adjacency hierarchy model. (This hybrid solution allows me to test all kinds of fun stuff in this one.)

HierarchyID is a foreign key that limits the queries and indexes. No child object can be in a different hierarchy than it's parent, which makes for faster queries in this.

Now, to the various means of resolving this based on NodeID and ParentID:

CTE:

;with CTE (Lvl, ID, PID) as

(select 1, NodeID, ParentID

from dbo.HierarchiesNodes

where NodeID = 267

union all

select Lvl + 1, h2.NodeID, ParentID

from dbo.HierarchiesNodes h2

inner join CTE

on h2.ParentID = CTE.ID)

select Lvl, Name

from dbo.HierarchiesNodes

inner join CTE

on NodeID = ID

Results:

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
(7782 row(s) affected)
Table 'HierarchiesNodes'. Scan count 15565, logical reads 31260, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 46694, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 313 ms, elapsed time = 371 ms.

Cursor:

alter proc [dbo].[HierarchyCursor1]

(@NodeID_in int)

as

set nocount on

create table #Temp(

Lvl int,

ID int,

PID int)

insert into #Temp (Lvl, ID, PID)

select 1, NodeID, ParentID

from dbo.HierarchiesNodes

where NodeID = @NodeID_in

exec dbo.HierarchyCursor2 @NodeID_in, 1

select Lvl, Name

from dbo.HierarchiesNodes

inner join #Temp

on NodeID = ID

set nocount off

go

alter proc dbo.HierarchyCursor2

(@NodeID_in int,

@Lvl_in int)

as

declare Crsr cursor local fast_forward for

select NodeID, ParentID

from dbo.HierarchiesNodes

where ParentID = @NodeID_in

open Crsr

declare @ID int, @PID int

fetch next from Crsr into @ID, @PID

select @Lvl_in = @Lvl_in + 1

while @@fetch_status = 0

begin

insert into #Temp (Lvl, ID, PID)

select @Lvl_in, @ID, @PID

exec dbo.HierarchyCursor2 @ID, @Lvl_in

fetch next from Crsr into @ID, @PID

end

close Crsr

deallocate Crsr

(If someone can tell me a more efficient way to build the cursors and procs for this, please do so, if your tests show that it's faster than this.)

Results:

I can't copy and paste the statistics for this one to this board, because it's 295,733 lines of data, but suffice it to say that it's 4751 ms of CPU time, and over 8 seconds total time (as compared to less than a third of a second for the CTE).

A semi-set While statement:

create table #Temp (

Lvl int,

ID int,

PID int)

insert into #temp (lvl, id, pid)

select 1, NodeID, ParentID

from dbo.HierarchiesNodes

where nodeid = 267

declare @Lvl int, @Rows int

select @lvl = 2, @rows = 1

while @rows > 0

begin

insert into #temp (lvl, id, pid)

select @lvl, nodeid, parentid

from dbo.HierarchiesNodes

inner join #temp

on parentid = id

where lvl = @lvl -1

select @lvl = @lvl + 1, @rows = @@rowcount

end

select lvl, name

from dbo.HierarchiesNodes

inner join #temp

on id = nodeid

drop table #temp

(I call this semi-set based, because it does do chunks of data instead one row at a time, but it also involves a loop.)

Results:

Again, can't copy and paste it here because it's too many lines. 393 milliseconds CPU time total.

The While loop and the CTE are so similar in scans, CPU time, etc., that I suspect that the code in the database engine for the CTE is basically a While loop (though, in C/C++ or whatever it is that MS uses for the engine's code). That makes sense in a number of ways, at least to me.

So, populating the CTE and populating a temp table with a While loop are, at least in these version, virtually indistinguishable, but the cursor, even local fast_forware, is much, much worse.

Since a CTE can be used to populate a temp table, and would make for a more readable piece of code, I would use one in the place of the while loop in this case, if you wanted the hierarchy data in a temp table for actual processing.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

GSquared SSC Guru Points: 260824 More actions · Answer 7

And, kind of off-topic, just to show how the nested sets hierarchy performs compared to all three of those:

declare @Start int, @End int

select @start = setstart, @end = setend

from dbo.HierarchiesNodes

where nodeid = 267

select name

from dbo.HierarchiesNodes

where setstart between @start and @end

and setend between @start and @end

Results:

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
Table 'HierarchiesNodes'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
(7782 row(s) affected)
Table 'HierarchiesNodes'. Scan count 1, logical reads 41, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 170 ms.

Note that I haven't figured out a way to get the level data to return in the nested sets version, but if what I'm doing is joining the hierarchy to other data, it is more efficient proportional to the number of nodes in the hierarchy, and level could be hard-coded into the underlying table by the same code that updates the nested sets.

Edit: I just figured out how to do the levels. Hard-code the absolute level into the table, and use Dense_Rank() to return the relative level in the query. Simple, fast and easy.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 8

The thing about using temp tables is that it will materialise the whole of the "view". With a CTE, it will only take out what you need.

Consider the following

SET STATISTICS IO ON

SELECT * FROM MyLargeTable WHERE TableID = 1

Scan count 0, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

WITH CTE AS (SELECT * FROM MyLargeTable)

SELECT * FROM CTE WHERE TableID = 1

Scan count 0, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SELECT * INTO #temp FROM MyLargeTable

SELECT * FROM #temp WHERE TableID = 1

Scan count 1, logical reads 76988, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

This suggests to me that the whole of the CTE wasn't materialised into memory or tempdb, but rather the query optimizer detected that it only needed one row and ran the first two queries in the same way. So I would say that if you're only using the results of your derived table once, and you'll only be using a small subset of it, it's better to use a CTE than a temp table.

John

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 9

John Mitchell (4/18/2008)
The thing about using temp tables is that it will materialise the whole of the "view". With a CTE, it will only take out what you need.
Consider the following
SET STATISTICS IO ON
SELECT * FROM MyLargeTable WHERE TableID = 1
Scan count 0, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
WITH CTE AS (SELECT * FROM MyLargeTable)
SELECT * FROM CTE WHERE TableID = 1
Scan count 0, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SELECT * INTO #temp FROM MyLargeTable
SELECT * FROM #temp WHERE TableID = 1
Scan count 1, logical reads 76988, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
This suggests to me that the whole of the CTE wasn't materialised into memory or tempdb, but rather the query optimizer detected that it only needed one row and ran the first two queries in the same way. So I would say that if you're only using the results of your derived table once, and you'll only be using a small subset of it, it's better to use a CTE than a temp table.
John

I'd think it's because you're using an example that doesn't require materialization. It goes back to treating it like an "inline, unindexed view", which means that at execution time, the TEXT of the CTE will be placed (a la find and replace) into the outer query in place of the name of the CTE, and then this new query is then parsed, optimized and compiled on its own.

If the outer query's exec plan thinks that it makes sense to materialize that part of its query (if for example the table contains aggregation, or a Top/distinct predicate), then a work table is created (in which case you might have done better by creating the temp table yourself).

It's a big "it depends" - if your derived table statement returns 10M rows being matched up to 100M rows, and the indexes aren't great to handle how it's linking out, then (CTE or not) it can be advantageous to "manually materialize" the thing through a temp table, and index it yourself.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Jeff Moden SSC Guru Points: 1003853 More actions · Answer 10

Gus... nice testing!

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003853 More actions · Answer 11

Matt Miller (4/18/2008)
I'd think it's because you're using an example that doesn't require materialization.

I agree... even with a CTE, I'll try to return as few rows and columns as possible and still get the job done. With that in mind, if the CTE needs to be used more than once, the temp table is the way to go for me... even if it's only going to be used once, I'll still occasionally use a temp table instead of a CTE just to get the extra speed out of an index.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Johan Bijnens SSC Guru Points: 135255 More actions · Answer 12

Jeff Moden (4/18/2008)
Matt Miller (4/18/2008)
I'd think it's because you're using an example that doesn't require materialization.
I agree... even with a CTE, I'll try to return as few rows and columns as possible and still get the job done. With that in mind, if the CTE needs to be used more than once, the temp table is the way to go for me... even if it's only going to be used once, I'll still occasionally use a temp table instead of a CTE just to get the extra speed out of an index.

Indeed, it's performance we strive for.

There are guidlines, but if considered well and documented well, exceptions are allowed.

Rule 1 : tell the system what you know is till applicable.

Rule 2 : work set based.

Rule 3 : only select the data and columns you strictly need.

hey, maybe we could compile a top 100 of this list :w00t:

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data/code to get the best help[/url]

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me

Jeff Moden SSC Guru Points: 1003853 More actions · Answer 13

Heh... no... I always try to use the method with the best performance instead of documenting why performance can suffer... it's part of my 3 rules of a DBA.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

GSquared SSC Guru Points: 260824 More actions · Answer 14

The only time to compromise on performance is when integrity would be threatened by it. Integrity first, performance second.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon