Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12345»»»

Common Table Expressions Expand / Collapse
Author
Message
Posted Tuesday, March 18, 2008 7:37 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Thursday, December 18, 2014 9:58 AM
Points: 13,872, Visits: 9,600
Tao Klerks (3/18/2008)
SELECT TOP 1 dept_id, n
FROM (
SELECT dept_id, count(*) AS n
FROM employee
GROUP BY dept_id
) AS a
ORDER BY n DESC



This is essentially the same as:

;with Depts (ID, Employees) as
(select dept_id, count(*)
from dbo.employee
group by dept_id)

select top 1 ID
from depts
order by employees desc

Either one is better than what was presented in the article. The two examples above will have the same query plan (in my tests and use).

It's not a question of performance of derived tables vs performance of CTEs. Both work the same way.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #470805
Posted Tuesday, March 18, 2008 7:41 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Thursday, December 18, 2014 9:58 AM
Points: 13,872, Visits: 9,600
paulsasik (3/18/2008)
The query provided by Tao is fine and was the first solution i thought of when i read the example problem. However, the CTE example will effortlessly resolve the issue of two or more departments having the same number of employees:

If two departments have n employees and are tied for having the most in the company, both should be displayed. Tao's use of the TOP 1 clause in this case will choose one of the departments arbitrarily for display and produce a less-than desirable result. ...


Easy enough to add a "With Ties" to "Select Top".


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #470810
Posted Tuesday, March 18, 2008 7:51 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, February 12, 2009 8:14 AM
Points: 1, Visits: 5
hi, hi don't see the advantage of CTE to temp Tables like in the example below. Is there any advantage?


WITH
CountEmployees(dept_id, n) AS
( SELECT dept_id, count(*) AS n
FROM employee GROUP BY dept_id ),
DeptPayroll( dept_id, amt ) AS
( SELECT dept_id, sum(salary) AS amt
FROM employee GROUP BY dept_id )

select v.* into #CountEmployees from
(SELECT dept_id, count(*) AS n
FROM employee GROUP BY dept_id)v
select v.* into #DeptPayroll from
(SELECT dept_id, sum(salary) AS amt
FROM employee GROUP BY dept_id)v


Post #470823
Posted Tuesday, March 18, 2008 8:18 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Thursday, September 25, 2014 12:38 PM
Points: 1,385, Visits: 1,249
Thanks GSquared - this makes sense to me.

I don't think I agree about the clarity of using CTEs for derived tables, but I guess that might be because I still live in a SQL Server 2000 world (with very strict formatting standards).

I think I'd rather have:
SELECT Column1
FROM (
SELECT Column1
FROM (
SELECT Column1, Column2
FROM SomeTable
WHERE Column2 = 3
) AS Derived1
WHERE Column1 = 2
) AS Derived2
WHERE 1 = 1

than

WITH Derived1 AS (
SELECT Column1, Column2
FROM SomeTable
WHERE Column2 = 4
),
Derived2 AS (
SELECT Column1
FROM Derived1
WHERE Column1 = 3
)
SELECT Column1
FROM Derived2
WHERE 1 = 1

but if I understand correctly that is really a matter of taste / formatting rather than any logical difference (and I just checked the query plan for good measure :))



http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.
Post #470852
Posted Tuesday, March 18, 2008 8:24 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 12:22 AM
Points: 35,772, Visits: 32,443
To me, CTE's have the advantage of Derived Tables because you can reference them many times in the same query at different levels in the query whereas a Derived table can only be referenced one level up in the same query. It's not a huge advantage until you actually need it. CTE's also (to me, anyway) present themselves nicer for readability and documentation purposes. They can also call themselves (recursion) like a function might.

Temporary tables have the advantage over both because they persist for longer than just one query. I used (and still use) temp tables in place of CTEs ("sub-query refactoring" in Oracle) long before they were available in MSSQL. In fact, if I know a result set must be used across many queries in a sproc, I'll still use TempTables instead of CTE's. I like "local" temp tables a lot... I wish Oracle understood the concept as well. Their "global" temp tables don't work out quite so well.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #470859
Posted Tuesday, March 18, 2008 8:32 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Thursday, December 18, 2014 9:58 AM
Points: 13,872, Visits: 9,600
Tao:

The clarity thing is totally a personal preference. To me, being able to look at the query at the end, especially in things like complex updates, and have that query be only a couple of lines, is more clear than even a well-formatted derived table. But that's for me. I can read either one, I just find the CTE easier. And, as you mention, that's mainly because I've been using them for a while now.

The other point, using them for consistency, is where I think it really matters. If you use them at all, use them as exclusively as possible. Same as any other standard.

fenomenales:

In the example you give, no, there isn't a clear advantage that I'm aware of for the CTE over the temp table.

As with derived tables, the real difference is when you want one that's recursive (like a hierarchy or bill of materials).

Temp tables, like CTEs and unlike derived tables, can be referenced more than once in a query. Temp tables also have the advantage that they can be referenced in more than one query, unlike both CTEs and derived tables. That's significant in many cases.

CTEs, like temp tables, table variables, and derived tables, all reside in memory till they get too big, then they get dumped into the tempdb. (I've tested all of these and they do. You should have seen the 200 Gig tempdb I ended up with from one runaway test. Which is why I do that testing on a desktop box instead of a production server.:) ) So there's no advantage of one over the other in that regard.

Temp tables can also generate statistics, and can be indexed. Again, that can be an important advantage.

Temp tables, on the other hand, aren't recursive. They also require more code to set up, and can easily result in procs that have to recompile every time they are run. (In either the case of mixing DDL and DML, or the case of running a cursor on a temp table, the proc will have to recompile every time it's called. This results in a compile lock for every run, which can result in serious delays for user queries.) CTEs have the advantage there.

It's all a matter of knowing which tool to use for which job.

Use a temp table if more than one query in a proc/script will reference the data. (A table variable works the same way, but table variable vs temp table is a whole discussion all by itself.) Use a CTE if it will be recursive, or only referenced once. Use a CTE to populate a temp table if you need both recursion and multiple references. And so on.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #470869
Posted Tuesday, March 18, 2008 8:37 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 12:22 AM
Points: 35,772, Visits: 32,443
Tao Klerks (3/18/2008)
Thanks GSquared - this makes sense to me.

I don't think I agree about the clarity of using CTEs for derived tables, but I guess that might be because I still live in a SQL Server 2000 world (with very strict formatting standards).

I think I'd rather have:
SELECT Column1
FROM (
SELECT Column1
FROM (
SELECT Column1, Column2
FROM SomeTable
WHERE Column2 = 3
) AS Derived1
WHERE Column1 = 2
) AS Derived2
WHERE 1 = 1

than

WITH Derived1 AS (
SELECT Column1, Column2
FROM SomeTable
WHERE Column2 = 4
),
Derived2 AS (
SELECT Column1
FROM Derived1
WHERE Column1 = 3
)
SELECT Column1
FROM Derived2
WHERE 1 = 1

but if I understand correctly that is really a matter of taste / formatting rather than any logical difference (and I just checked the query plan for good measure :))



I know they're just examples, but a simple query would solve the both.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #470872
Posted Tuesday, March 18, 2008 8:57 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Monday, December 15, 2014 8:30 AM
Points: 263, Visits: 303
For me a CTE is a lot easier to follow. And the CTE is reusable within the same statement, where a derived table is not. (I love CTE btw) I also have seen indexes being used in a CTE when they were not used in a derived table (but that could have been a side effect of rewriting the query better I suppose?).

Changing the simple example given to use a CTE shows an identical execution plan for me.
;with Data as (
SELECT dept_id, count(*) AS n
FROM employee
GROUP BY dept_id
)
select top 1 dept_id, n
from Data
order by n DESC

And one little (overlooked?) thing about a multiple CTE, if a resultset is not referenced it appears to not be ran (according to the execution plan). Could I get some feedback on this?

Example, with this I only see TableB being shown in the execution plan :
;with
DataA as ( select a from TableA )
,DataB as ( select b from TableB )
,DataC as ( select c from TableC )
select
b
from DataB

Based on what I've coded, I would strongly urge the move to CTE instead of derived tables. :)

Tao Klerks (3/18/2008)
I see the value of CTEs, but I'm not sure I agree with the example... wouldn't it be better written (more efficient?) using a single aggregate query and a TOP clause?

SELECT TOP 1 dept_id, n
FROM (
SELECT dept_id, count(*) AS n
FROM employee
GROUP BY dept_id
) AS a
ORDER BY n DESC

I guess the question for me is: while CTEs provide a nice syntax for repeated and recursive expressions, is the SQL optimizer actually able to use them as efficiently as a statement designed to avoid repetitive expressions in the first place?

I tried this on a very simple dataset (comparing all three statements), and found that the total cost of the queries, in all three cases, was the same. The query plans were slightly more complicated for the original statement and CTE statement, and slightly simpler for the statement above (single Sort/TopN Sort instead of Sort -> Filter -> Top) - unfortunately I don't have a large dataset to test on!

My instinct would be: Try to rewrite your statement to use joins and derived tables rather than subqueries, before moving to CTEs as a way of simplifying your subqueries.


Does anyone know better one way or the other?
Post #470897
Posted Tuesday, March 18, 2008 9:00 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Tuesday, July 22, 2008 10:53 AM
Points: 13, Visits: 149
Excellent article!
Will put to use immediately!
BTW, in the sentence "I have found them particularly handy when cleaning up demoralized tables. ",
did you mean "I have found them particularly handy when cleaning up denormalized tables."?
Post #470902
Posted Tuesday, March 18, 2008 9:04 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Monday, December 15, 2014 8:30 AM
Points: 263, Visits: 303
Robert (3/18/2008)
As you can see CTEs can be very useful. I have found them particularly handy when cleaning up demoralized tables

What are "demoralized tables"? I didn't know they can have such character. ;)


Could it have been a Freudian slip ? :P
Post #470907
« Prev Topic | Next Topic »

Add to briefcase ««12345»»»

Permissions Expand / Collapse