RE: Debugging stacked CTEs efficently

SSC Guru

Points: 182276

March 23, 2013 at 5:47 pm

Banana-823045 (3/20/2013)
I'm curious if anyone else has figured out a good system for performing analysis on a query that may have multiple CTEs stacked. While some problems can be resolved by considering the T-SQL and making logical evaluations, sometime it's nice to be able to see the intermediate data so you can see what is really going on under the hood.

That would require that CTEs be a fundamentally different thing than what they are. The optimizer is handed a CTE by you, the query writer, but that is just a description of the data you want in the form of a question, it is not an actual set of native instructions the database engine can use to retrieve the data. The optimizer will take your query and change it into a native set of instructions. We can get a view into these instructions by looking at the execution plan. In SSMS highlight your query and press Ctrl+L to see one.

Here is an example. Consider this simple query to show all columns in a database:

SELECT s.name AS [schema_name],

t.name AS table_name,

c.name AS column_name

FROM sys.schemas s

INNER JOIN sys.tables t ON s.[schema_id] = t.[schema_id]

INNER JOIN sys.columns c ON t.[object_id] = c.[object_id]

ORDER BY s.name,

t.name,

c.column_id;

Now consider a logically equivalent query that uses a series of three cascaded CTEs:

WITH schema_cte

AS (

SELECT [schema_id],

name AS [schema_name]

FROM sys.schemas

),

table_cte

AS (

SELECT s.[schema_id],

s.schema_name,

t.[object_id],

t.name AS table_name

FROM schema_cte s

INNER JOIN sys.tables t ON s.[schema_id] = t.[schema_id]

),

column_cte

AS (

SELECT t.schema_name,

t.table_name,

c.name AS column_name,

c.column_id

FROM table_cte t

INNER JOIN sys.columns c ON t.[object_id] = c.[object_id]

)

SELECT [schema_name],

table_name,

column_name

FROM column_cte

ORDER BY [schema_name],

table_name,

column_id;

The queries look very different, however they will deliver the same results. Further, when we look at the execution plans of each we see that the optimizer actually decided to execute them in the exact same way.

So, as you can see, it's not as if the database engine processes your query in steps where it materializes the first CTE, then takes that intermediate result and uses it to materialize the second CTE, and so on. It reworks your query, sometimes many thousands of different ways, looking for an efficient way to execute your query.

Regarding debugging, what you have showed in terms of steps is how you would debug a query with cascading CTEs. If you want a debugging experience where you can see the intermediate results consider using #temp tables.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato