With the rollout of SQL Server 2005, Microsoft introduced a new query construct called a common table expression (CTE). A CTE is a temporary result set defined by a simple query, and is used within the execution scope of a single INSERT, UPDATE, DELETE, or SELECT statement. In this article I be exploring how to define and use CTE's.
Defining and Using CTE's
With the introduction of CTE's by Microsoft you now have a different way to build and document complex pieces of TSQL code. By using a CTE you can write and name a TSQL SELECT statement and then reference the named statement later on much like you would reference a table or a view.
Below is the syntax for defining a CTE:
WITH <expression_name> (Column1, Column2, …) AS (CTE Definition)
<expression_name>" is the name of the CTE
- "Column 1, Column2,…" are the names of the columns returned from the CTE (only required if column names are not provided, or are not unique)
- "CTE Definition" is a SELECT that defines the result set of the CTE.
A CTE definition needs to be followed by a single INSERT, UPDATE, DELETE, or SELECT statement that references the CTE. If the CTE is part of a batch then the statement prior to the WITH clause must end with a semi-colon. When you define multiple CTE's, a CTE that references another CTE needs to be define after the CTE it is referencing. This may sound confusing, but we'll see some examples shortly.
Below are some clauses that can't be used in a CTE:
- ORDER BY (unless used with TOP clause)
- OPTION clause with query hints
- FOR XML
- FOR BROWSE
A recursive process is a process that calls itself. Each iteration of the recursive process returns a subset of the answer. The recursive process keeps calling itself until a condition is found that indicates it should not call itself anymore. The final result set from a recursive CTE in SQL Server is the joined set of all the result sets from all the calls of the recursive query.
A CTE can reference itself. When this is done it is called a recursive CTE. A recursive CTE contains at least two query definitions, which are SELECT statements. One query definition is known as the anchor member and the other query definition is known as the recursive member. An anchor member query definition will not reference the CTE whereas the recursive member will reference the CTE. Additionally the anchor member queries need to appear in the CTE prior to the recursive member queries. The number of columns returned from the anchor member must be the same number of columns returned from the recursive member. There can be multiple anchor query definitions where each one is combined with the others using one of these operators: UNION ALL, UNION, INTERSECT, or EXCEPT. There can also be multiple recursive query definitions. Each recursive query definition must be combined using a UNION ALL operation. The UNION ALL operation is also needed to connection the last anchor query definition with the first recursive query definition. Again, this will become clear when we look at some examples.
Enough discussion about CTE's and recursive CTE's. Let's review a few examples of CTE's.
Example of a Simple CTE
As stated earlier CTE's provide you with a method to more easily document your complex code for readability. Suppose you have the code in Listing 1.
USE AdventureWorks2012; GO SELECT YearMonth, ProductID, SumLineTotal FROM ( SELECT CONVERT(CHAR(7),ModifiedDate,120) AS YearMonth , ProductID , SUM(LineTotal) AS SumLineTotal FROM Sales.SalesOrderDetail GROUP BY ProductId, CONVERT(CHAR(7),ModifiedDate,120) ) MonthlyProductSales WHERE YearMonth = '2008-06';
Listing 1 has a SELECT statement that has a subquery buried within the FROM clause. The subquery is a derived table with the alias MonthlyProductSales, which summarizes the LineTotal amount for each Year/Month combination of the ModifiedDate. From the results of my MonthlyProductSales subquery I constrained the results to only return those rows that have a year and month of June, 2008.
Although the code in listing 1 is fairly simple we can improve on its readability by rewriting it uising a CTE, as I have done in Listing 2.
USE AdventureWorks2012; GO -- CTE Definition WITH MonthlyProductSales AS ( SELECT CONVERT(CHAR(7),ModifiedDate,120) AS YearMonth , ProductID , SUM(LineTotal) AS SumLineTotal FROM Sales.SalesOrderDetail GROUP BY ProductId, CONVERT(CHAR(7),ModifiedDate,120) ) -- SELECT statement referencing CTE SELECT * FROM MonthlyProductSales WHERE YearMonth = '2008-06';
In Listing 2 I moved to derived table subquery in Listing 1 up into a CTE named MonthlyProductSales. I then replaced the subquery in my SELECT statement with the name of the CTE, in this case MonthlyProductSales. By moving the subquery in Listing 1, into a CTE definition, many people will find the code in Listing 2 easier to read and maintain.
Example of Using Multiple CTE's
My code in Listing 1 was not all that complex. If your code is more complex and contains multiple subqueries you could consider rewriting it to simplify maintenance and to make it more readable. One of those way to rewrite it would be to take all the subqueries and rewrite them as CTEs.
To demonstrate using more than one CTE in a single SELECT statement, suppose I had the non-CTE query in Listing 3:
USE AdventureWorks2012; GO SELECT SalesPersonID , SalesYear , TotalSales , SalesQuotaYear , SalesQuota FROM ( -- First Subquery SELECT SalesPersonID , SUM(TotalDue) AS TotalSales , YEAR(OrderDate) AS SalesYear FROM Sales.SalesOrderHeader WHERE SalesPersonID IS NOT NULL GROUP BY SalesPersonID, YEAR(OrderDate) ) AS Sales JOIN ( -- Second Subquery SELECT BusinessEntityID , SUM(SalesQuota)AS SalesQuota , YEAR(QuotaDate) AS SalesQuotaYear FROM Sales.SalesPersonQuotaHistory GROUP BY BusinessEntityID, YEAR(QuotaDate) ) AS Sales_Quota ON Sales_Quota.BusinessEntityID = Sales.SalesPersonID AND Sales_Quota.SalesQuotaYear = Sales.SalesYear ORDER BY SalesPersonID, SalesYear;
The query in Listing 3 contains two different subqueries, both implemented as derived tables. The first subquery calculates the TotalSales by SalesPersonID and SalesYear. The SalesQuota aggregated value, based on BusinessEntityID and SalesQuotaYear, is calculated in second subquery. To simplify reading the query in Listing 3, we can rewrite the subqueries as two different CTEs and then reference those CTEs in a simple SELECT statement, as I have done in Listing 4.
USE AdventureWorks2012; GO -- First Subquery rewritten as CTE WITH Sales AS ( SELECT SalesPersonID , SUM(TotalDue) AS TotalSales , YEAR(OrderDate) AS SalesYear FROM Sales.SalesOrderHeader WHERE SalesPersonID IS NOT NULL GROUP BY SalesPersonID, YEAR(OrderDate) ), -- Second Subquery rewritten as CTE Sales_Quota AS ( SELECT BusinessEntityID , SUM(SalesQuota)AS SalesQuota , YEAR(QuotaDate) AS SalesQuotaYear FROM Sales.SalesPersonQuotaHistory GROUP BY BusinessEntityID, YEAR(QuotaDate) ) -- SELECT using multiple CTEs SELECT SalesPersonID , SalesYear , TotalSales , SalesQuotaYear , SalesQuota FROM Sales JOIN Sales_Quota ON Sales_Quota.BusinessEntityID = Sales.SalesPersonID AND Sales_Quota.SalesQuotaYear = Sales.SalesYear ORDER BY SalesPersonID, SalesYear;
In Listing 4 I moved the two subqueries used in Listing 3 into two different CTEs. The first CTE, defined with a name of Sales, contains the subquery that produced the TotalSales amount by SalesPersonID and SalesYear. I defined the second CTE, which is named SalesQuota, by placing a comma after the first CTE defined. This second CTE contained the subquery to calculate the SalesQuota amount. After defining both CTEs I then referenced them in my final SELECT statement.
Being able to define multiple CTEs with a single WITH clause, and then referencing these CTEs in my final TSQL statement allowed me to make my complicated TSQL code in Listing 3 easier for me to read, develop and debug. Using multiple CTEs for complicated TSQL logic allows you to break your code into manageable pieces and parts.
CTE referencing another CTE
A CTE can reference another CTE. In order for a CTE to reference another CTE it needs to:
- be defined within the same WITH clause as the CTE being referenced
- be defined after the CTE being referenced
To demonstrate a CTE referencing another CTE let's review the code in Listing 5.
USE AdventureWorks2012; GO WITH -- First Subquery rewritten as CTE Sales AS ( SELECT SalesPersonID , SUM(TotalDue) AS TotalSales , YEAR(OrderDate) AS SalesYear FROM Sales.SalesOrderHeader WHERE SalesPersonID IS NOT NULL GROUP BY SalesPersonID, YEAR(OrderDate) ), -- Second Subquery that references first CTE TotalSales AS ( SELECT SUM(TotalSales)AS TotalSales , SalesYear FROM Sales GROUP BY SalesYear ) -- Calling Second CTE that uses first CTE SELECT * FROM TotalSales ORDER BY SalesYear;
In Listing 5 I define a CTE named Sales that is referenced by the second CTE definition. In the second CTE, named TotalSales, I reference the Sales CTE. In this CTE I summarized the TotalSales column returned from the Sales CTE by SalesYear. At the end I used a SELECT statement to reference just the second CTE. When this TotalSales CTE is executed it will then call the Sales CTE.
Example of a Recursive CTE
Another feature of a CTE is that it can call itself. When you do this the CTE is known as a recursive CTE.
A recursive CTE has two different parts called the anchor member and the recursive member. The anchor member proceeds the recursive member in the CTE. You can identify the anchor member queries as the queries without a reference to the CTE. The recursive members will have a reference to the CTE in their queries. The anchor member identifies that initial record set for the CTE. This initial record set is then used by the recursive member. The recursive member is the part that is connected to the anchor member using the UNION ALL clause, and references the CTE in its query. The recursive member returns the same number of columns as the anchor member and references the CTE. A recursive CTE continues calling itself until the recursive part of the CTE doesn't return any rows.
To better understand a recursive CTE I will be creating some sample data for use with a recursive CTE. You can find the code to create my sample data in Listing 6.
USE tempdb; GO -- Create an Employee table CREATE TABLE dbo.Employee ( EmpID smallint NOT NULL, EmpName nvarchar(100) NOT NULL, Position nvarchar(50) NOT NULL, MgrId int NULL ); -- Populate Employee Table with values INSERT INTO dbo.Employee VALUES (1, N'Joe Steel', N'President',NULL) ,(2, N'John Smith', N'VP Western Region Sales',1) ,(3, N'Sue Jones', N'VP Easter Region',1) ,(4, N'Lynn Holland', N'Sales Person',2) ,(5, N'Linda Thomas', N'Sales Person',3 ) ,(6, N'Kathy Johnson', N'Admin Assistant',1) ,(7, N'Rich Little', N'Sales Person',3) ,(8, N'David Nelson', N'Sales Person', 2) ,(9, N'Mary Jackson', N'Sales Person', 3);
In Listing 6 I created a table named Employee to contain information about employees. This table contains 9 different employees. The MgrID column identifies the manager to which an employee reports. There is one Employee record where the MgrID is NULL. This employee is the president of the company and he reports to no one.
To see how I can use the data created in Listing 6 in a recursive CTE, look at the code in Listing 7.
USE tempdb; GO WITH ReportingStructure(MgrID, EmpID, EmpName, Position, OrgLevel) AS ( -- Anchor part SELECT MgrID, EmpID, EmpName, Position, 0 AS OrgLevel FROM dbo.Employee WHERE MgrID IS NULL UNION ALL -- Recursive part SELECT e.MgrID, e.EmpID, e.EmpName , e.Position, r.OrgLevel + 1 FROM dbo.Employee AS e INNER JOIN ReportingStructure AS r ON e.MgrID = r.EmpID ) SELECT MgrID, EmpID, EmpName, Position, OrgLevel FROM ReportingStructure;
When I run the code in Listing 7 I get the output in Report 1.
MgrID EmpID EmpName Position OrgLevel ----- ------ -------------- -------------------------- ----------- NULL 1 Joe Steel President 0 1 2 John Smith VP Western Region Sales 1 1 3 Sue Jones VP Easter Region 1 1 6 Kathy Johnson Admin Assistant 1 2 4 Lynn Holland Sales Person 2 2 8 David Nelson Sales Person 2 3 5 Linda Thomas Sales Person 2 3 7 Rich Little Sales Person 2 3 9 Mary Jackson Sales Person 2
By reviewing the output in Report 1 you can see the reporting structure of all employees. Note that the OrgLevel column identifies the reporting structure hierarchy. As you can see the employee that holds the “President” position has an OrgLevel of 0. Each employee that reports to the president has an OrgLevel of 1. If you look at the other employees you can see they each have an OrgLevel of 2 because they report to one of the Vice Presidents.
To better understand how a recursive CTE works let me step you through the process SQL Server uses when executing my recursive CTE in Listing 7. The first pass through my recursive CTE SQL Server runs the anchor statement. When the anchor statement is run it returns the employees that have a NULL MgrID value. In this example that would be the employee with an EmpID of 1. This first pass through the CTE also sets the Orglevel to 0. The second pass through my CTE runs the recursive part of the CTE. In the recursive part of the CTE all employees that have a MgrId equal to the employees found in the first pass, or MgrId of 1 are returned. This second pass returns EmpID 2, 3 and 6 and increments the OrgLevel by 1. The third pass through my CTE runs the recursive part of the CTE again. This time my CTE returns the employees that have MgrID's equal to one of the EmpIDs returned from the second pass through the CTE, which means the employees that have a MgrId of 2,3 or 6. The third pass returns EmpIds 5, 7, 8 and 9. The forth pass through my CTE the recursive part of the CTE looks for Employee records that have a MgrID of 5, 7, 8, and 9. This pass returns no values because there are no employees with MgrId value of 5, 7, 8 and 9. Since the forth pass returns no records this ends the processing of my recursive CTE.
It is possible to write a recursive CTE that is an infinite loop. SQL Server has a default MAXRECURSION setting that will keep you from running a CTE that might be an infinite loop. The default MAXRECURSION setting is 100.
The code in Listing 8 is an example of a poorly written CTE that creates an infinite loop.
USE tempdb; GO WITH InfiniteLoopCTE as ( -- Anchor Part SELECT EmpID, MgrID, Position FROM dbo.Employee WHERE MgrID = 1 UNION ALL -- Recursive Part SELECT InfiniteLoopCTE.EmpID , InfiniteLoopCTE.MgrID , InfiniteLoopCTE.Position FROM InfiniteLoopCTE JOIN dbo.Employee AS e ON e.EmpID = InfiniteLoopCTE.MgrID ) SELECT * FROM InfiniteLoopCTE;
The code is Listing 8 causes an infinite loop because the recursive part of this CTE will always return one or more rows. The recursive part of the query always returns MrgID value of 1 because the recursive part of the query returns InfiniteLoopCTE.MgrId, instead of the Employee.MgrID value. When I run the code in Listing 8 it will run until the CTE has hit the MAXRECUSION default setting, which is 100. If a CTE hits the MAXRECURSIVE setting SQL Server will return an error when running the statement.
If your CTE needs to process more than 100 recursive call then you can override the default MAXRECUSION setting by using the MAXRECURSION query option, like I have done in the code in Listing 9.
USE tempdb; GO --Creates an infinite loop WITH InfiniteLoopCTE as ( -- Anchor Part SELECT EmpID, MgrID, Position FROM dbo.Employee WHERE MgrID = 1 UNION ALL -- Recursive Part SELECT InfiniteLoopCTE.EmpID , InfiniteLoopCTE.MgrID , InfiniteLoopCTE.Position FROM InfiniteLoopCTE JOIN dbo.Employee AS e ON e.EmpID = InfiniteLoopCTE.MgrID ) SELECT * FROM InfiniteLoopCTE OPTION (MAXRECURSION 150);
In Listing 9 I overrode the default MAXRECUSION setting by adding the query option “MAXRECURSION 150”. This allows my infinite loop CTE to process 150 recursive calls before the SQL Server engine stops my recursive CTE with an error.
The maximum value you can set the MAXRECURSION option is 32,767. If for some reason you have a recursive CTE that needs to process to more than 32,767 recursive CTE calls then you can do this by setting the MAXRECURSION value to 0. By using 0 for a MAXRECURSION setting, you are allowing the database engine to run as many iteration as it needs to complete the recursive part of the CTE. If your code is not well tested, you could end up with a truly infinity loop that will never stop.
Reasons to use CTE
When should you use a CTE? Below are some good reasons when to use a CTE to resolve your TSQL statement requirements:
- If your query needs a recursive solution
- If your query has multiple subqueries, or you have repeated the same subquery code in a single statement
- If your query is large and complex
When your TSQL code is complex and/or contains one or more subqueries it become very hard to read. By breaking up your complex code into one of more CTEs you will make your code much easier to follow, maintain and debug.
The CTE functionality was a great addition to SQL Server 2005. CTEs allow you to take your complex code and break it up into smaller more manageable pieces of code. CTEs also allow you to build recursive code. CTEs provides yet another method to implement complicate TSQL logic. Take the time to develop a few CTEs for your environment.
Question and Answer
In this section you can review how well you have understood using Common Table Expressions (CTEs).
When your CTE code is not the first statement in its batch, what must appear prior to the WITH clause of the CTE?
- a “GO” command
- a semicolon character
- a “END” keyword
- a “(“ character
If you define 4 CTEs to be used by a SELECT statement how many WITH clauses will be required to define the 4 different CTEs:
What is the maximum n
of recursive CTE calls that can be made?
- 32, 787
- None of the above
The correct answer is b. When a CTE is not the first statement in a batch, the statement prior to the WITH clause needs to be terminated with a semicolon.
The correct answers is a. The WITH clause is only specified 1 time when you have multiple CTEs that are used for a given query. The WITH clause precedes the first CTE definition. Each CTE definition that follows the first CTE definition needs to be preceded by a comma.
The correct answer is d. By default the MAXRECUSION default setting is 100. You can override this value by setting the MAXRECURSION option when you call your recursive CTE. When you use the MAXRECURSION option the limit is 32,787. If your CTE needs to be called recursively more than 32,787 times to complete, then you can set the MAXRECURSION to 0. Therefore there really is no maximum number of times a recursive CTE can be called.