SQLServerCentral Article

Using a Recursive CTE to Generate a List

,

Database developers need to combine values from multiple rows into a list for various reasons, including:

  1. displaying lists of values per group in aggregate queries;
  2. passing a comma-delimited list of values to a function. 

The example below illustrates lists of counties generated per state. The left table contains two states with five counties per state, resulting in a total of 10 rows. The five rows for the state of 'AZ' in the left table are combined into one on the right, with the CountyID's placed in a comma separated list. Analogically, the five rows for the state of 'CA' are combined into a second row in the right table.

In terms of SQL operations, the rows in the left table are grouped by State and a list of CountyIDs is generated for each state. Let's look at how this is achieved using recursive CTEs.

 Overview of Recursive CTEs

 A recursive CTE requires at least one UNION ALL operator. The simplest recursive form has one UNION ALL and two queries[1]:

The anchor query is executed exactly once and the result is united with the results from the recursive member, which is executed until it returns an empty set. You have to provide a condition that will ensure the recursive member query does not run indefinitely. The easiest way to ensure termination is to use an increasing identity column in the base_table.

Using recursive CTEs to generate a list of values

Let's examine a sample database schema, which consists of tables Office, OfficeCounty and County. Each company has multiple offices. Each office serves one or more counties. Each county is served by no more than one office. The following normalized database model is employed:

The following code will generate the sample schema shown above:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[County](
[CountyId] [smallint] IDENTITY(1,1) NOT NULL,
[CountyName] [varchar](60) NOT NULL,
[StateAbbr] [varchar](2) NOT NULL,
CONSTRAINT [PK_County_1] PRIMARY KEY CLUSTERED 
(
[CountyId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];
GO
CREATE TABLE [dbo].[Office](
[OfficeID] [int] IDENTITY(1,1) NOT NULL,
[OfficeName] [varchar](100) NOT NULL,
[Address1] [varchar](100) NULL,
[Address2] [varchar](100) NULL,
CONSTRAINT [PK_Office_2] PRIMARY KEY CLUSTERED 
(
[OfficeID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];
GO
CREATE TABLE [dbo].[OfficeCounty](
[OfficeID] [int] NOT NULL,
[CountyID] [smallint] NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[OfficeCounty] 
 WITH CHECK ADD CONSTRAINT [FK_County] FOREIGN KEY([CountyID])
 REFERENCES [dbo].[County] ([CountyId])
GO
ALTER TABLE [dbo].[OfficeCounty] CHECK CONSTRAINT [FK_County]
GO
ALTER TABLE [dbo].[OfficeCounty]
 WITH CHECK ADD CONSTRAINT [FK_Office] FOREIGN KEY([OfficeID])
 REFERENCES [dbo].[Office] ([OfficeID])
GO
ALTER TABLE [dbo].[OfficeCounty] CHECK CONSTRAINT [FK_Office]
GO
SET ANSI_PADDING OFF
GO

Let's insert some data into the tables. We will create two offices called 'Draper, UT' and 'Orange Park, FL'. We will also insert 12 counties in the states of Florida('FL') and Utah('UT'). State 'FL' has 5 counties; 'UT' has 7 counties. Finally, in the OfficeCounty table, we will associate the 'Orange Park, FL' office with the 5 counties in Florida and the 'Draper, UT' office with the seven counties in Utah.

--Generate Sample data
/****** Object: Table [dbo].[Office] Script Date: 06/17/2013 16:02:13 ******/SET IDENTITY_INSERT [dbo].[Office] ON
INSERT [dbo].[Office] ([OfficeID], [OfficeName], [Address1], [Address2]) VALUES (151978, N'Draper, UT', N'138 street 122', NULL)
INSERT [dbo].[Office] ([OfficeID], [OfficeName], [Address1], [Address2]) VALUES (151981, N'Orange Park, FL', N'2 Wells Rd Ste', NULL)
SET IDENTITY_INSERT [dbo].[Office] OFF
/****** Object: Table [dbo].[County] Script Date: 06/17/2013 16:02:13 ******/SET IDENTITY_INSERT [dbo].[County] ON
INSERT [dbo].[County] ([CountyId], [CountyName], [StateAbbr]) VALUES (325, N'Baker County, FL', N'FL')
INSERT [dbo].[County] ([CountyId], [CountyName], [StateAbbr]) VALUES (333, N'Clay County, FL', N'FL')
INSERT [dbo].[County] ([CountyId], [CountyName], [StateAbbr]) VALUES (338, N'Duval County, FL', N'FL')
INSERT [dbo].[County] ([CountyId], [CountyName], [StateAbbr]) VALUES (368, N'Nassau County, FL', N'FL')
INSERT [dbo].[County] ([CountyId], [CountyName], [StateAbbr]) VALUES (378, N'St. Johns County, FL', N'FL')
INSERT [dbo].[County] ([CountyId], [CountyName], [StateAbbr]) VALUES (2786, N'Davis County, UT', N'UT')
INSERT [dbo].[County] ([CountyId], [CountyName], [StateAbbr]) VALUES (2798, N'Salt Lake County, UT', N'UT')
INSERT [dbo].[County] ([CountyId], [CountyName], [StateAbbr]) VALUES (2802, N'Summit County, UT', N'UT')
INSERT [dbo].[County] ([CountyId], [CountyName], [StateAbbr]) VALUES (2803, N'Tooele County, UT', N'UT')
INSERT [dbo].[County] ([CountyId], [CountyName], [StateAbbr]) VALUES (2805, N'Utah County, UT', N'UT')
INSERT [dbo].[County] ([CountyId], [CountyName], [StateAbbr]) VALUES (2806, N'Wasatch County, UT', N'UT')
INSERT [dbo].[County] ([CountyId], [CountyName], [StateAbbr]) VALUES (2809, N'Weber County, UT', N'UT')
SET IDENTITY_INSERT [dbo].[County] OFF
/****** Object: Table [dbo].[OfficeCounty] Script Date: 06/17/2013 16:02:13 ******/INSERT [dbo].[OfficeCounty] ([OfficeID], [CountyID]) VALUES (151978, 2786)
INSERT [dbo].[OfficeCounty] ([OfficeID], [CountyID]) VALUES (151978, 2798)
INSERT [dbo].[OfficeCounty] ([OfficeID], [CountyID]) VALUES (151978, 2802)
INSERT [dbo].[OfficeCounty] ([OfficeID], [CountyID]) VALUES (151978, 2803)
INSERT [dbo].[OfficeCounty] ([OfficeID], [CountyID]) VALUES (151978, 2805)
INSERT [dbo].[OfficeCounty] ([OfficeID], [CountyID]) VALUES (151978, 2806)
INSERT [dbo].[OfficeCounty] ([OfficeID], [CountyID]) VALUES (151978, 2809)
INSERT [dbo].[OfficeCounty] ([OfficeID], [CountyID]) VALUES (151981, 325)
INSERT [dbo].[OfficeCounty] ([OfficeID], [CountyID]) VALUES (151981, 333)
INSERT [dbo].[OfficeCounty] ([OfficeID], [CountyID]) VALUES (151981, 338)
INSERT [dbo].[OfficeCounty] ([OfficeID], [CountyID]) VALUES (151981, 368)
INSERT [dbo].[OfficeCounty] ([OfficeID], [CountyID]) VALUES (151981, 378)

Problem definition

We want to create a report that displays offices and comma-delimited lists of counties served by each office. We will walk through the solution in the steps below.

Step 1 - Creating the source CTE

The following script returns a list of values for OfficeID, CountyName and StateAbbr. Two additional columns are added for the purpose of the recursive CTE:

  • rank_county_name: a unique sequential integer starting with 1 for the CountyName and the OfficeID/StateAbbr.
  • max_rank_county_name: the total number of CountyName values for that OfficeID/StateAbbr. It is also equal to the maximum rank_county_name for the OfficeID.
SELECT 
 o.OfficeID,
 c.CountyName,
 c.StateAbbr,
 ROW_NUMBER() OVER (PARTITION BY o.OfficeID,c.StateAbbr ORDER BY CountyName) AS rank_county_name,
 COUNT(CountyName) OVER (PARTITION BY o.OfficeID,c.StateAbbr) AS max_rank_county_name
 FROM dbo.Office o
  INNER JOIN dbo.OfficeCounty oc
   ON o.OfficeID=oc.OfficeID
  INNER JOIN dbo.County c
   ON c.CountyId=oc.CountyID

The result from the query is as follows:

OfficeID CountyName StateAbbr rank_county_name max_rank_county_name
151978 Davis County, UT UT 1 7
151978 Salt Lake County, UT UT 2 7
151978 Summit County, UT UT 3 7
151978 Tooele County, UT UT 4 7
151978 Utah County, UT UT 5 7
151978 Wasatch County, UT UT 6 7
151978 Weber County, UT UT 7 7
151981 Baker County, FL FL 1 5
151981 Clay County, FL FL 2 5
151981 Duval County, FL FL 3 5
151981 Nassau County, FL FL 4 5
151981 St. Johns County, FL FL 5 5

For simplicity, we will put the query above in a  CTE:

WITH office_county_state AS (
  SELECT
   o.OfficeID,
   c.CountyName,
   c.StateAbbr,
   ROW_NUMBER() OVER (PARTITION BY o.OfficeID,c.StateAbbr ORDER BY CountyName) AS rank_county_name,
   COUNT(CountyName) OVER (PARTITION BY o.OfficeID,c.StateAbbr) AS max_rank_county_name
  FROM dbo.Office o
   INNER JOIN dbo.OfficeCounty oc
    ON o.OfficeID=oc.OfficeID
   INNER JOIN dbo.County c
    ON c.CountyId=oc.CountyID
 )

The CTE above provides a convenient way to view and query only the list of OfficeId, CountyName and StateAbbr that we will need for the final resultset.

Step 2- Creating a recursive CTE

The basic idea is to start with a list of one CountyName and then consecutively append each remaining CountyName, separated by a comma. There are two questions that arise in regards to creating the recursive CTE:

  1. how to select the anchor query?
  2. how to ensure that the CTE terminates?

 The anchor query should return only the first CountyName for every OfficeID/StateAbbr from the source CTE and will be invoked once. That is why we should use the predicate

WHERE rank_county_name=1.

From the sample data, the anchor query will return only two rows:

anchor query - resultset
OfficeID countynames Stateabbr rank_county_name max_rank_county_name
151978 Davis County, UT UT 1 7
151981 Baker County, FL FL 1 5

Then the recursive member will be invoked multiple times and in each invocation it will join the previous resultset with the source CTE. Every time the recursive member is invoked, it will add one additional row to the resultset with rank_county_name greater than 1 from the rank_county_name added from the previous round, because of the predicate:

WHERE d.rank_county_name=1+recurs_CTE.rank_county_name

Eventually after the recursive member is executed several times, the row where "rank_county_name = max_rank_county_name" will be added to the resultset and the next execution of the recursive member will return an empty resultset, so the recursive CTE will stop.

 The following code lists the complete code of the recursive CTE:

 -- source CTE:
WITH office_county_state AS (
  SELECT 
   o.OfficeID,
   c.CountyName,
   c.StateAbbr,
   ROW_NUMBER() OVER (PARTITION BY o.OfficeID,c.StateAbbr ORDER BY CountyName) AS rank_county_name,
   COUNT(CountyName) OVER (PARTITION BY o.OfficeID,c.StateAbbr) AS max_rank_county_name
  FROM dbo.Office o
   INNER JOIN dbo.OfficeCounty oc
    ON o.OfficeID=oc.OfficeID
   INNER JOIN dbo.County c
    ON c.CountyId=oc.CountyID
 )
--recursive CTE:
,recurs_CTE AS
(
--anchor query:
SELECT 
   OfficeID,
   StateAbbr,
   rank_county_name,
   max_rank_county_name,
   CAST(CountyName AS VARCHAR(8000)) AS countynames
 FROM office_county_state
 WHERE rank_county_name=1
UNION ALL
--recursive member:
 SELECT 
   d.OfficeID,
   d.StateAbbr,
   d.rank_county_name,
   d.max_rank_county_name,
   CAST(recurs_CTE.countynames + ',' + d.CountyName AS VARCHAR(8000))
  FROM recurs_CTE
   INNER JOIN office_county_state d
    ON d.OfficeID=recurs_CTE.OfficeID
    AND d.StateAbbr=recurs_CTE.StateAbbr
  WHERE d.rank_county_name=1+recurs_CTE.rank_county_name --ensures termination
)

If we examine the resultset for one of the offices in state 'FL', we can see how the CountyName list expanded by every iteration of the recursive CTE:

OfficeID StateAbbr rank_county_name max_rank_county_name countyNames
151981 FL 1 5 Baker County, FL
151981 FL 2 5 Baker County, FL,Clay County, FL
151981 FL 3 5 Baker County, FL,Clay County, FL,Duval County, FL
151981 FL 4 5 Baker County, FL,Clay County, FL,Duval County, FL,Nassau County, FL
151981 FL 5 5 Baker County, FL,Clay County, FL,Duval County, FL,Nassau County, FL,St. Johns County, FL

As you can see, after the row with "rank_county_name = 5" has been added, the recursive member will return an empty resultset and the recursive CTE will terminate.

Because we use a UNION ALL, the values and the list of values must be explicitly converted to the same data type in the recursive CTE.

Step 3 - Generating the final resultset

The remaining task is to filter the rows, which contain the complete list of CountyName values. These are the rows that have "rank_county_name = max_rank_county_name":

--source CTE:
WITH office_county_state AS (
 SELECT 
   o.OfficeID,
   c.CountyName,
   c.StateAbbr,
   ROW_NUMBER() OVER (PARTITION BY o.OfficeID,c.StateAbbr ORDER BY CountyName) AS rank_county_name,
   COUNT(CountyName) OVER (PARTITION BY o.OfficeID,c.StateAbbr) AS max_rank_county_name
 FROM dbo.Office o
  INNER JOIN dbo.OfficeCounty oc
   ON o.OfficeID=oc.OfficeID
  INNER JOIN dbo.County c
   ON c.CountyId=oc.CountyID
 )
--recursive CTE:
,recurs_CTE AS
(
--anchor query:
SELECT 
  OfficeID,
  StateAbbr,
  rank_county_name,
  max_rank_county_name,
  CAST(CountyName AS VARCHAR(8000)) AS countynames
 FROM office_county_state
 WHERE rank_county_name=1
UNION ALL
--recursive member:
SELECT 
  d.officeID,
  d.StateAbbr,
  d.rank_county_name,
  d.max_rank_county_name,
  CAST(recurs_CTE.countynames + ',' + d.CountyName AS VARCHAR(8000))
 FROM recurs_CTE
  INNER JOIN office_county_state d
   ON d.OfficeID=recurs_CTE.OfficeID
   AND d.StateAbbr=recurs_CTE.StateAbbr
 WHERE d.rank_county_name=1+recurs_CTE.rank_county_name
)
SELECT recurs_CTE.OfficeID,
  recurs_CTE.StateAbbr,
  recurs_CTE.countynames 
 FROM recurs_CTE
 -- we filter the resultset from the recursive CTE to get the longest list of countyName
 WHERE recurs_CTE.rank_county_name=recurs_CTE.max_rank_county_name

We added the SELECT statement that queries against the recursive CTE but return only the rows that contain the complete lists.

Considerations

The default maximum recursion level for SQL Server is 100. Therefore the recursive CTE can be invoked only 100 times by default. If you expect to have a list with more than 100 values, you should change the maximum recursion level using the MAXRECURSION OPTION. The maximum list length allowed is 32,767.

In case of tables with hundreds of rows, the running time of the recursive CTE can reach several minutes. In order to decrease the query execution time significantly, it is beneficial to materialize the office_county_state CTE into a temporary table. 

Conclusion

The article shows an easy, fast and flexible approach to generating lists of values from multiple rows based on recursive CTEs. In the scenario of a database with office locations, we show how to generate lists of county names for each office and state, where OfficeID is the unique row identifier, CountyName is the list value and StateAbbr is the grouping value. The presented implementation be tailored to any database schema that has a unique row id, a list value and a grouping value(s). The implementation is flexible because the list separator and the data type of the list can be easily changed and the string concatenation can be substituted with another operator such as sumation. The approach presented in this article achieves better execution time compared to using a CURSOR with fewer lines of code.

Reference

[1]"Training Kit (Exam 70-461): Querying Microsoft SQL Server 2012"by Itzik Ben-Gan  , Dejan Sarka  , Ron Talmage

 

Rate

4.64 (76)

You rated this post out of 5. Change rating

Share

Share

Rate

4.64 (76)

You rated this post out of 5. Change rating