SQLServerCentral Article

Concatenating Rows

,

Intent

Sometimes when generating reports or text for web pages or form letters, it becomes handy to concatenate multiple column values from rows of data into a single value entirely within T-SQL. For reference, a problem I encountered recently involved selecting one or more department names and converting them to a comma-separated list to appear in a drop-down list.

The SQL language itself doesn't easily facilitate this but there are ways around this. Below I outline two alternate queries using SQL Server.

Technical Goals

I wanted to concatenate values from several rows into a single value using an optional row separator / delimiter without using a cursor and without excessive wasted performance.

Consider the following table, FRUIT:

id

name

101

Apple

102

Banana

103

Orange

104

Melon

105

Grape

The goal is to select rows from this table and efficiently create a single result such as:
Apple, Banana, Orange, Melon, Grape

Solution 1

This problem can be solved using common table expressions (CTE) and in particular, a recursive CTE. To make this work, I created three CTEs. The first assigns a unique and sequential row number to each row value:

with 
list as (
  select
    name as value,
    row_number() over(order byname)as num
  from
    fruit
)

CTE Output:

value

num

Apple

1

Banana

2

Grape

3

Melon

4

Orange

5

The second CTE is recursive. Recursive CTEs were not intuitive to me when I first started working with them; if this section seems a little confusing at first, don't worry - bear with me. The basic anatomy of a recursive CTE is at least two sections separated by a UNION ALL statement. The first section is known as the non-recursive section and most of the normal SQL rules apply here. Subsequent sections are recursive, that is, they may select from the parent CTE.

Here my non-recursive section selects from the first CTE and inserts a third column which is used as a temporary variable which will hold the concatenated values as recursion progresses. Essentially, this new column will be built up with each recursion:

with
concatenations as (
   -- non-recursive section
select
    value as head, 
    cast(''as varchar(MAX)) as tail,
     num 
  from
     list
union all
   -- recursive section
select
     head,
     (select value from list where num = prev.num - 1) + tail as tail,
     num - 1
from
     concatenations as prev -- note: recursive selection happens here!
where
     num > 0
 )

CTE Output:

head

tail

num

Apple

1

Banana

2

Grape

3

Melon

4

Orange

5

OrangeMelon

4

OrangeGrapeMelon

3

OrangeBananaGrapeMelon

2

OrangeAppleBananaGrapeMelon

1

OrangeNULL0

Melon

Grape

3

MelonBananaGrape

2

MelonAppleBananaGrape1

MelonNULL0

Grape

Banana

2

GrapeAppleBanana

1

GrapeNULL0

Banana

Apple

1

Banana

NULL0

AppleNULL0

The third and final CTE aims to select only the line shown above in green. You will notice that the comma separator is missing; don't worry, we'll add that next.

with
concatenated as (
select
    max(tail + head) as items
from
    concatenations
where
    num = 1 
) 

CTE Output:

items

AppleBananaGrapeMelonOrange

Thoughts

This result is very close to what we originally intended. The only differences are the lack of an item delimiter and the sort order is determined by the name and not the ID of the fruit.

Solution 1 Revised

The first thing to consider is the missing delimiter. This is easily added (the + ', ' section below the union) in the recursive CTE where the previous value is selected in the recursive section:

with
concatenations as (
select
    value as head,
cast('' as varchar(MAX)) as tail,
     num
from
    list
union all
select
     head,
     (select value from list where num = prev.num - 1) + ', ' + tail as tail,
     num - 1
from
     concatenations as prev
where</>strong>
     num > 0
 )

The second consideration is the changed sort order. To correct that, we can simply change the CTE that assigns consecutive row numbers to each element:

with
list as (
  select
    name as value,
    row_number() over (order by id) as num
  from
    fruit
) 

Thoughts

We finally have the desired output but there is still a lurking performance bug. The number of rows selected in the recursive CTE grows exponentially with the number of input rows. For example, the above input of 5 rows creates 20 rows whereas an input of 1,000 rows creates 501,500 rows! Fortunately, there is a simple change that keeps the recursive CTE growth linear.

Optimization

Looking at the rows selected by the CTE, we see that no rows after the target row (highlighted in green above) are necessary for the final result. To eliminate them, we can add a clause to restrict the value of head to the be the final value returned from the list CTE. Note: we have to calculate this value in the non-recursive section because the rules for the recursive section don't allow doing it there.

with
concatenations as (
  select 
    value as head, 
    cast('' as varchar(MAX)) as tail,
     num,
     (select top 1 name from fruit order by id desc) as terminator
  from 
    list
union all
  select
    head,
    (select value from list where num = prev.num - 1) + ', ' + tail as tail,
    num - 1,
     terminator
  from
    concatenations as prev
  where
    num > 0
      and head = terminator
)

Solution 1 Finished

Below is everything listed above in one expression for your convenience:

with   
fruit as (
  select 'Apple'  as name, 101 as id unionall
  select 'Banana' as name, 102 as id unionall
  select 'Orange' as name, 103 as id unionall
  select 'Melon'  as name, 104 as id unionall
  select 'Grape'  as name, 105 as id 
)
 ,
list as (
  select
    name as value,
    row_number() over(order byid) asnum
  from
    fruit
)
,
concatenations as (
  select 
    value as head, 
    cast('' as varchar(MAX)) as tail, 
    num,
    (select top 1 name from fruit order by id desc) as terminator
  from 
    list
union all
  select
    head,
    (select value from list where num = prev.num - 1) + ', ' + tail as tail,
    num - 1, 
    terminator
  from
    concatenations as prev
  where
    num > 0
    and head = terminator
)
 ,
concatenated as (
  select
    max(tail + head) as items
  from
    concatenations
  where
    num = 1 
)
select * from concatenated

Conclusion

This solution satisfies the initial purpose by efficiently concatenating any number of rows into one string without using anything besides common table expressions.

Solution 2

One option I have explored is the FOR XML PATH expression. In this example, no CTEs are used. The syntax is more compact than the above lines, but I have found performance to be unpredictable; your mileage may vary!

Here's the equivalent of the entire CTE code above:

select 
   replace(
     replace(
       replace(
         (select name from fruit order by id for xml path('')),
         '</name><name>',
         ', ' -- delimiter
       ),
       '</name>',
       ''
     ),
     '<name>',
     ''
   )

Final Thoughts

Considering the two solutions above, obviously the second is simpler and therefore easier to maintain and understand. I have encountered odd performance issues using the XML based solution that disappeared when I switched to the recursive CTE version. However, Solution 1 has many moving parts and it's not desirable to have this extent of duplicated logic whenever you want to concatenate items. It would be nice to put this logic into a simple stored function, but that's the topic for a future article.

Rate

3.39 (100)

You rated this post out of 5. Change rating

Share

Share

Rate

3.39 (100)

You rated this post out of 5. Change rating