SQLServerCentral Article

Generating a Distinct Delimited List Using XML

,

Introduction

This article will help you to generate an ordered, distinct, delimited string in a single query using an XML. An ordered string here does not mean an alphabetical order or numeric order of the values in a delimited string. An order means the sequence in which the data was stored. A sequence of data can be determined from the fields like timestamp, identity column or any such column. A list of values is retrieved in some order and these values are separated by a delimiter.

In this article, I will consider a Project Management System (PMS). In PMS, more than one project may be assigned to a user. The users may have different roles. The relationship between user, role and project is stored in a table (UserID, RoleName, ProjectId, RoleAssignedDate). I will generate a report having the list of users with the order in which the roles were assigned to them, excluding the project names, i.e. only user and assigned roles. We will use the power of an XML to get an ordered (latest RoleAssignedDate), distinct (remove duplicate data), delimited (Comma separated value) string in an optimized way.
There are two constraints for generating this report:

  1. Roles should not be repeated in the CSV.
  2. Roles must appear in the order they were assigned.

A script to create a table and insert data is as below:

--Create Table
DECLARE @UserRole TABLE(
UserID INT,
RoleName VARCHAR(100),
ProjectID INT,
RoleAssignedDate DATETIME
)
--Insert data into table
INSERT INTO @UserRole (UserID,[RoleName], ProjectID, RoleAssignedDate )
Select 1112, 'Technical Director', 2041, '1967-02-20 04:21:13.490'
UNION ALL
Select 1357, 'Training', 1614, '1961-09-14 16:18:59.990'
UNION ALL
Select 1836, 'Technical Director', 1628, '1987-07-30 11:22:45.060'
UNION ALL
Select 715, 'Accounting', 1487, '1995-01-08 11:46:17.670'
UNION ALL
Select 162, 'Approver', 1548, '2001-07-02 11:34:14.260'
UNION ALL
Select 1975, 'Technical Director', 1614, '1955-11-10 03:55:05.560'
UNION ALL
Select 1112, 'System Administrator', 831, '1956-04-25 08:26:54.040'
UNION ALL
Select 162, 'Technical Director', 986, '1989-12-17 15:51:04.330'
UNION ALL
Select 715, 'Accounting', 461, '1954-11-27 00:45:52.830'
UNION ALL
Select 1357, 'Developer', 2064, '2006-12-15 23:43:55.470'
UNION ALL
Select 1867, 'Technical Director', 1416, '2003-04-11 16:50:01.070'
UNION ALL
Select 1975, 'Developer', 1548, '1967-05-11 17:01:26.840'
UNION ALL
Select 1975, 'Accounting', 1089, '1988-06-20 00:52:16.070'
UNION ALL
Select 162, 'Marketing', 1443, '1995-06-10 14:29:23.290'
UNION ALL
Select 1112, 'Accounting', 2109, '1999-12-05 09:07:46.620'
UNION ALL
Select 162, 'Technical Director', 1089, '1975-09-25 12:13:12.590'
UNION ALL
Select 162, 'International Sales Manager', 1628, '1984-05-30 09:25:18.330'
UNION ALL
Select 1836, 'Technical Customer', 420, '1993-05-28 00:49:31.090'
UNION ALL
Select 1357, 'Technical Director', 2036, '1979-04-25 23:38:48.120'
UNION ALL
Select 1836, 'Developer', 1628, '2006-07-26 20:36:53.420'

Problem

Let me explain for UserID 162. UserId 162 has Technical Director, Technical Director, Marketing, Approver and International Sales Manager roles for different projects. The role, Technical Director is assigned twice to UserID 162 for two different ProjectIds: 1089 and 1628. In the report, the roles in the order they were assigned to UserID 162 is required, eliminating all duplicate roles against that user. For UserID 162, the order of roles should be: "Technical Director, International Sales Manager, Marketing, Approver"


Figure 1: Order of roles assigned to userId 162

 

Let us generate a comma delimited values with the help of FOR XML PATH. But here each delimited value is built in an alphabetical order of the RoleName. We can see that the order of Roles in the Figure 2 is not as per they were assigned. When the ORDER BY clause is not specified in generating a CSV, by default the order is an alphabetical order of role names.

To generate a CSV string for each user, what I did is, first selected distinct roles appended with a delimiter "," that matches each UserID of main SELECT statement with the SELECT statement of FOR XML PATH. In each CSV, "$" is appended at last so that exactly last comma ",$" is replaced with a blank.

select u1.UserID,
Replace((select distinct u2.RoleName + ',' as 'data()'
from @UserRole u2
where u2.UserID = u1.UserID
FOR XML PATH('')) + '$', ',$', '') as Roles
from @UserRole u1

Figure 2: CSV data in an alphabetical order

 

I will generate the CSV data in the order of the roles that were assigned to each user.

select distinct u1.UserID,
replace((select distinct u2.RoleName + ',' as 'data()'
from @UserRole u2
where u2.UserID = u1.UserID
order by u2.RoleAssignedDate
FOR XML PATH('')) + '$', ',$', '') as Roles
from @UserRole u1

But this gives me an error!!
"Msg 145, Level 15, State 1, Line 54
ORDER BY items must appear in the select list if SELECT DISTINCT is specified."

In the above query, RoleAssignedDate column is not included in the SELECT DISTINCT list, so we get the above error. When DISTINCT is not used, the ORDER BY clause gives no error though ORDER BY fields are not included in the SELECT list.

Solution

To get an ordered, distinct, delimited value, what we can do is first give an identity number for each role related to the UserID, ordering the roles based on the date they were assigned. This can be achieved using ROW_NUMBER() ranking function.

Note: This solution is for SQL Server 2005 and above versions.

select userid, projectid, roleName, roleAssignedDate ,
ROW_NUMBER() over(partition by UserID order by RoleAssignedDate) as RID
from @UserRole
order by userid, rid

 


Figure 3: Assigning an identity for each row partitioned by UserID

ROW_NUMBER() function returns sequential number for each row in a partition of the rows where the UserID is same such that the latest date is first in each group.

After this an XML document having <Roles> and <RID> nodes is created using FOR XML PATH for each UserID. This XML document has nodes in the order RID was generated.

select u1.UserID,
CAST((select distinct u2.RoleName + ',' as Roles,
ROW_NUMBER() over(partition by u2.UserID order by u2.RoleAssignedDate) as RID
from @UserRole u2
where u2.UserID = u1.UserID
order by RID
FOR XML PATH(''), Root('Root')) as XML) as Roles
from @UserRole u1
group by u1.UserID
order by u1.UserID

The XML document for UserId = 162 is

<Root>
<Roles>Technical Director,</Roles>
<RID>1</RID>
<Roles>International Sales Manager,</Roles>
<RID>2</RID>
<Roles>Technical Director,</Roles>
<RID>3</RID>
<Roles>Marketing,</Roles>
<RID>4</RID>
<Roles>Approver,</Roles>
<RID>5</RID>
</Root>

I will use an XQuery to retrieve data from the generated XML documents for each userId. The Query() method is used to read data from <Roles> element. In the above XML document, Role: 'Technical Director,' is repeated for 162 UserId, but have different RID. The report needs distinct role names, for this distinct-values, an XQuery function is used to remove duplicate values from the sequence. There is an extra comma appended after every CSV string, so the last comma is removed using left() string function from each CSV string. The final single T-SQL query is:

select b.UserID 
, LEFT(b.Roles,LEN(b.Roles) - CHARINDEX(',',REVERSE(b.Roles))) as Roles
from
(
select distinct a.UserID
, cast(a.Roles.query('distinct-values(/Root/Roles)') as varchar(max)) as Roles
from
(
select u1.UserID,
CAST((select distinct u2.RoleName + ',' as Roles,
ROW_NUMBER() over(partition by u2.UserID order by u2.RoleAssignedDate) as RID
from @UserRole u2
where u2.UserID = u1.UserID
order by RID
FOR XML PATH(''), Root('Root')) as XML) as Roles
from @UserRole u1
) as a
) as b

Figure 4: Final Output

Summary

ROW_NUMBER() gives a sequence number to each row partitioned by UserId. Then create an XML document containing that sequence number and the column for generating a CSV string. Then with the help of XQuery, retrieve the column data. Using distinct-values function gives unique values in each CSV.

Now you can easily generate an ordered, distinct, delimited string using ROW_NUMBER() and FOR XML PATH. Thanks to XML !!!

If you have any questions related to this article or have any other solutions, please feel free to share.

 

Resources

Rate

4.15 (20)

You rated this post out of 5. Change rating

Share

Share

Rate

4.15 (20)

You rated this post out of 5. Change rating