Generating a Distinct Delimited List Using XML

Question

Post reply

Generating a Distinct Delimited List Using XML

Puja Shah

SSCommitted

Points: 1724
More actions
June 28, 2010 at 9:35 pm

#223342

Comments posted to this topic are about the item Generating a Distinct Delimited List Using XML

Viewing 15 posts - 1 through 15 (of 27 total)

You must be logged in to reply to this topic. Login to reply

Kailash Mishra SSC Veteran Points: 286 More actions · Answer 1

Appreciate your article.

This is with lot of explanation, but we can achieve the output more efficient way with the sorting of rolename also:

select distinct a.userid, substring(b.rolename, 1, len(b.rolename )-1) from @UserRole a

cross apply (select distinct RoleName + ', ' from @UserRole where a.userid = userid

for xml path('')) b (rolename)

Mike C SSC-Insane Points: 23224 More actions · Answer 2

WITH CTE

AS

(

SELECT ROW_NUMBER() OVER

(

PARTITION BY u2.UserID

ORDER BY MIN(u2.RoleAssignedDate)

) AS RowNum,

u2.UserID,

u2.RoleName

FROM @UserRole u2

GROUP BY u2.UserID,

u2.RoleName

)

SELECT DISTINCT u1.UserID,

SUBSTRING

(

SELECT ', ' + c.RoleName AS '*'

FROM CTE c

WHERE u1.UserID = c.UserID

ORDER BY c.RowNum

FOR XML PATH('')

), 3, 8000

) AS Roles

FROM @UserRole u1;

Should give the same result without the XQuery.

Puja Shah SSCommitted Points: 1724 More actions · Answer 3

Thanks Kailash Mishra.

The query that you gave will give result in an alphabetical order instead of the order in which the roles were assigned. The article is about retrieving delimited values in the order they were stored.

Thanks & regards,

Puja

Puja Shah SSCommitted Points: 1724 More actions · Answer 4

Puja Shah

SSCommitted

Points: 1724

June 29, 2010 at 12:00 am

#1186168

Thanks Mike for new approach. 🙂

nick.hanson Grasshopper Points: 20 More actions · Answer 5

I like this article, it was a problem I had to solve previously.

I particularly like the way you solved the problem of the final ',' at the end of the string. Very neat.

I have to say I was totally lost by the end of your explanation and found the code too difficult to follow.

The following is an adaptation of your 2nd stage code, but using Group By RoleName which then allows Order By MIN(RoleAssignedDate) and thereby giving the order required.

SELECT DISTINCT u1.UserID,

REPLACE(

(SELECT u2.RoleName + ',' AS 'data()'

FROM @UserRole u2

WHERE u2.UserID = u1.UserID

GROUP BY RoleName

ORDER BY MIN(RoleAssignedDate)

FOR XML PATH('')

) + '$', ',$', ''

) AS Roles

FROM @UserRole u1

Cheers,

Nick

SwePeso SSC-Dedicated Points: 39758 More actions · Answer 6

Let's do some testing...SET NOCOUNT ON

-- Prepare sample data

DECLARE@UserRole TABLE

(

UserID INT NOT NULL,

RoleName VARCHAR(100) NOT NULL,

ProjectID INT NOT NULL,

RoleAssignedDate DATETIME NOT NULL

)

-- Populate sample data

INSERT@UserRole

(

UserID,

RoleName,

ProjectID,

RoleAssignedDate

)

VALUES(1112, 'Technical Director', 2041, '1967-02-20 04:21:13.490'),

(1357, 'Training', 1614, '1961-09-14 16:18:59.990'),

(1836, 'Technical Director', 1628, '1987-07-30 11:22:45.060'),

( 715, 'Accounting', 1487, '1995-01-08 11:46:17.670'),

( 162, 'Approver', 1548, '2001-07-02 11:34:14.260'),

(1975, 'Technical Director', 1614, '1955-11-10 03:55:05.560'),

(1112, 'System Administrator', 831, '1956-04-25 08:26:54.040'),

( 162, 'Technical Director', 986, '1989-12-17 15:51:04.330'),

( 715, 'Accounting', 461, '1954-11-27 00:45:52.830'),

(1357, 'Developer', 2064, '2006-12-15 23:43:55.470'),

(1867, 'Technical Director', 1416, '2003-04-11 16:50:01.070'),

(1975, 'Developer', 1548, '1967-05-11 17:01:26.840'),

(1975, 'Accounting', 1089, '1988-06-20 00:52:16.070'),

( 162, 'Marketing', 1443, '1995-06-10 14:29:23.290'),

(1112, 'Accounting', 2109, '1999-12-05 09:07:46.620'),

( 162, 'Technical Director', 1089, '1975-09-25 12:13:12.590'),

( 162, 'International Sales Manager', 1628, '1984-05-30 09:25:18.330'),

(1836, 'Technical Customer', 420, '1993-05-28 00:49:31.090'),

(1357, 'Technical Director', 2036, '1979-04-25 23:38:48.120'),

(1836, 'Developer', 1628, '2006-07-26 20:36:53.420')

-- Start timing

SET STATISTICS IO ON

SET STATISTICS TIME ON

-- Peso

PRINT '*** Peso ***********************************************************************'

SELECTb.UserID,

STUFF(f.Roles, 1, 1, '') AS Roles

FROM(

SELECTUserID

FROM@UserRole

GROUP BYUserID

) AS b

CROSS APPLY(

SELECTTOP(100) PERCENT

',' + w.RoleName

FROM@UserRole AS w

WHEREw.UserID = b.UserID

GROUP BYw.RoleName

ORDER BYMIN(RoleAssignedDate)

FOR XMLPATH('')

) AS f(Roles)

ORDER BYb.UserID

-- Puja Shah

PRINT '*** Puja Shah ******************************************************************'

SELECTb.UserID,

LEFT(b.Roles, LEN(b.Roles) - CHARINDEX(',', REVERSE(b.Roles))) AS Roles

FROM(

SELECT DISTINCTa.UserID,

CAST(a.Roles.query('distinct-values(/Root/Roles)') AS VARCHAR(MAX)) AS Roles

FROM(

SELECTu1.UserID,

CAST(

(

SELECT DISTINCTu2.RoleName + ',' AS Roles,

ROW_NUMBER() OVER(PARTITION BY u2.UserID ORDER BY u2.RoleAssignedDate) AS RID

FROM@UserRole AS u2

WHEREu2.UserID = u1.UserID

ORDER BYRID

FOR XMLPATH(''),

ROOT('Root')

) AS XML

) AS Roles

FROM@UserRole AS u1

) AS a

) AS b

ORDER BYb.UserID

-- Kailash Mishra<- Fails ordering

PRINT '*** Kailash Mishra *************************************************************'

SELECT DISTINCTa.UserID,

SUBSTRING(b.RoleName, 1, LEN(b.RoleName) -1)

FROM@UserRole AS a

CROSS APPLY(

SELECT DISTINCTb.RoleName + ','

FROM@UserRole AS b

WHEREb.UserID = a.UserID

FOR XMLPATH('')

) AS b(RoleName)

ORDER BYa.UserID

-- Mike C

PRINT '*** Mike C *********************************************************************'

;WITH cte

AS (

SELECTROW_NUMBER() OVER (PARTITION BY u2.UserID ORDER BY MIN(u2.RoleAssignedDate)) AS RowNum,

u2.UserID,

u2.RoleName

FROM@UserRole AS u2

GROUP BYu2.UserID,

u2.RoleName

)

SELECT DISTINCTu1.UserID,

SUBSTRING(

(

SELECT',' + c.RoleName AS '*'

FROMcte AS c

WHEREu1.UserID = c.UserID

ORDER BYc.RowNum

FOR XMLPATH('')

), 2, 8000

) AS Roles

FROM@UserRole AS u1

ORDER BYu1.UserID

-- Nick Hanson

PRINT '*** Nick Hanson ****************************************************************'

SELECT DISTINCTu1.UserID,

REPLACE (

(

SELECTu2.RoleName + ',' AS 'data()'

FROM@UserRole AS u2

WHEREu2.UserID = u1.UserID

GROUP BYu2.RoleName

ORDER BYMIN(u2.RoleAssignedDate)

FOR XMLPATH('')

) + '$', ',$', ''

) AS Roles

FROM@UserRole AS u1

ORDER BYu1.UserID

-- Stop timing

SET STATISTICS TIME OFF

SET STATISTICS IO OFF

N 56°04'39.16"
E 12°55'05.25"

R.P.Rozema SSChampion Points: 12317 More actions · Answer 7

Another variation on the previous suggestions. See what happens to all the other solutions when any of the roles is changed into for example 'R&D'. Plus, I don't like the replace()-approach to get rid of the last ','. I rather don't generate it if it's not needed instead of taking it off in an additional processing step:

SELECT

u1.UserID,

(

select case row_number() over (order by (select 1)) when 1 then '' else ', ' end

+ t.RoleName AS [text()]

from (

select row_number() over (order by min(u2.RoleAssignedDate)) as nr, u2.RoleName

from @UserRole u2

where u2.UserID = u1.UserID

group by u2.RoleName

) t

order by t.nr

for xml path(''), type

).value('.', 'nvarchar(max)') as Roles

from (

select

UserID

from @UserRole

group by

UserID

) u1

order by u1.UserID

Posting Data Etiquette - Jeff Moden[/url]
Posting Performance Based Questions - Gail Shaw[/url]
Hidden RBAR - Jeff Moden[/url]
Cross Tabs and Pivots - Jeff Moden[/url]
Catch-all queries - Gail Shaw[/url]

If you don't have time to do it right, when will you have time to do it over?

Bradley Deem SSCrazy Points: 2565 More actions · Answer 8

I recommend using a CLR for this. Microsoft has a great example here http://msdn.microsoft.com/en-us/library/ms131056.aspx. I'm curious what the performance differences are. Admittedly, I have not compared the two because I've never had the CLR function not perform adequately.

You can then use this as an aggregate, for example.

SELECT dbo.List(myColumn)

FROM myTable

GROUP BY SomeOtherColumn

Brian Barkauskas Old Hand Points: 306 More actions · Answer 9

--An alternative without ROW_NUMBER()

--How about this?

select distinct

u.UserID

, stuff((

select ', ' + u2.RoleName

from @UserRole u2

where u2.UserID = u.UserID

and u2.RoleAssignedDate = (

select min(u3.RoleAssignedDate)

from @UserRole u3

where u3.UserID = u2.UserID

and u3.RoleName = u2.RoleName

)

order by u2.RoleAssignedDate asc

for xml path('')

), 1, 2, '') 'RoleName'

from @UserRole u

UMG Developer SSChampion Points: 13482 More actions · Answer 10

This is a good article thanks!

I was going to mention that using GROUP BY instead of DISTINCT/ROW_NUMBER might be a good way to solve this, but I see someone has already suggested that. IMO DISTINCT should hardly ever be used, as it usually isn't the best option, like in this case it prevents you from being able to order the results by what you want.

R.P.Rozema SSChampion Points: 12317 More actions · Answer 11

Hi Brian, why would you want to eliminate row_number() from the query? Simply copy and paste your code into the test script that SwePeso posted and see for yourself how your version compares to the alternatives already presented. Peso's suggestion has by far the least IO and cpu time; I've taken that and added a fix for a functional problem in all suggestions: the special-characters &, < and > that for xml translates into & amp ;, & lt ; and & gt ; respectively (I put some additional spaces in that aren't in the real codes because the forum software shows the complete codes as &, < and > again). Your example re-introduces that problem and performs much worse because it re-introduces distinct instead of the group by in a sub query. So is there any specific reason why you think your version should be used instead of mine?

edit: codes were not shown correctly by the forum software.

Posting Data Etiquette - Jeff Moden[/url]
Posting Performance Based Questions - Gail Shaw[/url]
Hidden RBAR - Jeff Moden[/url]
Cross Tabs and Pivots - Jeff Moden[/url]
Catch-all queries - Gail Shaw[/url]

If you don't have time to do it right, when will you have time to do it over?

Brian Barkauskas Old Hand Points: 306 More actions · Answer 12

OK, then use GROUP:

--An alternative without ROW_NUMBER()

--How about this?

select

u.UserID

, stuff((

select ', ' + u2.RoleName

from @UserRole u2

where u2.UserID = u.UserID

and u2.RoleAssignedDate = (

select min(u3.RoleAssignedDate)

from @UserRole u3

where u3.UserID = u2.UserID

and u3.RoleName = u2.RoleName

)

order by u2.RoleAssignedDate asc

for xml path('')

), 1, 2, '') 'RoleName'

from @UserRole u

group by u.UserID

*** R.P.Rozema *****************************************************************

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 1 ms.

Table '#4F1BAE10'. Scan count 8, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 4 ms.

*** Brian Barkauskas ***********************************************************

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 1 ms.

Table '#4F1BAE10'. Scan count 8, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 2 ms.

Brian Barkauskas Old Hand Points: 306 More actions · Answer 13

Actually for readability, this one's even better,

It has only 32 "words" compared to your 70,

182 non-space characters to your 363,

and runs just as fast:

select

u.UserID

, stuff((

select ', ' + u2.RoleName

from @UserRole u2

where u2.UserID = u.UserID

group by u2.RoleName

order by min(u2.RoleAssignedDate)

for xml path('')

),1,2,'') 'RoleName'

from @UserRole u

group by u.UserID

Jeff Moden SSC Guru Points: 1004747 More actions · Answer 14

Brian Barkauskas (6/29/2010)
OK, then use GROUP:
--An alternative without ROW_NUMBER()
--How about this?
select
u.UserID
, stuff((
select ', ' + u2.RoleName
from @UserRole u2
where u2.UserID = u.UserID
and u2.RoleAssignedDate = (
select min(u3.RoleAssignedDate)
from @UserRole u3
where u3.UserID = u2.UserID
and u3.RoleName = u2.RoleName
)
order by u2.RoleAssignedDate asc
for xml path('')
), 1, 2, '') 'RoleName'
from @UserRole u
group by u.UserID
*** R.P.Rozema *****************************************************************
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
Table '#4F1BAE10'. Scan count 8, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 4 ms.
*** Brian Barkauskas ***********************************************************
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
Table '#4F1BAE10'. Scan count 8, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 2 ms.

You good folks simply aren't using enough data to claim victory in any of the cases.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)