Generating a Distinct Delimited List Using XML

  • Comments posted to this topic are about the item Generating a Distinct Delimited List Using XML

  • Appreciate your article.

    This is with lot of explanation, but we can achieve the output more efficient way with the sorting of rolename also:

    select distinct a.userid, substring(b.rolename, 1, len(b.rolename )-1) from @UserRole a

    cross apply (select distinct RoleName + ', ' from @UserRole where a.userid = userid

    for xml path('')) b (rolename)

  • WITH CTE

    AS

    (

    SELECT ROW_NUMBER() OVER

    (

    PARTITION BY u2.UserID

    ORDER BY MIN(u2.RoleAssignedDate)

    ) AS RowNum,

    u2.UserID,

    u2.RoleName

    FROM @UserRole u2

    GROUP BY u2.UserID,

    u2.RoleName

    )

    SELECT DISTINCT u1.UserID,

    SUBSTRING

    (

    (

    SELECT ', ' + c.RoleName AS '*'

    FROM CTE c

    WHERE u1.UserID = c.UserID

    ORDER BY c.RowNum

    FOR XML PATH('')

    ), 3, 8000

    ) AS Roles

    FROM @UserRole u1;

    Should give the same result without the XQuery.

  • Thanks Kailash Mishra.

    The query that you gave will give result in an alphabetical order instead of the order in which the roles were assigned. The article is about retrieving delimited values in the order they were stored.

    Thanks & regards,

    Puja

  • Thanks Mike for new approach. 🙂

  • I like this article, it was a problem I had to solve previously.

    I particularly like the way you solved the problem of the final ',' at the end of the string. Very neat.

    I have to say I was totally lost by the end of your explanation and found the code too difficult to follow.

    The following is an adaptation of your 2nd stage code, but using Group By RoleName which then allows Order By MIN(RoleAssignedDate) and thereby giving the order required.

    SELECT DISTINCT u1.UserID,

    REPLACE(

    (SELECT u2.RoleName + ',' AS 'data()'

    FROM @UserRole u2

    WHERE u2.UserID = u1.UserID

    GROUP BY RoleName

    ORDER BY MIN(RoleAssignedDate)

    FOR XML PATH('')

    ) + '$', ',$', ''

    ) AS Roles

    FROM @UserRole u1

    Cheers,

    Nick

  • Let's do some testing...SET NOCOUNT ON

    -- Prepare sample data

    DECLARE@UserRole TABLE

    (

    UserID INT NOT NULL,

    RoleName VARCHAR(100) NOT NULL,

    ProjectID INT NOT NULL,

    RoleAssignedDate DATETIME NOT NULL

    )

    -- Populate sample data

    INSERT@UserRole

    (

    UserID,

    RoleName,

    ProjectID,

    RoleAssignedDate

    )

    VALUES(1112, 'Technical Director', 2041, '1967-02-20 04:21:13.490'),

    (1357, 'Training', 1614, '1961-09-14 16:18:59.990'),

    (1836, 'Technical Director', 1628, '1987-07-30 11:22:45.060'),

    ( 715, 'Accounting', 1487, '1995-01-08 11:46:17.670'),

    ( 162, 'Approver', 1548, '2001-07-02 11:34:14.260'),

    (1975, 'Technical Director', 1614, '1955-11-10 03:55:05.560'),

    (1112, 'System Administrator', 831, '1956-04-25 08:26:54.040'),

    ( 162, 'Technical Director', 986, '1989-12-17 15:51:04.330'),

    ( 715, 'Accounting', 461, '1954-11-27 00:45:52.830'),

    (1357, 'Developer', 2064, '2006-12-15 23:43:55.470'),

    (1867, 'Technical Director', 1416, '2003-04-11 16:50:01.070'),

    (1975, 'Developer', 1548, '1967-05-11 17:01:26.840'),

    (1975, 'Accounting', 1089, '1988-06-20 00:52:16.070'),

    ( 162, 'Marketing', 1443, '1995-06-10 14:29:23.290'),

    (1112, 'Accounting', 2109, '1999-12-05 09:07:46.620'),

    ( 162, 'Technical Director', 1089, '1975-09-25 12:13:12.590'),

    ( 162, 'International Sales Manager', 1628, '1984-05-30 09:25:18.330'),

    (1836, 'Technical Customer', 420, '1993-05-28 00:49:31.090'),

    (1357, 'Technical Director', 2036, '1979-04-25 23:38:48.120'),

    (1836, 'Developer', 1628, '2006-07-26 20:36:53.420')

    -- Start timing

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    -- Peso

    PRINT '*** Peso ***********************************************************************'

    SELECTb.UserID,

    STUFF(f.Roles, 1, 1, '') AS Roles

    FROM(

    SELECTUserID

    FROM@UserRole

    GROUP BYUserID

    ) AS b

    CROSS APPLY(

    SELECTTOP(100) PERCENT

    ',' + w.RoleName

    FROM@UserRole AS w

    WHEREw.UserID = b.UserID

    GROUP BYw.RoleName

    ORDER BYMIN(RoleAssignedDate)

    FOR XMLPATH('')

    ) AS f(Roles)

    ORDER BYb.UserID

    -- Puja Shah

    PRINT '*** Puja Shah ******************************************************************'

    SELECTb.UserID,

    LEFT(b.Roles, LEN(b.Roles) - CHARINDEX(',', REVERSE(b.Roles))) AS Roles

    FROM(

    SELECT DISTINCTa.UserID,

    CAST(a.Roles.query('distinct-values(/Root/Roles)') AS VARCHAR(MAX)) AS Roles

    FROM(

    SELECTu1.UserID,

    CAST(

    (

    SELECT DISTINCTu2.RoleName + ',' AS Roles,

    ROW_NUMBER() OVER(PARTITION BY u2.UserID ORDER BY u2.RoleAssignedDate) AS RID

    FROM@UserRole AS u2

    WHEREu2.UserID = u1.UserID

    ORDER BYRID

    FOR XMLPATH(''),

    ROOT('Root')

    ) AS XML

    ) AS Roles

    FROM@UserRole AS u1

    ) AS a

    ) AS b

    ORDER BYb.UserID

    -- Kailash Mishra<- Fails ordering

    PRINT '*** Kailash Mishra *************************************************************'

    SELECT DISTINCTa.UserID,

    SUBSTRING(b.RoleName, 1, LEN(b.RoleName) -1)

    FROM@UserRole AS a

    CROSS APPLY(

    SELECT DISTINCTb.RoleName + ','

    FROM@UserRole AS b

    WHEREb.UserID = a.UserID

    FOR XMLPATH('')

    ) AS b(RoleName)

    ORDER BYa.UserID

    -- Mike C

    PRINT '*** Mike C *********************************************************************'

    ;WITH cte

    AS (

    SELECTROW_NUMBER() OVER (PARTITION BY u2.UserID ORDER BY MIN(u2.RoleAssignedDate)) AS RowNum,

    u2.UserID,

    u2.RoleName

    FROM@UserRole AS u2

    GROUP BYu2.UserID,

    u2.RoleName

    )

    SELECT DISTINCTu1.UserID,

    SUBSTRING(

    (

    SELECT',' + c.RoleName AS '*'

    FROMcte AS c

    WHEREu1.UserID = c.UserID

    ORDER BYc.RowNum

    FOR XMLPATH('')

    ), 2, 8000

    ) AS Roles

    FROM@UserRole AS u1

    ORDER BYu1.UserID

    -- Nick Hanson

    PRINT '*** Nick Hanson ****************************************************************'

    SELECT DISTINCTu1.UserID,

    REPLACE (

    (

    SELECTu2.RoleName + ',' AS 'data()'

    FROM@UserRole AS u2

    WHEREu2.UserID = u1.UserID

    GROUP BYu2.RoleName

    ORDER BYMIN(u2.RoleAssignedDate)

    FOR XMLPATH('')

    ) + '$', ',$', ''

    ) AS Roles

    FROM@UserRole AS u1

    ORDER BYu1.UserID

    -- Stop timing

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF


    N 56°04'39.16"
    E 12°55'05.25"

  • Another variation on the previous suggestions. See what happens to all the other solutions when any of the roles is changed into for example 'R&D'. Plus, I don't like the replace()-approach to get rid of the last ','. I rather don't generate it if it's not needed instead of taking it off in an additional processing step:

    SELECT

    u1.UserID,

    (

    select case row_number() over (order by (select 1)) when 1 then '' else ', ' end

    + t.RoleName AS [text()]

    from (

    select row_number() over (order by min(u2.RoleAssignedDate)) as nr, u2.RoleName

    from @UserRole u2

    where u2.UserID = u1.UserID

    group by u2.RoleName

    ) t

    order by t.nr

    for xml path(''), type

    ).value('.', 'nvarchar(max)') as Roles

    from (

    select

    UserID

    from @UserRole

    group by

    UserID

    ) u1

    order by u1.UserID



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

  • I recommend using a CLR for this. Microsoft has a great example here http://msdn.microsoft.com/en-us/library/ms131056.aspx. I'm curious what the performance differences are. Admittedly, I have not compared the two because I've never had the CLR function not perform adequately.

    You can then use this as an aggregate, for example.

    SELECT dbo.List(myColumn)

    FROM myTable

    GROUP BY SomeOtherColumn

  • --An alternative without ROW_NUMBER()

    --How about this?

    select distinct

    u.UserID

    , stuff((

    select ', ' + u2.RoleName

    from @UserRole u2

    where u2.UserID = u.UserID

    and u2.RoleAssignedDate = (

    select min(u3.RoleAssignedDate)

    from @UserRole u3

    where u3.UserID = u2.UserID

    and u3.RoleName = u2.RoleName

    )

    order by u2.RoleAssignedDate asc

    for xml path('')

    ), 1, 2, '') 'RoleName'

    from @UserRole u

  • This is a good article thanks!

    I was going to mention that using GROUP BY instead of DISTINCT/ROW_NUMBER might be a good way to solve this, but I see someone has already suggested that. IMO DISTINCT should hardly ever be used, as it usually isn't the best option, like in this case it prevents you from being able to order the results by what you want.

  • Hi Brian, why would you want to eliminate row_number() from the query? Simply copy and paste your code into the test script that SwePeso posted and see for yourself how your version compares to the alternatives already presented. Peso's suggestion has by far the least IO and cpu time; I've taken that and added a fix for a functional problem in all suggestions: the special-characters &, < and > that for xml translates into & amp ;, & lt ; and & gt ; respectively (I put some additional spaces in that aren't in the real codes because the forum software shows the complete codes as &, < and > again). Your example re-introduces that problem and performs much worse because it re-introduces distinct instead of the group by in a sub query. So is there any specific reason why you think your version should be used instead of mine?

    edit: codes were not shown correctly by the forum software.



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

  • OK, then use GROUP:

    --An alternative without ROW_NUMBER()

    --How about this?

    select

    u.UserID

    , stuff((

    select ', ' + u2.RoleName

    from @UserRole u2

    where u2.UserID = u.UserID

    and u2.RoleAssignedDate = (

    select min(u3.RoleAssignedDate)

    from @UserRole u3

    where u3.UserID = u2.UserID

    and u3.RoleName = u2.RoleName

    )

    order by u2.RoleAssignedDate asc

    for xml path('')

    ), 1, 2, '') 'RoleName'

    from @UserRole u

    group by u.UserID

    *** R.P.Rozema *****************************************************************

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    Table '#4F1BAE10'. Scan count 8, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 4 ms.

    *** Brian Barkauskas ***********************************************************

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    Table '#4F1BAE10'. Scan count 8, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 2 ms.

  • Actually for readability, this one's even better,

    It has only 32 "words" compared to your 70,

    182 non-space characters to your 363,

    and runs just as fast:

    select

    u.UserID

    , stuff((

    select ', ' + u2.RoleName

    from @UserRole u2

    where u2.UserID = u.UserID

    group by u2.RoleName

    order by min(u2.RoleAssignedDate)

    for xml path('')

    ),1,2,'') 'RoleName'

    from @UserRole u

    group by u.UserID

  • Brian Barkauskas (6/29/2010)


    OK, then use GROUP:

    --An alternative without ROW_NUMBER()

    --How about this?

    select

    u.UserID

    , stuff((

    select ', ' + u2.RoleName

    from @UserRole u2

    where u2.UserID = u.UserID

    and u2.RoleAssignedDate = (

    select min(u3.RoleAssignedDate)

    from @UserRole u3

    where u3.UserID = u2.UserID

    and u3.RoleName = u2.RoleName

    )

    order by u2.RoleAssignedDate asc

    for xml path('')

    ), 1, 2, '') 'RoleName'

    from @UserRole u

    group by u.UserID

    *** R.P.Rozema *****************************************************************

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    Table '#4F1BAE10'. Scan count 8, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 4 ms.

    *** Brian Barkauskas ***********************************************************

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    Table '#4F1BAE10'. Scan count 8, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 2 ms.

    You good folks simply aren't using enough data to claim victory in any of the cases.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 26 total)

You must be logged in to reply to this topic. Login to reply