Generating a Distinct Delimited List Using XML

  • Puja Shah

    SSCommitted

    Points: 1724

    Comments posted to this topic are about the item Generating a Distinct Delimited List Using XML

  • Kailash Mishra

    SSC Veteran

    Points: 256

    Appreciate your article.

    This is with lot of explanation, but we can achieve the output more efficient way with the sorting of rolename also:

    select distinct a.userid, substring(b.rolename, 1, len(b.rolename )-1) from @UserRole a

    cross apply (select distinct RoleName + ', ' from @UserRole where a.userid = userid

    for xml path('')) b (rolename)

  • Mike C

    SSC-Insane

    Points: 23224

    WITH CTE

    AS

    (

    SELECT ROW_NUMBER() OVER

    (

    PARTITION BY u2.UserID

    ORDER BY MIN(u2.RoleAssignedDate)

    ) AS RowNum,

    u2.UserID,

    u2.RoleName

    FROM @UserRole u2

    GROUP BY u2.UserID,

    u2.RoleName

    )

    SELECT DISTINCT u1.UserID,

    SUBSTRING

    (

    (

    SELECT ', ' + c.RoleName AS '*'

    FROM CTE c

    WHERE u1.UserID = c.UserID

    ORDER BY c.RowNum

    FOR XML PATH('')

    ), 3, 8000

    ) AS Roles

    FROM @UserRole u1;

    Should give the same result without the XQuery.

  • Puja Shah

    SSCommitted

    Points: 1724

    Thanks Kailash Mishra.

    The query that you gave will give result in an alphabetical order instead of the order in which the roles were assigned. The article is about retrieving delimited values in the order they were stored.

    Thanks & regards,

    Puja

  • Puja Shah

    SSCommitted

    Points: 1724

    Thanks Mike for new approach. 🙂

  • nick.hanson

    Grasshopper

    Points: 20

    I like this article, it was a problem I had to solve previously.

    I particularly like the way you solved the problem of the final ',' at the end of the string. Very neat.

    I have to say I was totally lost by the end of your explanation and found the code too difficult to follow.

    The following is an adaptation of your 2nd stage code, but using Group By RoleName which then allows Order By MIN(RoleAssignedDate) and thereby giving the order required.

    SELECT DISTINCT u1.UserID,

    REPLACE (

    (SELECT u2.RoleName + ',' AS 'data()'

    FROM @UserRole u2

    WHERE u2.UserID = u1.UserID

    GROUP BY RoleName

    ORDER BY MIN(RoleAssignedDate)

    FOR XML PATH('')

    ) + '$', ',$', ''

    ) AS Roles

    FROM @UserRole u1

    Cheers,

    Nick

  • SwePeso

    SSC-Dedicated

    Points: 39693

    Let's do some testing...SET NOCOUNT ON

    -- Prepare sample data

    DECLARE @UserRole TABLE

    (

    UserID INT NOT NULL,

    RoleName VARCHAR(100) NOT NULL,

    ProjectID INT NOT NULL,

    RoleAssignedDate DATETIME NOT NULL

    )

    -- Populate sample data

    INSERT @UserRole

    (

    UserID,

    RoleName,

    ProjectID,

    RoleAssignedDate

    )

    VALUES (1112, 'Technical Director', 2041, '1967-02-20 04:21:13.490'),

    (1357, 'Training', 1614, '1961-09-14 16:18:59.990'),

    (1836, 'Technical Director', 1628, '1987-07-30 11:22:45.060'),

    ( 715, 'Accounting', 1487, '1995-01-08 11:46:17.670'),

    ( 162, 'Approver', 1548, '2001-07-02 11:34:14.260'),

    (1975, 'Technical Director', 1614, '1955-11-10 03:55:05.560'),

    (1112, 'System Administrator', 831, '1956-04-25 08:26:54.040'),

    ( 162, 'Technical Director', 986, '1989-12-17 15:51:04.330'),

    ( 715, 'Accounting', 461, '1954-11-27 00:45:52.830'),

    (1357, 'Developer', 2064, '2006-12-15 23:43:55.470'),

    (1867, 'Technical Director', 1416, '2003-04-11 16:50:01.070'),

    (1975, 'Developer', 1548, '1967-05-11 17:01:26.840'),

    (1975, 'Accounting', 1089, '1988-06-20 00:52:16.070'),

    ( 162, 'Marketing', 1443, '1995-06-10 14:29:23.290'),

    (1112, 'Accounting', 2109, '1999-12-05 09:07:46.620'),

    ( 162, 'Technical Director', 1089, '1975-09-25 12:13:12.590'),

    ( 162, 'International Sales Manager', 1628, '1984-05-30 09:25:18.330'),

    (1836, 'Technical Customer', 420, '1993-05-28 00:49:31.090'),

    (1357, 'Technical Director', 2036, '1979-04-25 23:38:48.120'),

    (1836, 'Developer', 1628, '2006-07-26 20:36:53.420')

    -- Start timing

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    -- Peso

    PRINT '*** Peso ***********************************************************************'

    SELECT b.UserID,

    STUFF(f.Roles, 1, 1, '') AS Roles

    FROM (

    SELECT UserID

    FROM @UserRole

    GROUP BY UserID

    ) AS b

    CROSS APPLY (

    SELECT TOP(100) PERCENT

    ',' + w.RoleName

    FROM @UserRole AS w

    WHERE w.UserID = b.UserID

    GROUP BY w.RoleName

    ORDER BY MIN(RoleAssignedDate)

    FOR XML PATH('')

    ) AS f(Roles)

    ORDER BY b.UserID

    -- Puja Shah

    PRINT '*** Puja Shah ******************************************************************'

    SELECT b.UserID,

    LEFT(b.Roles, LEN(b.Roles) - CHARINDEX(',', REVERSE(b.Roles))) AS Roles

    FROM (

    SELECT DISTINCT a.UserID,

    CAST(a.Roles.query('distinct-values(/Root/Roles)') AS VARCHAR(MAX)) AS Roles

    FROM (

    SELECT u1.UserID,

    CAST(

    (

    SELECT DISTINCT u2.RoleName + ',' AS Roles,

    ROW_NUMBER() OVER(PARTITION BY u2.UserID ORDER BY u2.RoleAssignedDate) AS RID

    FROM @UserRole AS u2

    WHERE u2.UserID = u1.UserID

    ORDER BY RID

    FOR XML PATH(''),

    ROOT('Root')

    ) AS XML

    ) AS Roles

    FROM @UserRole AS u1

    ) AS a

    ) AS b

    ORDER BY b.UserID

    -- Kailash Mishra <- Fails ordering

    PRINT '*** Kailash Mishra *************************************************************'

    SELECT DISTINCT a.UserID,

    SUBSTRING(b.RoleName, 1, LEN(b.RoleName) -1)

    FROM @UserRole AS a

    CROSS APPLY (

    SELECT DISTINCT b.RoleName + ','

    FROM @UserRole AS b

    WHERE b.UserID = a.UserID

    FOR XML PATH('')

    ) AS b(RoleName)

    ORDER BY a.UserID

    -- Mike C

    PRINT '*** Mike C *********************************************************************'

    ;WITH cte

    AS (

    SELECT ROW_NUMBER() OVER (PARTITION BY u2.UserID ORDER BY MIN(u2.RoleAssignedDate)) AS RowNum,

    u2.UserID,

    u2.RoleName

    FROM @UserRole AS u2

    GROUP BY u2.UserID,

    u2.RoleName

    )

    SELECT DISTINCT u1.UserID,

    SUBSTRING(

    (

    SELECT ',' + c.RoleName AS '*'

    FROM cte AS c

    WHERE u1.UserID = c.UserID

    ORDER BY c.RowNum

    FOR XML PATH('')

    ), 2, 8000

    ) AS Roles

    FROM @UserRole AS u1

    ORDER BY u1.UserID

    -- Nick Hanson

    PRINT '*** Nick Hanson ****************************************************************'

    SELECT DISTINCT u1.UserID,

    REPLACE (

    (

    SELECT u2.RoleName + ',' AS 'data()'

    FROM @UserRole AS u2

    WHERE u2.UserID = u1.UserID

    GROUP BY u2.RoleName

    ORDER BY MIN(u2.RoleAssignedDate)

    FOR XML PATH('')

    ) + '$', ',$', ''

    ) AS Roles

    FROM @UserRole AS u1

    ORDER BY u1.UserID

    -- Stop timing

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF


    N 56°04'39.16"
    E 12°55'05.25"

  • R.P.Rozema

    SSChampion

    Points: 12300

    Another variation on the previous suggestions. See what happens to all the other solutions when any of the roles is changed into for example 'R&D'. Plus, I don't like the replace()-approach to get rid of the last ','. I rather don't generate it if it's not needed instead of taking it off in an additional processing step:

    SELECT

    u1.UserID,

    (

    select case row_number() over (order by (select 1)) when 1 then '' else ', ' end

    + t.RoleName AS [text()]

    from (

    select row_number() over (order by min(u2.RoleAssignedDate)) as nr, u2.RoleName

    from @UserRole u2

    where u2.UserID = u1.UserID

    group by u2.RoleName

    ) t

    order by t.nr

    for xml path(''), type

    ).value('.', 'nvarchar(max)') as Roles

    from (

    select

    UserID

    from @UserRole

    group by

    UserID

    ) u1

    order by u1.UserID



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

  • Bradley Deem

    SSCrazy

    Points: 2565

    I recommend using a CLR for this. Microsoft has a great example here http://msdn.microsoft.com/en-us/library/ms131056.aspx. I'm curious what the performance differences are. Admittedly, I have not compared the two because I've never had the CLR function not perform adequately.

    You can then use this as an aggregate, for example.

    SELECT dbo.List(myColumn)

    FROM myTable

    GROUP BY SomeOtherColumn

  • Brian Barkauskas

    Old Hand

    Points: 306

    --An alternative without ROW_NUMBER()

    --How about this?

    select distinct

    u.UserID

    , stuff((

    select ', ' + u2.RoleName

    from @UserRole u2

    where u2.UserID = u.UserID

    and u2.RoleAssignedDate = (

    select min(u3.RoleAssignedDate)

    from @UserRole u3

    where u3.UserID = u2.UserID

    and u3.RoleName = u2.RoleName

    )

    order by u2.RoleAssignedDate asc

    for xml path('')

    ), 1, 2, '') 'RoleName'

    from @UserRole u

  • UMG Developer

    SSChampion

    Points: 13482

    This is a good article thanks!

    I was going to mention that using GROUP BY instead of DISTINCT/ROW_NUMBER might be a good way to solve this, but I see someone has already suggested that. IMO DISTINCT should hardly ever be used, as it usually isn't the best option, like in this case it prevents you from being able to order the results by what you want.

  • R.P.Rozema

    SSChampion

    Points: 12300

    Hi Brian, why would you want to eliminate row_number() from the query? Simply copy and paste your code into the test script that SwePeso posted and see for yourself how your version compares to the alternatives already presented. Peso's suggestion has by far the least IO and cpu time; I've taken that and added a fix for a functional problem in all suggestions: the special-characters &, < and > that for xml translates into & amp ;, & lt ; and & gt ; respectively (I put some additional spaces in that aren't in the real codes because the forum software shows the complete codes as &, < and > again). Your example re-introduces that problem and performs much worse because it re-introduces distinct instead of the group by in a sub query. So is there any specific reason why you think your version should be used instead of mine?

    edit: codes were not shown correctly by the forum software.



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

  • Brian Barkauskas

    Old Hand

    Points: 306

    OK, then use GROUP:

    --An alternative without ROW_NUMBER()

    --How about this?

    select

    u.UserID

    , stuff((

    select ', ' + u2.RoleName

    from @UserRole u2

    where u2.UserID = u.UserID

    and u2.RoleAssignedDate = (

    select min(u3.RoleAssignedDate)

    from @UserRole u3

    where u3.UserID = u2.UserID

    and u3.RoleName = u2.RoleName

    )

    order by u2.RoleAssignedDate asc

    for xml path('')

    ), 1, 2, '') 'RoleName'

    from @UserRole u

    group by u.UserID

    *** R.P.Rozema *****************************************************************

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    Table '#4F1BAE10'. Scan count 8, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 4 ms.

    *** Brian Barkauskas ***********************************************************

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    Table '#4F1BAE10'. Scan count 8, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 2 ms.

  • Brian Barkauskas

    Old Hand

    Points: 306

    Actually for readability, this one's even better,

    It has only 32 "words" compared to your 70,

    182 non-space characters to your 363,

    and runs just as fast:

    select

    u.UserID

    , stuff((

    select ', ' + u2.RoleName

    from @UserRole u2

    where u2.UserID = u.UserID

    group by u2.RoleName

    order by min(u2.RoleAssignedDate)

    for xml path('')

    ),1,2,'') 'RoleName'

    from @UserRole u

    group by u.UserID

  • Jeff Moden

    SSC Guru

    Points: 994266

    Brian Barkauskas (6/29/2010)


    OK, then use GROUP:

    --An alternative without ROW_NUMBER()

    --How about this?

    select

    u.UserID

    , stuff((

    select ', ' + u2.RoleName

    from @UserRole u2

    where u2.UserID = u.UserID

    and u2.RoleAssignedDate = (

    select min(u3.RoleAssignedDate)

    from @UserRole u3

    where u3.UserID = u2.UserID

    and u3.RoleName = u2.RoleName

    )

    order by u2.RoleAssignedDate asc

    for xml path('')

    ), 1, 2, '') 'RoleName'

    from @UserRole u

    group by u.UserID

    *** R.P.Rozema *****************************************************************

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    Table '#4F1BAE10'. Scan count 8, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 4 ms.

    *** Brian Barkauskas ***********************************************************

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    Table '#4F1BAE10'. Scan count 8, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 2 ms.

    You good folks simply aren't using enough data to claim victory in any of the cases.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 27 total)

You must be logged in to reply to this topic. Login to reply