How to concatenate group of rows

  • Hi,

    I have a data structure as followed : [Name], [Task] and the data is :

    John, task1

    John, task2

    Joe, task1

    Joe, task3

    Joe, task4

    Jane, task1

    Jane, task4

    ..........

    I wonder how to use the 'SELECT' to make the following result :

    John, task1;task2

    Joe, task1;task3;task4

    Jane, task1;task4

    .....

    Thanks for any help.

  • select

    Name, stuff((select ';' + Task from Allocation where Name = a.Name for xml path('') order by Task),1,1,'') [Tasks]

    from

    Allocation a

    group by Name

    Wrote on notepad, may have syntax error.

    https://sqlroadie.com/

  • Dear Arjun,

    Thanks for guidance. With some minor modification, it works perfectly.

    Now, I have another issue, in my [Task] field I have subgroups that is indicated with a specific format like :

    [Name] , [Task]

    ------------------

    John , task1:a

    John, task2:a

    John, task3

    John, task4:a

    ........

    What I need is when I concatenate my values, I only need to include those tasks that have subgroup a (:a) and omit those without subgroup (:a). so I need :

    [Name] , [Task]

    ------------------

    John , task1:a;task2:a;task4:a

    ................

    How could I achieve that ?

    Thanks again

  • append charindex(':',task) > 0 to the query?

    Not sure of the requirement. If you want to filter only subgroup 'a' and not 'b' or 'c' you may want to check out splitter functions. Just search in SSC. Basically, it's a function which you can use like

    isRightHalf(task,':') = 'a'

    https://sqlroadie.com/

  • RZ52 (2/27/2013)


    Hi,

    I have a data structure as followed : [Name], [Task] and the data is :

    John, task1

    John, task2

    Joe, task1

    Joe, task3

    Joe, task4

    Jane, task1

    Jane, task4

    ..........

    I wonder how to use the 'SELECT' to make the following result :

    John, task1;task2

    Joe, task1;task3;task4

    Jane, task1;task4

    .....

    Thanks for any help.

    Since that represents denormalization almost as bad as XML, I have to ask, why do you want to do this to your data? What is the business requirement?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Dear Jeff,

    The above scenario is only a simplified example. My real dataset is about the serotype profile of certain bacterial samples. So, we do some probe tests for certain factors in triplicate manner and the data structure is :

    [Sample_ID], [Rep_ID], [Result]

    ----------------------------

    1, 1, O152

    1, 2, O2

    1, 3, O157

    2, 1, O5

    2, 2, Negative

    2, 3, O1

    3, 1, O157

    3, 2, O1

    3, 3, O1

    4, 1, O157

    4, 2, Negative

    4, 3, O2

    5, 1, Negative

    5, 2, Negative

    5, 3, Negative

    ...........

    Now the I need a profile column in result set with the following conditions :

    1) The O15? are the dangerous pathogens, so we only need to profile these factors. Those samples with no O15? factor will be tagged as "Non-pathogen"

    2) We don't want to show "Negative" as part of profile except when all three replicates are "Negative" which will be assigned as "Negative"

    So the final result needs to be like :

    [Sample ID], [Profile]

    --------------------------------------------

    1, O152;O157

    2, Non-pathogen

    3, O157

    4, O157

    5, Negative

    ..................

    I hope I could explain my question.

    So, if you have any suggestion or help, I really appreciate

  • So, to make sure I'm clear with what you want.

    With this sample data: -

    IF object_id('tempdb..#testEnvironment') IS NOT NULL

    BEGIN

    DROP TABLE #testEnvironment;

    END;

    SELECT [Sample_ID], [Rep_ID], [Result]

    INTO #testEnvironment

    FROM (VALUES(1, 1, 'O152'),(1, 2, 'O2'),(1, 3, 'O157'),(2, 1, 'O5'),

    (2, 2, 'Negative'),(2, 3, 'O1'),(3, 1, 'O157'),(3, 2, 'O1'),

    (3, 3, 'O1'),(4, 1, 'O157'),(4, 2, 'Negative'),(4, 3, 'O2'),

    (5, 1, 'Negative'),(5, 2, 'Negative'),(5, 3, 'Negative')

    )a([Sample_ID], [Rep_ID], [Result]);

    Which produces: -

    Sample_ID Rep_ID Result

    ----------- ----------- --------

    1 1 O152

    1 2 O2

    1 3 O157

    2 1 O5

    2 2 Negative

    2 3 O1

    3 1 O157

    3 2 O1

    3 3 O1

    4 1 O157

    4 2 Negative

    4 3 O2

    5 1 Negative

    5 2 Negative

    5 3 Negative

    You'd like a result that looks like this: -

    Sample_ID Profile

    ----------- ----------

    1 O152;O157

    2 Negative

    3 O157

    4 O157

    5 Negative

    ??

    If I'm correct in my assumptions above, then using the sample data that I defined above you could achieve the result like this: -

    SELECT *

    FROM (SELECT DISTINCT [Sample_ID]

    FROM #testEnvironment

    )a

    CROSS APPLY (SELECT ISNULL(STUFF((SELECT ';'+[Result]

    FROM #testEnvironment

    WHERE a.Sample_ID = Sample_ID

    AND CHARINDEX('O15',[Result]) > 0

    FOR XML PATH(''), TYPE

    ).value('.','NVARCHAR(MAX)'),1,1,'')

    ,'Negative')

    )b([Profile]);


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Dear Dadavre,

    The proposed solution works good. However, we need also to distinguish those samples that are positive but no O15? factor was detected and tag them as "Non-pathogen" (in my dataset "Sample 2") and current command return it also as "Negative". How can I overcome this issue ?

    Thanks again for your help

  • RZ52 (2/28/2013)


    Dear Dadavre,

    The proposed solution works good. However, we need also to distinguish those samples that are positive but no O15? factor was detected and tag them as "Non-pathogen" (in my dataset "Sample 2") and current command return it also as "Negative". How can I overcome this issue ?

    Thanks again for your help

    Sorry, I missed "Non-pathogen". Does "Non-pathogen" only come up if all of the results are "Negative" ?

    If so: -

    SELECT [Sample_ID], ISNULL([Interim],[Profile]) AS [Profile]

    FROM (SELECT [Sample_ID],

    CASE WHEN MAX([Result]) = MIN([Result]) AND MAX([Result]) = 'Negative' THEN 'Non-pathogen' ELSE NULL END

    FROM #testEnvironment

    GROUP BY [Sample_ID]

    )a([Sample_ID],[Interim])

    CROSS APPLY (SELECT ISNULL(STUFF((SELECT ';'+[Result]

    FROM #testEnvironment

    WHERE a.Sample_ID = Sample_ID

    AND CHARINDEX('O15',[Result]) > 0

    FOR XML PATH(''), TYPE

    ).value('.','NVARCHAR(MAX)'),1,1,'')

    ,'Negative')

    )b([Profile]);

    Produces: -

    Sample_ID Profile

    ----------- ------------

    1 O152;O157

    2 Negative

    3 O157

    4 O157

    5 Non-pathogen

    Re-read what you wrote, I got it the wrong way around.

    Try this: -

    SELECT [Sample_ID], ISNULL([Interim],[Profile]) AS [Profile]

    FROM (SELECT [Sample_ID],

    CAST(CASE WHEN MAX([Result]) = MIN([Result]) AND MAX([Result]) = 'Negative'

    THEN 'Negative'

    ELSE NULL END AS NVARCHAR(MAX))

    FROM #testEnvironment

    GROUP BY [Sample_ID]

    )a([Sample_ID],[Interim])

    CROSS APPLY (SELECT ISNULL(STUFF((SELECT ';'+[Result]

    FROM #testEnvironment

    WHERE a.Sample_ID = Sample_ID

    AND CHARINDEX('O15',[Result]) > 0

    FOR XML PATH(''), TYPE

    ).value('.','NVARCHAR(MAX)'),1,1,'')

    ,'Non-pathogen')

    )b([Profile]);

    Produces: -

    Sample_ID Profile

    ----------- --------------

    1 O152;O157

    2 Non-pathogen

    3 O157

    4 O157

    5 Negative


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • And now we know why I asked. 🙂 There's always more behind these types of requests. Thanks you very much for taking the time to post the business reason (filtered string aggregation to make human review/usage simpler is how I'm looking at it).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Dear Jeff,

    Thanks for your advice. I am newbie so it might take time to learn how I need to present my situation but I always appreciate suggestions from seniors.

    Best Regards

  • Dear Cadavre,

    It worked nice and perfect.

    Sorry if I was not very clear on explaining situation.

    Best Regards,

  • Cadavre (2/28/2013)


    So, to make sure I'm clear with what you want.

    With this sample data: -

    IF object_id('tempdb..#testEnvironment') IS NOT NULL

    BEGIN

    DROP TABLE #testEnvironment;

    END;

    SELECT [Sample_ID], [Rep_ID], [Result]

    INTO #testEnvironment

    FROM (VALUES(1, 1, 'O152'),(1, 2, 'O2'),(1, 3, 'O157'),(2, 1, 'O5'),

    (2, 2, 'Negative'),(2, 3, 'O1'),(3, 1, 'O157'),(3, 2, 'O1'),

    (3, 3, 'O1'),(4, 1, 'O157'),(4, 2, 'Negative'),(4, 3, 'O2'),

    (5, 1, 'Negative'),(5, 2, 'Negative'),(5, 3, 'Negative')

    )a([Sample_ID], [Rep_ID], [Result]);

    Which produces: -

    Sample_ID Rep_ID Result

    ----------- ----------- --------

    1 1 O152

    1 2 O2

    1 3 O157

    2 1 O5

    2 2 Negative

    2 3 O1

    3 1 O157

    3 2 O1

    3 3 O1

    4 1 O157

    4 2 Negative

    4 3 O2

    5 1 Negative

    5 2 Negative

    5 3 Negative

    You'd like a result that looks like this: -

    Sample_ID Profile

    ----------- ----------

    1 O152;O157

    2 Negative

    3 O157

    4 O157

    5 Negative

    ??

    If I'm correct in my assumptions above, then using the sample data that I defined above you could achieve the result like this: -

    SELECT *

    FROM (SELECT DISTINCT [Sample_ID]

    FROM #testEnvironment

    )a

    CROSS APPLY (SELECT ISNULL(STUFF((SELECT ';'+[Result]

    FROM #testEnvironment

    WHERE a.Sample_ID = Sample_ID

    AND CHARINDEX('O15',[Result]) > 0

    FOR XML PATH(''), TYPE

    ).value('.','NVARCHAR(MAX)'),1,1,'')

    ,'Negative')

    )b([Profile]);

    Dear Cadavre,

    I am testing your solution and something came up that I can't solve. If I have two same result for a sample like :

    IF object_id('tempdb..#testEnvironment') IS NOT NULL

    BEGIN

    DROP TABLE #testEnvironment;

    END;

    SELECT [Sample_ID], [Rep_ID], [Result]

    INTO #testEnvironment

    FROM (VALUES(1, 1, 'O152'),(1, 2, 'O2'),(1, 3, 'O157'),(1, 4, 'O157'),(2, 1, 'O5'),

    (2, 2, 'Negative'),(2, 3, 'O1'),(3, 1, 'O157'),(3, 2, 'O1'),

    (3, 3, 'O1'),(4, 1, 'O157'),(4, 2, 'Negative'),(4, 3, 'O2'),

    (5, 1, 'Negative'),(5, 2, 'Negative'),(5, 3, 'Negative')

    )a([Sample_ID], [Rep_ID], [Result]);

    running your proposed code :

    SELECT *

    FROM (SELECT DISTINCT [Sample_ID]

    FROM #testEnvironment

    )a

    CROSS APPLY (SELECT ISNULL(STUFF((SELECT ';'+[Result]

    FROM #testEnvironment

    WHERE a.Sample_ID = Sample_ID

    AND CHARINDEX('O15',[Result]) > 0

    FOR XML PATH(''), TYPE

    ).value('.','NVARCHAR(MAX)'),1,1,'')

    ,'Negative')

    )b([Profile]);

    will return :

    Sample_ID Profile

    ----------- ----------

    1 O152;O157;O157

    2 Negative

    3 O157

    4 O157

    5 Negative

    How can I avoid the repeating item and have :

    Sample_ID Profile

    ----------- ----------

    1 O152;O157

    2 Negative

    3 O157

    4 O157

    5 Negative

    Thanks in advance for help

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply