Group Concatenate Strings?

  • CREATE TABLE #t (a tinyint NOT NULL, b VARCHAR(10), c VARCHAR(10))

    GO

    INSERT #t (a, b,c) VALUES (1, 'ab', 'cd')

    INSERT #t (a, b,c) VALUES (1, 'ef', 'gh')

    INSERT #t (a, b,c) VALUES (2, 'ij', 'kl')

    INSERT #t (a, b,c) VALUES (2, 'mn', 'op')

    I need OUTPUT thusly:

    a concatstringb concatstringc

    1 'abef' 'cdgh'

    2 'ijmn' 'klop'

    being on SQL 2008, CONCAT isn't available (if it helps), although if anyone has a solution using that I would like to see that too

    NULL handling will be required, but I think I can make that do what I need if someone has a solution for the first part

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • It's a horrible three scan plan...but I can't think of another way right now...

    WITH ALL_A(A) AS(SELECT DISTINCT A FROM #T)

    SELECT T.A, CONCATSTRINGB, CONCATSTRINGC

    FROM ALL_A AS T

    CROSS APPLY (

    SELECT ''+TB.B

    FROM #T TB

    WHERE TB.A = T.A

    FOR XML PATH('')

    ) CA1(CONCATSTRINGB)

    CROSS APPLY (

    SELECT ''+TC.C

    FROM #T TC

    WHERE TC.A = T.A

    FOR XML PATH('')

    ) CA2(CONCATSTRINGC)

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Fun with recursion 🙂

    CREATE TABLE #t (a tinyint NOT NULL, b VARCHAR(10), c VARCHAR(10))

    GO

    INSERT #t (a, b,c) VALUES (1, 'ab', 'cd')

    INSERT #t (a, b,c) VALUES (1, 'ef', 'gh')

    INSERT #t (a, b,c) VALUES (2, 'ij', 'kl')

    INSERT #t (a, b,c) VALUES (2, 'mn', 'op')

    WITH QryBase AS (

    SELECT a, b, c, RANK() OVER (PARTITION BY a ORDER BY a, b, c ) AS TheRank

    FROM #t

    ),

    Loopr AS (

    SELECT a, b, c, TheRank

    FROM QryBase

    WHERE TheRank = 1

    UNION ALL

    SELECT l.a, convert(varchar(10), concat(l.b, t.b)),

    convert(varchar(10), CONCAT(l.c, t.c)), IsNull(t.TheRank, l.TheRank)

    FROM QryBase t INNER JOIN Loopr l ON l.a = t.a and t.TheRank = l.TheRank + 1

    ),

    Reranked AS (

    SELECT a, b, c, RANK() OVER(PARTITION BY a ORDER BY a, TheRank DESC) AS Unrank

    FROM Loopr

    )

    SELECT a, b, c FROM Reranked WHERE Unrank = 1

    ORDER BY a;

    Eddie Wuerch
    MCM: SQL

  • Those are both rather gross (especially yours Eddie!!), but they do seem to give the right answer at least. Given I will have N columns and M rows I don't know that either will be as flexible/scalable as I need though.

    Thanks though!! Looking forward to some other approaches. I wonder if the XML trick can be coded to do multiple columns at once ...

    Oh, you both forgot the leading ; before the WITH. 🙂

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Just found out while doing research that some other RDBMS' have this functionality built in!

    Also, Adam Machanic suggested a CLR UDA, and I found such on codeplex

    https://groupconcat.codeplex.com

    Not sure I can use that in the environment I need it, but it gives me another option.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • mister.magoo (9/19/2016)


    It's a horrible three scan plan...but I can't think of another way right now...

    WITH ALL_A(A) AS(SELECT DISTINCT A FROM #T)

    SELECT T.A, CONCATSTRINGB, CONCATSTRINGC

    FROM ALL_A AS T

    CROSS APPLY (

    SELECT ''+TB.B

    FROM #T TB

    WHERE TB.A = T.A

    FOR XML PATH('')

    ) CA1(CONCATSTRINGB)

    CROSS APPLY (

    SELECT ''+TC.C

    FROM #T TC

    WHERE TC.A = T.A

    FOR XML PATH('')

    ) CA2(CONCATSTRINGC)

    Call it what you will, it's insanely fast, especially in the presence of a Clustered Index. I expanded the problem out to 4 columns to be concatenated just to see what happens. Here's the code to generate the test data and the CI.

    --===== If the test table exists, drop it to make reruns easier.

    IF OBJECT_ID('tempdb..#t','U') IS NOT NULL

    DROP TABLE #t

    ;

    --===== Create and populate test table.

    SELECT TOP 1000000

    a = CAST(ABS(CHECKSUM(NEWID()))%200000+1 AS INT) --roughly 5 rows per "a"

    ,b = CAST(CHAR(ABS(CHECKSUM(NEWID()))%26+65) + CHAR(ABS(CHECKSUM(NEWID()))%26+65) AS VARCHAR(10))

    ,c = CAST(CHAR(ABS(CHECKSUM(NEWID()))%26+65) + CHAR(ABS(CHECKSUM(NEWID()))%26+65) AS VARCHAR(10))

    ,d = CAST(CHAR(ABS(CHECKSUM(NEWID()))%26+65) + CHAR(ABS(CHECKSUM(NEWID()))%26+65) AS VARCHAR(10))

    ,e = CAST(CHAR(ABS(CHECKSUM(NEWID()))%26+65) + CHAR(ABS(CHECKSUM(NEWID()))%26+65) AS VARCHAR(10))

    INTO #t

    FROM sys.all_columns t1,

    sys.all_columns t2

    ;

    --===== Add a useful clustered index (Doubles performance, in this case)

    CREATE CLUSTERED INDEX IX_t_a

    ON #t (a)

    ;

    Here's the expanded version of your code. I reformed it to make it obvious as to what could be built using dynamic SQL.

    SET STATISTICS TIME,IO ON;

    WITH ALL_A(A) AS (SELECT DISTINCT A FROM #T)

    SELECT T.A

    ,ConcatStringB

    ,ConcatStringC

    ,ConcatStringD

    ,ConcatStringE

    FROM ALL_A AS T

    CROSS APPLY (SELECT '' + TB.B FROM #T TB WHERE TB.A = T.A FOR XML PATH('')) CA1 (ConcatStringB)

    CROSS APPLY (SELECT '' + TC.C FROM #T TC WHERE TC.A = T.A FOR XML PATH('')) CA2 (ConcatStringC)

    CROSS APPLY (SELECT '' + TD.D FROM #T TD WHERE TD.A = T.A FOR XML PATH('')) CA3 (ConcatStringD)

    CROSS APPLY (SELECT '' + TE.E FROM #T TE WHERE TE.A = T.A FOR XML PATH('')) CA4 (ConcatStringE)

    SET STATISTICS TIME,IO OFF;

    Here's the output from SET STATISTICs as executed on my little ol' i5 laptop with only 6GB of RAM running at only 2.5GHz. You'll have to run the code to see that the output is correct.

    (198640 row(s) affected)

    Table '#t__________________________________________________________________________________________________________________000000000011'.

    Scan count 794561, logical reads 2409354, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 4181 ms, elapsed time = 4274 ms.

    For that kind of duration (about 1 second per column per million rows), I don't actually care about the scan count or the logical reads and neither will the users. Other Developers will appreciate the symmetry and elegant simplicity of the code.

    I know it's been a while since you've heard it so here goes...

    "Ah, Magoo... you've done it again!' 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I guess we should handle NULLs as requested...

    SET STATISTICS TIME,IO ON;

    WITH ALL_A(A) AS (SELECT DISTINCT A FROM #T)

    SELECT T.A

    ,ConcatStringB

    ,ConcatStringC

    ,ConcatStringD

    ,ConcatStringE

    FROM ALL_A AS T

    CROSS APPLY (SELECT ISNULL(TB.B, '') FROM #T TB WHERE TB.A = T.A FOR XML PATH('')) CA1 (ConcatStringB)

    CROSS APPLY (SELECT ISNULL(TC.C, '') FROM #T TC WHERE TC.A = T.A FOR XML PATH('')) CA2 (ConcatStringC)

    CROSS APPLY (SELECT ISNULL(TD.D, '') FROM #T TD WHERE TD.A = T.A FOR XML PATH('')) CA3 (ConcatStringD)

    CROSS APPLY (SELECT ISNULL(TE.E, '') FROM #T TE WHERE TE.A = T.A FOR XML PATH('')) CA4 (ConcatStringE)

    SET STATISTICS TIME,IO OFF;

    Thanks Jeff, for your kind words and especially for the testing - I tend to pop on here rather late at night and never seem to have the energy for all that. It is much appreciated 🙂

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • I meant to mention that the sample data provided has nothing we can ORDER by to ensure the concatenation happens as expected.

    You should include something for that and add ORDER BY to each CROSS APPLY

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • mister.magoo (9/20/2016)


    I guess we should handle NULLs as requested...

    Actually, the FOR XML PATH already handles NULLs, so you don't need to make any changes to your original code.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (9/20/2016)


    mister.magoo (9/20/2016)


    I guess we should handle NULLs as requested...

    Actually, the FOR XML PATH already handles NULLs, so you don't need to make any changes to your original code.

    Drew

    Thanks, I'd never noticed that.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

    Viewing 10 posts - 1 through 9 (of 9 total)

    You must be logged in to reply to this topic. Login to reply