Concatenating Rows

  • a.everett (10/14/2009)


    so as newbie in the top post quotes why not just use a select @var+ to concatenate.

    You said it thats fine for 1 entity granted can see that.

    But just move the code to a function that accepts the entity pk key and returns the concat string.

    Then just call the function from a select thats listing the entities you want reporting?

    somthing like,

    select entkey, getmylist(entkey) from myentlist

    You certainly can do that and it works quite well, but it is normally more complex and slower than it needs to be. Jeff Moden discussed issues like that extensively in his article at http://www.sqlservercentral.com/articles/Test+Data/61572/

    I believe the best answer I have seen yet was at the bottom of his article, namely

    SELECT t1.SomeID, STUFF((SELECT ','+t2.SomeCode

    FROM dbo.TestData t2

    WHERE t1.SomeID = t2.SomeID FOR XML PATH('')),1,1,'') FROM dbo.TestData t1 GROUP BY t1.SomeID

    I tend to wrap things like that in CTEs for readability purposes if that is part of a larger and more complicated query.

    ---
    Timothy A Wiseman
    SQL Blog: http://timothyawiseman.wordpress.com/

  • so...did that hurt you...

  • Good Article.

    Here is an another link with different type of options for row concat.

    http://www.simple-talk.com/sql/t-sql-programming/concatenating-row-values-in-transact-sql/

  • I recently had to concatenate ~1.5 billion rows of text (average length of 110 characters). These were medical notes where each line was a sepearte row. I tested CLR, Stored procedures ... nothing was a fast as the XML method. However I ran into tons of unicode characters that I had to replace. So I had a yucky 18 or so level nested replace. I was able to get the concatenation to work in 2.4 hours creating 34 million documents out of the 1.5 billion. The CLR would have definitly been useful to avoid the unicode character problem, but still I can't get it to beat the xml method.

  • Fantastic! This forum is making my week! Really! I just have one question now; how do you un-concatenate a string into rows efficiently?

  • Joel Mansford (10/14/2009)


    In my opinion the 'neatest' solution is to use a custom CLR aggregate function. Afterall this is an aggregate function. Happily MS provide a concatenate as an example or custom aggregates (first example):

    http://technet.microsoft.com/en-us/library/ms131056.aspx

    I appreciate that for many DBAs rolling out CLR code is a bit of a culture change but it only needs to be done once and the performance seems to be fine.

    It would be interesting to see just how "worth it" a CLR would be in this case especially since the CLR may have to be rolled out to 100 servers. I've found that given just a little time, that most DBA's and SQL Developers who know their trade can come up with a T-SQL Solution that will frequently beat CLR solutions. Of course, there are exceptions... considering the XML solution, I'm just not sure that concatenation is one of them.

    Because many, many T-SQL solutions will frequently beat CLR solutions and, when they don't, they still come close enough to make it NOT worth maintaining separate code in the form of a CLR. I won't even enable CLR capabilites on my home machine. If someone would like to do a test, here's some test code with some of the more "common" solutions that use UDF's and some inline XML code. It's all done in TempDB just to be "safe". Commented-out cleanup code is at the end. Performance information will appear in the "Messages" Tab of SSMS.

    To summarize, each function and snippet of "inline" code concatenate the SomeLetters2 column for distinct values of the SomeID column. Details are in the comments, folks.

    --===== Do the testing in a nice "safe" place

    USE TempDB

    GO

    --===== Setup the display environment for the "Messages" Tab

    SET NOCOUNT OFF --Make sure we can see the rowcounts

    SET STATISTICS TIME OFF --Turned ON and OFF later

    GO

    --===========================================================================================================

    --===== Create and populate a 1,000,000 row test table. This is my standard "million row test table".

    --===========================================================================================================

    -- Column "RowNum" has a range of 1 to 100,000 unique numbers

    -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers

    -- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings

    -- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers

    -- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times

    -- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'

    -- for all rows.

    -- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)

    -- Jeff Moden

    PRINT '========== Building the test table and indexes =========='

    GO

    SELECT TOP 1000000

    SomeID = IDENTITY(INT,1,1),

    SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,

    SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65),

    SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),

    SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),

    SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),

    SomeHex12 = RIGHT(NEWID(),12)

    INTO dbo.JBMTest

    FROM Master.dbo.SysColumns t1

    CROSS JOIN

    Master.dbo.SysColumns t2

    GO

    --===== Add a Clustered PK

    ALTER TABLE dbo.JBMTest

    ADD PRIMARY KEY CLUSTERED (SomeID)

    GO

    --===== Add an index to support the concatenation

    CREATE INDEX IX_JBMTest_SomeInt ON dbo.JBMTest (SomeInt,SomeLetters2)

    GO

    --===========================================================================================================

    --===== Create the functions to be tested. Again, make sure you're in TempDB

    --===========================================================================================================

    PRINT '========== Building the functions to test =========='

    --===== Ensure we're still using TempDB

    USE TempDB

    GO

    ----------------------------------------------------------------------------------

    --===== Create a function that uses VARCHAR(8000)

    CREATE FUNCTION dbo.Concat8KTest

    (@SomeInt INT)

    RETURNS VARCHAR(8000)

    AS

    BEGIN

    DECLARE @ReturnString VARCHAR(8000)

    SELECT @ReturnString = ISNULL(@ReturnString+',' ,'') + SomeLetters2

    FROM dbo.JBMTest

    WHERE SomeInt = @SomeInt

    --ORDER BY SomeLetters2 --Uncomment to guarantee ordered output if required

    RETURN @ReturnString

    END

    GO

    ----------------------------------------------------------------------------------

    --===== Create a function that uses VARCHAR(8000) with Coalesce

    CREATE FUNCTION dbo.Concat8KTestCoalesce

    (@SomeInt INT)

    RETURNS VARCHAR(8000)

    AS

    BEGIN

    DECLARE @ReturnString VARCHAR(8000)

    SELECT @ReturnString = COALESCE(@ReturnString+',' ,'') + SomeLetters2

    FROM dbo.JBMTest

    WHERE SomeInt = @SomeInt

    --ORDER BY SomeLetters2 --Uncomment to guarantee ordered output if required

    RETURN @ReturnString

    END

    GO

    ----------------------------------------------------------------------------------

    --===== Create a function that uses VARCHAR(MAX)

    CREATE FUNCTION dbo.ConcatMAXTest

    (@SomeInt INT)

    RETURNS VARCHAR(MAX)

    AS

    BEGIN

    DECLARE @ReturnString VARCHAR(MAX)

    SELECT @ReturnString = ISNULL(@ReturnString+',' ,'') + SomeLetters2

    FROM dbo.JBMTest

    WHERE SomeInt = @SomeInt

    --ORDER BY SomeLetters2 --Uncomment to guarantee ordered output if required

    RETURN @ReturnString

    END

    GO

    ----------------------------------------------------------------------------------

    --===== Create a function that uses VARCHAR(MAX) with Coalesce

    CREATE FUNCTION dbo.ConcatMAXTestCoalesce

    (@SomeInt INT)

    RETURNS VARCHAR(MAX)

    AS

    BEGIN

    DECLARE @ReturnString VARCHAR(MAX)

    SELECT @ReturnString = COALESCE(@ReturnString+',' ,'') + SomeLetters2

    FROM dbo.JBMTest

    WHERE SomeInt = @SomeInt

    --ORDER BY SomeLetters2 --Uncomment to guarantee ordered output if required

    RETURN @ReturnString

    END

    GO

    --===========================================================================================================

    --===== Do the tests which includes each function we created and some "inline" methods using XML.

    --===== Check the "Messages" table for CPU, Duration, and Row Counts.

    --===========================================================================================================

    SET NOCOUNT OFF

    SET STATISTICS TIME OFF

    PRINT '========== 8k ISNULL solution (Concat8KTest) =========='

    SET STATISTICS TIME ON

    SELECT SomeInt,dbo.Concat8KTest(SomeInt)

    FROM dbo.JBMTest

    GROUP BY SomeInt

    ORDER BY SomeInt

    SET STATISTICS TIME OFF

    PRINT '========== MAX ISNULL solution (ConcatMAXTest) =========='

    SET STATISTICS TIME ON

    SELECT SomeInt,dbo.ConcatMAXTest(SomeInt)

    FROM dbo.JBMTest

    GROUP BY SomeInt

    ORDER BY SomeInt

    SET STATISTICS TIME OFF

    PRINT '========== 8k Coalesce solution (Concat8KTestCoalesce) =========='

    SET STATISTICS TIME ON

    SELECT SomeInt,dbo.Concat8KTestCoalesce(SomeInt)

    FROM dbo.JBMTest

    GROUP BY SomeInt

    ORDER BY SomeInt

    SET STATISTICS TIME OFF

    PRINT '========== MAX Coalesce solution (ConcatMAXTestCoalesce) =========='

    SET STATISTICS TIME ON

    SELECT SomeInt,dbo.ConcatMAXTestCoalesce(SomeInt)

    FROM dbo.JBMTest

    GROUP BY SomeInt

    ORDER BY SomeInt

    SET STATISTICS TIME OFF

    PRINT '========== Inline XML solution =========='

    SET STATISTICS TIME ON

    SELECT t1.SomeInt,STUFF((SELECT ',' + t2.SomeLetters2

    FROM dbo.JBMTest t2

    WHERE t2.SomeInt = t1.SomeInt --Correlation here

    --ORDER BY t2.SomeLetters2 --Uncomment to guarantee ordered output if required

    FOR XML PATH(''))

    ,1,1,'') AS SomeLetters2

    FROM dbo.JBMTest t1

    GROUP BY t1.SomeInt --- without GROUP BY multiple rows are returned

    ORDER BY t1.SomeInt

    SET STATISTICS TIME OFF

    PRINT '========== Inline XML solution with TYPE=========='

    SET STATISTICS TIME ON

    SELECT t1.SomeInt,STUFF((SELECT ',' + t2.SomeLetters2

    FROM dbo.JBMTest t2

    WHERE t2.SomeInt = t1.SomeInt --Correlation here

    --ORDER BY t2.SomeLetters2 --Uncomment to guarantee ordered output if required

    FOR XML PATH(''),TYPE).value('.','VARCHAR(8000)')

    ,1,1,'') AS SomeLetters2

    FROM dbo.JBMTest t1

    GROUP BY t1.SomeInt --- without GROUP BY multiple rows are returned

    ORDER BY t1.SomeInt

    SET STATISTICS TIME OFF

    GO

    --===========================================================================================================

    --===== Housekeeping code for when you are done testing.

    --===========================================================================================================

    /*

    USE TempDB; DROP TABLE dbo.JBMTest

    USE TempDB; DROP FUNCTION dbo.Concat8KTest,dbo.Concat8KTestCoalesce,dbo.ConcatMAXTest,dbo.ConcatMAXTestCoalesce

    */

    My machine:

    Single core P4 processor running at 1.8 GHZ (Desktop box)

    Dual IDE 80G hard drives

    1 Gig Ram

    SQL Server 2005 Developer's Edition SP3

    Output from my testing...

    ========== Building the test table and indexes ==========

    (1000000 row(s) affected)

    ========== Building the functions to test ==========

    ========== 8k ISNULL solution (Concat8KTest) ==========

    (50000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 10719 ms, elapsed time = 19704 ms.

    ========== MAX ISNULL solution (ConcatMAXTest) ==========

    (50000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 21906 ms, elapsed time = 30001 ms.

    ========== 8k Coalesce solution (Concat8KTestCoalesce) ==========

    (50000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 11172 ms, elapsed time = 18459 ms.

    ========== MAX Coalesce solution (ConcatMAXTestCoalesce) ==========

    (50000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 24875 ms, elapsed time = 32289 ms.

    ========== Inline XML solution ==========

    (50000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 5610 ms, elapsed time = 7741 ms.

    ========== Inline XML solution with TYPE==========

    (50000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 10515 ms, elapsed time = 12225 ms.

    On my box, the hands down winner is the XML solution with the caveat that it won't handle special characters. The 8K ISNULL function and the XML TYPE solution are just about tied with the caveat that the XML TYPE solution will also handle things bigger than 8k.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • To break apart a concatenated string, please review http://beyondrelational.com/blogs/jacob/archive/2008/08/14/xquery-lab-19-how-to-parse-a-delimited-string.aspx; Jacob's articles have helped me out so much.


    Over 12yrs in IT and 10yrs happily stuck with SQL.
    - SQL 2008/R2/2012/2014/2016/2017
    - Oracle 8/9/10/11
    - MySQL 4/5 and MariaDB

  • pmcpherson (10/14/2009)


    Fantastic! This forum is making my week! Really! I just have one question now; how do you un-concatenate a string into rows efficiently?

    A decent place to start on that is in the following article...

    http://www.sqlservercentral.com/articles/T-SQL/62867/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • pmcpherson (10/14/2009)


    Fantastic! This forum is making my week! Really! I just have one question now; how do you un-concatenate a string into rows efficiently?

    Search this forum (and the web). There have been some marvelous 'split' discussions here on sqlservercentral with lots of sample code and benchmarking.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • HI, sorry if this is wrong place to put this but the example:

    with

    list as (

    select

    name as value,

    row_number() over(order byname)as num

    from

    fruit

    )

    is EXACTLY what I need to do for something. Is this something that can be used in SQL 2000?

  • cmallain (10/14/2009)


    HI, sorry if this is wrong place to put this but the example:

    with

    list as (

    select

    name as value,

    row_number() over(order byname)as num

    from

    fruit

    )

    is EXACTLY what I need to do for something. Is this something that can be used in SQL 2000?

    [Sorry! Meant to comment on this.]

    No, the "with list" syntax is a CTE (Common Table Expression), which is not available in

    sql 2000. Also, I think row_number() over... is also not available in SQL 2000.

  • Seems like a complicated way to reach the end goal.

    I agree with the earlier post.. I'd do it like this:

    declare @Tmp Table (name varchar(10), id int)

    DECLARE @TEXT varchar(Max)

    insert into @Tmp

    select 'Apple' as name, 101 as id union all

    select 'Banana' as name, 102 as id union all

    select 'Orange' as name, 103 as id union all

    select 'Melon' as name, 104 as id union all

    select 'Grape' as name, 105 as id

    select @Text = Coalesce(@text + ', ', '') + name from @Tmp order by Name

    select @text

  • What about if you had rows that you wanted to aggregate like this:

    ID NAME

    101 Apple

    101 Banana

    102 Orange

    102 Melon

    102 Grape

    And wanted a result set like:

    ID NAMES

    101 Apple, Banana

    102 Orange, Melon, Grape

  • Jeff.Mlakar (10/14/2009)


    What about if you had rows that you wanted to aggregate like this:

    ID NAME

    101 Apple

    101 Banana

    102 Orange

    102 Melon

    102 Grape

    And wanted a result set like:

    ID NAMES

    101 Apple, Banana

    102 Orange, Melon, Grape

    There're 6 different ways to do that posted in the test code that I posted above.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • All I needed was to number each row. I figured it out by doing the following:

    select distinct

    ListVal=count(a1.eventid) --listval can be any fieldname that you'd like

    ,a1.eventid

    ,a1.eventdesc

    from MyEventTable a1

    left outer join MyEventTable a2

    on a1.eventid + a1.eventdesc >= a2.eventid + a2.eventdesc --this is the key to make it list each row

    group by a1.eventid

    ,a1.eventdesc

    order by ListVal

    This gives me the EXACT result I was looking for, which was to provide a list value for each row

    ListVal EventIDEventDesc

    1 491Complete Health Risk Assessment

    2 494Alcohol Unit Calculator

    3 495Target Heart Rate Calculator

    4 496Burn Out Assessment

    5 497Stress Quick Test

    6 498Anxiety Assessment

    7 499Depression Assessment

    8 500Type A Assessment

    9 501CPR Certification

    10 502First Aid Certification

Viewing 15 posts - 46 through 60 (of 159 total)

You must be logged in to reply to this topic. Login to reply