Creating a comma-separated list (SQL Spackle)

  • Matt, Jason - thanks!

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Wayne,

    Can you provide more explanation on this part of the query:

    TYPE).value('.','varchar(max)')

    I have always used:

    WITH CTE AS
    (
        SELECT DISTINCT AccountNumber
        FROM #TestData
    )
    SELECT AccountNumber,
           CommaList = STUFF((SELECT ',' + Value
                              FROM #TestData
                              WHERE AccountNumber = CTE.AccountNumber
                              ORDER BY Value
                              FOR XML PATH('')),1,1,'') -- no TYPE.value
    FROM CTE;

    This provides the same results as your query.

  • Tom Bakerman (1/12/2011)


    Wayne,

    Can you provide more explanation on this part of the query:

    TYPE).value('.','varchar(max)')

    Tom,

    Put an Ampersand (&), Less than (<) or Greater than (>) in your data, and see what happens with or without it:

    WITH CTE (Data) AS
    (
        SELECT 'Rolling Stones' UNION ALL
        SELECT 'Hall & Oates' UNION ALL
        SELECT 'One <> Two'
    )
    SELECT [WithType] = STUFF((SELECT ',' + Data
                               FROM CTE
                               FOR XML PATH(''),TYPE).value('.','varchar(max)'),1,1,''),
           [WithOutType] = STUFF((SELECT ',' + Data
                                  FROM CTE
                                  FOR XML PATH('')),1,1,'');
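
    For reference, this is the difference you should see (FOR XML entitizes the markup characters, and the TYPE).value() step decodes them again):

    WithType:    Rolling Stones,Hall & Oates,One <> Two
    WithOutType: Rolling Stones,Hall &amp; Oates,One &lt;&gt; Two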

    Wayne

  • WayneS (1/12/2011)


    Mark-101232 (1/12/2011)


    Changing

    value('.','varchar(max)'),1,1,'')

    to

    value('(./text())[1]','varchar(max)'),1,1,'')

    appears to give a better query plan

    Thanks for the information... I'll have to test this out.

    Edit: WOW! What a difference! Identical queries, with just that one change. The method in the article gets:

    (1000 row(s) affected)
    Table '#TestData'. Scan count 1001, logical reads 2411, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)
    SQL Server Execution Times:
       CPU time = 234 ms, elapsed time = 3175 ms.

    Your suggested change gets:

    (1000 row(s) affected)
    Table '#TestData'. Scan count 1001, logical reads 2411, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)
    SQL Server Execution Times:
       CPU time = 172 ms, elapsed time = 561 ms.

    Looking at the execution plans (both queries run together), the method in the article accounts for 99% of the batch cost and your solution for 1%.

    Thanks for the tip!

    Very cool! Wayne writes an excellent article, Mark provides an excellent performance enhancement in the discussion, and Wayne tests it for everyone. It doesn't get much better than that. Well done to you both. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/12/2011)


    Very cool! Wayne writes an excellent article, Mark provides an excellent performance enhancement in the discussion, and Wayne tests it for everyone. It doesn't get much better than that. Well done to you both. 🙂

    Thanks Jeff. Now, I just have to get used to this "new" way of doing this!

    Wayne

  • WayneS (1/12/2011)


    Tom,

    Put an Ampersand (&), Less than (<) or Greater than (>) in your data, and see what happens with or without it:

    Got it. Thank you. For our application, having those characters encoded is actually a plus :-D, as all we do is ship the data off to a web app.

    Tom

  • Tom Bakerman (1/12/2011)


    WayneS (1/12/2011)


    Tom,

    Put an Ampersand (&), Less than (<) or Greater than (>) in your data, and see what happens with or without it:

    Got it. Thank you. For our application, having those characters encoded is actually a plus :-D, as all we do is ship the data off to a web app.

    Tom

    Makes sense... now you have two ways to handle things!

    Wayne

    1) Leave the clustered index off the table (and leave out the ORDER BY), then run the two versions: virtually identical timing and resource usage on my mid-grade laptop.

    2) Does NOT having the clustered index on the table open up the potential for getting the wrong output?

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service
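
    For reference, the setup being discussed looks something like this (a sketch inferred from the queries in this thread; the article's actual DDL and column types may differ):

    -- Hypothetical #TestData layout, based on the columns the queries use:
    CREATE TABLE #TestData (AccountNumber varchar(10) NOT NULL, Value varchar(10) NOT NULL);
    -- The indexed case from the article's test:
    CREATE CLUSTERED INDEX IX_TestData ON #TestData (AccountNumber, Value);
    -- For the heap comparison, drop the index and re-run both versions:
    -- DROP INDEX IX_TestData ON #TestData;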

  • TheSQLGuru (1/12/2011)


    1) Leave the clustered index off the table (and leave out the ORDER BY), then run the two versions: virtually identical timing and resource usage on my mid-grade laptop.

    2) Does NOT having the clustered index on the table open up the potential for getting the wrong output?

    Wrong how? The ORDER BY is explicit in the query that generates the XML. If you leave the ORDER BY out of that part of the query, then, as with any other query, you run the risk of seemingly random ordering.
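
    To illustrate (a minimal sketch, not from the article): even on a heap, the inner ORDER BY fixes the list order.

    -- No clustered index here; the ORDER BY inside the FOR XML subquery
    -- still guarantees 'a,b,c' rather than insertion or allocation order.
    CREATE TABLE #Heap (Value varchar(10) NOT NULL);
    INSERT INTO #Heap VALUES ('b'), ('a'), ('c');
    SELECT CommaList = STUFF((SELECT ',' + Value
                              FROM #Heap
                              ORDER BY Value
                              FOR XML PATH(''), TYPE).value('(./text())[1]','varchar(max)'),1,1,'');
    DROP TABLE #Heap;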

  • WayneS (1/12/2011)


    Jeff Moden (1/12/2011)


    Very cool! Wayne writes an excellent article, Mark provides an excellent performance enhancement in the discussion, and Wayne tests it for everyone. It doesn't get much better than that. Well done to you both. 🙂

    Thanks Jeff. Now, I just have to get used to this "new" way of doing this!

    Perhaps an addition to the article to show Mark's method and the race-code you ran would be in order. I know Steve would go for it. It'll help us all (especially me 'cuz I forget 😉 ).

    --Jeff Moden



  • WayneS (1/12/2011)


    Mark-101232 (1/12/2011)


    Changing value('.','varchar(max)') to value('(./text())[1]','varchar(max)') appears to give a better query plan

    Edit: WOW! What a difference! [...] Thanks for the tip!

    FYI: I followed up the testing I did at work with testing at home. The difference at home is not so profound, but Mark's method is still routinely faster, sometimes by 33%. There is a difference in the test code between work and home: the home version is dumping the results into a temp table, to keep everything strictly within SQL and eliminate display (sending output to the display is "the great equalizer"!).

    The test code:

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;
    DBCC FREESYSTEMCACHE ('ALL') WITH NO_INFOMSGS;
    PRINT '--XML TYPE with value(''.'')';
    SET STATISTICS IO,TIME ON;
    GO
    IF OBJECT_ID('tempdb..#test') IS NOT NULL DROP TABLE #test;
    WITH CTE AS
    (
        SELECT DISTINCT AccountNumber
        FROM #TestData
    )
    SELECT AccountNumber,
           CommaList = STUFF((SELECT ',' + Value
                              FROM #TestData
                              WHERE AccountNumber = CTE.AccountNumber
                              ORDER BY Value
                              FOR XML PATH(''),
                              TYPE).value('.','varchar(max)'),1,1,'')
    INTO #test
    FROM CTE
    ORDER BY AccountNumber;
    GO
    SET STATISTICS IO,TIME OFF;
    PRINT '';
    PRINT '';
    PRINT '';
    PRINT '--XML TYPE with value(''(./text())[1]'')';
    SET STATISTICS IO,TIME ON;
    GO
    IF OBJECT_ID('tempdb..#test') IS NOT NULL DROP TABLE #test;
    WITH CTE AS
    (
        SELECT DISTINCT AccountNumber
        FROM #TestData
    )
    SELECT AccountNumber,
           CommaList = STUFF((SELECT ',' + Value
                              FROM #TestData
                              WHERE AccountNumber = CTE.AccountNumber
                              ORDER BY Value
                              FOR XML PATH(''),
                              TYPE).value('(./text())[1]','varchar(max)'),1,1,'')
    INTO #test
    FROM CTE
    ORDER BY AccountNumber;
    SET STATISTICS IO,TIME OFF;

    The results:

    --XML TYPE with value('.')
    SQL Server parse and compile time:
       CPU time = 0 ms, elapsed time = 13 ms.
    SQL Server Execution Times:
       CPU time = 0 ms, elapsed time = 0 ms.
    SQL Server Execution Times:
       CPU time = 0 ms, elapsed time = 2 ms.
    Table '#TestData'. Scan count 1001, logical reads 2411, ...
    SQL Server Execution Times:
       CPU time = 140 ms, elapsed time = 213 ms.

    --XML TYPE with value('(./text())[1]')
    SQL Server parse and compile time:
       CPU time = 10 ms, elapsed time = 10 ms.
    SQL Server Execution Times:
       CPU time = 0 ms, elapsed time = 0 ms.
    SQL Server Execution Times:
       CPU time = 0 ms, elapsed time = 1 ms.
    Table '#TestData'. Scan count 1001, logical reads 2411, ...
    SQL Server Execution Times:
       CPU time = 94 ms, elapsed time = 137 ms.

    As I said, not so profound a difference, but it's still always faster, and it still has a better execution plan!

    Wayne

    Is there a way to take these results and post them to a .csv without using PowerShell? We have been creating our comma-delimited files using VB, but it would be nice to do it directly from SQL Server. This procedure creates the comma-delimited values in a table; it would be nice if I could output them directly from a procedure in SQL Server.

  • Clay Bowen (1/13/2011)


    Is there a way to take these results and post them to a .csv without using PowerShell? We have been creating our comma-delimited files using VB, but it would be nice to do it directly from SQL Server. This procedure creates the comma-delimited values in a table; it would be nice if I could output them directly from a procedure in SQL Server.

    Use Sqlcmd and redirect the output to a file:

    sqlcmd -S yourserver -d yourdatabase -Q "your query here" > output.csv
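
    If the default formatting gets in the way, sqlcmd's separator and width switches produce a cleaner CSV; a sketch (the table and column names are placeholders):

    sqlcmd -S yourserver -d yourdatabase -s"," -W -h -1 -o output.csv -Q "SET NOCOUNT ON; SELECT AccountNumber, CommaList FROM dbo.YourResults"

    Here -s"," sets the column separator, -W strips trailing spaces, -h -1 suppresses the header row, and SET NOCOUNT ON keeps the "(n row(s) affected)" message out of the file.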

  • Hey Wayne,

    The (./text())[1] modification (and variations) has been around for a very long time. I included it in the CSV demo in part 2 of my APPLY article in April last year, and I certainly didn't invent it. I have a suspicion that I picked it up from Barry, but I can't be sure.

    Finally, I have a feeling the /text() node has a length limitation...

    Paul
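
    One way to check Paul's length concern (a sketch, not from the thread; the row count and REPLICATE length are arbitrary, chosen to push the result well past 8000 characters):

    -- If either form truncated, the two lengths below would differ
    -- (both should come back as 1,010,000 characters).
    WITH Nums AS
    (
        SELECT TOP (10000) n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM sys.all_columns a CROSS JOIN sys.all_columns b
    )
    SELECT DotLen  = LEN((SELECT ',' + REPLICATE('x',100) FROM Nums
                          FOR XML PATH(''), TYPE).value('.','varchar(max)')),
           TextLen = LEN((SELECT ',' + REPLICATE('x',100) FROM Nums
                          FOR XML PATH(''), TYPE).value('(./text())[1]','varchar(max)'));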

    I found the alternative solution offered here to be the most efficient.
