Help with "Reverse Tally" or Creating a Comma Seperated list of values

  • Say you have a table with time oriented data (columns would be time period, and value), and you want to put the values into a comma seperated list, such as for charting sales over the past year. Easy enough to do with FOR XML RAW.

    BUT:

    (Using sales as the on-going example) You have to do this for a large number of products (10's of thousands).

    After reading the article about using Tally tables, I thought that might be a tool to use. But as I wrote this description, maybe not. Here's the current code. How do I get rid of the cursor? Please note that my predecessor had written this using two cursors, the inner one stepping through the previous years dates and appending the monthly values 1 at a time, so this is a bit of an improvement. But, in our database this takes about 1/2 hour to run for a single chart, and there are 13 different charts (and the SELECT statement at the end is really an UPDATE statement, hence the need for the identifier to join back to the table to be updated)

    SQL2005 solutions desired (moving to 2008 soonish)

    CREATE TABLE [dbo].[TestTable](

    [Area] [nvarchar](10) NOT NULL,

    [TimePeriod] [datetime] NOT NULL,

    [SomeValue] [int] NOT NULL

    )

    GO

    insert testtable (area, timeperiod, somevalue)

    values ('1', '2009-02-01', 1)

    insert testtable (area, timeperiod, somevalue)

    values ('1', '2009-03-01', 10)

    insert testtable (area, timeperiod, somevalue)

    values ('1', '2009-04-01', 11)

    insert testtable (area, timeperiod, somevalue)

    values ('2', '2009-01-01', 12)

    insert testtable (area, timeperiod, somevalue)

    values ('2', '2009-02-01', 21)

    insert testtable (area, timeperiod, somevalue)

    values ('2', '2009-03-01', 22)

    insert testtable (area, timeperiod, somevalue)

    values ('2', '2009-04-01', 23)

    insert testtable (area, timeperiod, somevalue)

    values ('3', '2009-01-01', 34)

    insert testtable (area, timeperiod, somevalue)

    values ('3', '2009-02-01', 35)

    insert testtable (area, timeperiod, somevalue)

    values ('3', '2009-03-01', 31)

    insert testtable (area, timeperiod, somevalue)

    values ('3', '2009-04-01', 32)

    GO

    DECLARE @area nvarchar(10)

    DECLARE csv_CURSOR CURSOR FOR

    SELECT DISTINCT area

    FROM TestTable

    OPEN csv_CURSOR

    FETCH NEXT FROM csv_CURSOR INTO

    @area

    WHILE @@FETCH_STATUS = 0

    BEGIN

    WITH cte ([data()]) AS (

    SELECT chartValue + N', ' AS ColumnToList

    FROM (SELECT TOP 100 PERCENT area, CAST(SomeValue AS NVARCHAR(20)) AS chartValue

    FROM TestTable

    WHERE TimePeriod >= DATEADD(mm, -12, (SELECT MAX(TimePeriod) FROM TestTable))

    AND Area = @area) qry

    ORDER BY area

    FOR XML PATH(''), TYPE

    )

    , csvRAW (Area, csv) AS (

    SELECT @area Area,

    (SELECT [data()] AS mydata FROM cte AS d FOR XML RAW, TYPE).value('/row[1]/mydata[1]', 'NVARCHAR(max)') AS CSV

    )

    SELECT area, SUBSTRING(csv, 1, LEN(csv)-1) AS ChartValue

    FROM csvRAW

    FETCH NEXT FROM csv_CURSOR INTO

    @area

    END

    CLOSE csv_CURSOR

    DEALLOCATE csv_CURSOR

  • For the given test data (thanks for that!), what should the output look like?

    Also, I can't get your cursor/cte to run.

    Edit: got the test code to run, I see what your results should be.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • The expected results are from 3 executions of the final SELECT statement in the loop:

    areaChartValue

    11, 10, 11

    areaChartValue

    212, 21, 22, 23

    areaChartValue

    334, 35, 31, 32

    As for not able to run my cte, hm. I just cut/paste what I had posted into SSMS, and had no issues.

  • This code seems to work with your test data:

    select area, ChartValue = left(ChartValue, len(ChartValue)-1)

    from (

    select distinct area, ChartValue = (

    select

    convert(varchar(20),SomeValue) + ', '

    from #TestTable

    where Area = t.area

    for xml path(''))

    from

    #TestTable t) a

    Note: I think there may be a better way, but I'm too tired to think it out right now.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Sweet As! Just got a chance to run this in my database. What took 20 minutes previously just ran in 8 seconds :Wow: !

    Thanks Wayne.

  • FYI, the larger the string to be made, and the more strings to be concated, the better FOR XML does wrt to all other workable solutions in SQL 2005. It's actually a bit slower than the best cursor and pseudocusror methods at the small end, but those all scale at O(n^2), (except a convoluted one that I developed that scales at O(n*log2(n))). FOR XML PATH however scales at O(n) so eventually it always wins out.

    (SQLCLR can probably be made to be even faster in SQL 2008, but in 2005 it's got an 8K storage limit that makes it virtually unusable for large string concatenation).

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • In a case of "Just because you can do something doesn't mean you should do it"

    Ok, my first test with two charts took 17 seconds, as previously mentioned. Second test, four charts, 35 seconds. Although you really shouldn't extrapolate from only two data points, I start thinking "8 or 9 seconds per chart. 22 charts. Should be 4 or 5 minutes for all of them."

    Nope. 15 minutes. So, I now do 4 charts per statement (x 6 statements), and get the scalability that way. Even with some extra processing I've thrown in, it's only taking 7-8 minutes to process everything.

    Anybody have any thoughts?

  • Hi Tom,

    when I look at your first solution, you're using

    Select ... FROM TestTable

    WHERE TimePeriod >= ...

    AND Area = ....

    In Wayne's solution he's using:

    Select ... from #TestTable

    where Area = ...

    Assuming you have used proper indexing, there would be an Index on TimePeriod and Area.

    This index would not be used by Wayne's statement, since the left most column in this index is not part of the query.

    Try to add a separate index on area. This should help.

    Edit: Oooooopps!!

    I just saw the distinct area clause in your code, so I have to assume that there's an index covering the area column as well....

    Sorry again....:blush:



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply