problem inner joininig a derived column

  • Hi, I'm trying to get a nice chart working, that gets data from SQL Server, but having trouble getting the right data...

    I have a table, that have a value and a date... I want to sum all the values and group them by 12 months of the current year.

    Se lets say i got the table [#a]... With id, recorddate and myvalue columns as shown below.

    CREATE TABLE [#a](

    [id] [int] NOT NULL,

    [recorddate] [date] NULL,

    [myvalue] [int] NULL,

    ) ON [PRIMARY]

    GO

    INSERT INTO [#a] (id, [recorddate], [myvalue])

    SELECT 1,'2012-01-02',10 UNION ALL

    SELECT 2,'2012-01-13',20 UNION ALL

    SELECT 3,'2012-02-02',30 UNION ALL

    SELECT 4,'2012-02-24',40 UNION ALL

    SELECT 5,'2012-08-02',50 UNION ALL

    SELECT 6,'2012-12-01',60 UNION ALL

    SELECT 7,'2012-12-28',70

    And i would like to get this result...

    '2012-01-01' || '30'

    '2012-02-01' || '70'

    '2012-03-01' || '0'

    '2012-04-01' || '0'

    '2012-05-01' || '0'

    '2012-06-01' || '0'

    '2012-07-01' || '0'

    '2012-08-01' || '50'

    '2012-09-01' || '0'

    '2012-10-01' || '0'

    '2012-11-01' || '0'

    '2012-12-01' || '130'

    Ive almost got it to work... but i fail on the joining step where i join my CTE with a derived column... as seen below:

    CREATE TABLE [#a](

    [id] [int] NOT NULL,

    [recorddate] [date] NULL,

    [myvalue] [int] NULL,

    ) ON [PRIMARY]

    GO

    INSERT INTO [#a] (id, [recorddate], [myvalue])

    SELECT 1,'2012-01-02',10 UNION ALL

    SELECT 2,'2012-01-13',20 UNION ALL

    SELECT 3,'2012-02-02',30 UNION ALL

    SELECT 4,'2012-02-24',40 UNION ALL

    SELECT 5,'2012-08-02',50 UNION ALL

    SELECT 6,'2012-12-01',60 UNION ALL

    SELECT 7,'2012-12-28',70

    WITH myCTE (c)

    AS

    (

    SELECT CAST(CAST(YEAR(GETDATE()) AS CHAR(4))+'-1-1' AS DATE) AS c

    UNION ALL

    SELECT CAST(CAST(YEAR(GETDATE()) AS CHAR(4))+'-2-1' AS DATE) AS c

    UNION ALL

    SELECT CAST(CAST(YEAR(GETDATE()) AS CHAR(4))+'-3-1' AS DATE) AS c

    UNION ALL

    SELECT CAST(CAST(YEAR(GETDATE()) AS CHAR(4))+'-4-1' AS DATE) AS c

    UNION ALL

    SELECT CAST(CAST(YEAR(GETDATE()) AS CHAR(4))+'-5-1' AS DATE) AS c

    UNION ALL

    SELECT CAST(CAST(YEAR(GETDATE()) AS CHAR(4))+'-6-1' AS DATE) AS c

    UNION ALL

    SELECT CAST(CAST(YEAR(GETDATE()) AS CHAR(4))+'-7-1' AS DATE) AS c

    UNION ALL

    SELECT CAST(CAST(YEAR(GETDATE()) AS CHAR(4))+'-8-1' AS DATE) AS c

    UNION ALL

    SELECT CAST(CAST(YEAR(GETDATE()) AS CHAR(4))+'-9-1' AS DATE) AS c

    UNION ALL

    SELECT CAST(CAST(YEAR(GETDATE()) AS CHAR(4))+'-10-1' AS DATE) AS c

    UNION ALL

    SELECT CAST(CAST(YEAR(GETDATE()) AS CHAR(4))+'-11-1' AS DATE) AS c

    UNION ALL

    SELECT CAST(CAST(YEAR(GETDATE()) AS CHAR(4))+'-12-1' AS DATE) AS c

    )

    SELECT DISTINCT c, SUM(myvalue)

    FROM myCTE LEFT JOIN #a ON myCTE.c=#a.CAST(YEAR(#a.recorddate) as CHAR(4))+'-'+CAST(MONTH(#a.recorddate) as CHAR(2))+'-'+'01'

    GROUP BY c

    DROP TABLE #a

    Is it possible to achieve it this way?

  • Try building a calendar table, joining on your dates. Then you can aggregate by month using the month values in the calendar table. See the following article[/url] for details about how to use calendar tables.

  • Not currently at my desk so the syntax might need correcting, but could you do something like this?

    WITH CTE(N) AS (SELECT 1 FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1))a(N)),

    CTE2(N) AS (SELECT 1 FROM CTE x CROSS JOIN CTE y),

    CTE3(N) AS (SELECT 1 FROM CTE2 x CROSS JOIN CTE2 y),

    CTE4(N) AS (SELECT 1 FROM CTE3 x CROSS JOIN CTE3 y),

    CTE5(N) AS (SELECT 1 FROM CTE4 x CROSS JOIN CTE4 y),

    CTE6(N) AS (SELECT 0 UNION ALL

    SELECT TOP 97199 ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM CTE5),

    TALLY(N) AS (SELECT DATEADD(month, N, 0)

    FROM CTE6)

    SELECT a.N AS recorddate, ISNULL(b.myValue,0) AS myValue

    FROM TALLY a

    OUTER APPLY (SELECT DATEADD(month, DATEDIFF(month, 0, [recorddate]), 0), SUM([myvalue])

    FROM [#a] c

    WHERE DATEADD(month, DATEDIFF(month, 0, [recorddate]), 0) = a.N

    GROUP BY DATEADD(month, DATEDIFF(month, 0, [recorddate]), 0)

    ) b(recorddate,myValue)

    CROSS APPLY (SELECT MIN(DATEADD(month, DATEDIFF(month, 0, [recorddate]), 0)),

    MAX(DATEADD(month, DATEDIFF(month, 0, [recorddate]), 0))

    FROM [#a]) d(minDate,maxDate)

    WHERE a.N >= d.minDate AND a.N <= d.maxDate;


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • To follow-up my earlier suggestion, note that Cadavre's approach uses a tally table. A calendar table is a specialized version of a tally table.

    It's a powerful tool. In addition to the article on calendar tables, you may find the following articles about tally tables give you a helpful addition to your querying tools.

    Jeff Moden on Tally Tables[/url]

    Stefan Krzywicki on the use of Tally Tables - I[/url]

    Stefan Krzywicki on the use of Tally Tables - II[/url]

  • kl25 (11/30/2012)


    To follow-up my earlier suggestion, note that Cadavre's approach uses a tally table. A calendar table is a specialized version of a tally table.

    In fact, my CTE tally table is a calendar table (note: TALLY(N) AS (SELECT DATEADD(month, N, 0) FROM CTE6) ) of months starting at 1900-01-01 and ending with 9999-12-01 (total is 97,200 rows, so I probably went overboard with the CTEs that build the sequential number lists 😛 )


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • In fact, my CTE tally table is a calendar table (note: TALLY(N) AS (SELECT DATEADD(month, N, 0) FROM CTE6) ) of months starting at 1900-01-01 and ending with 9999-12-01 (total is 97,200 rows, so I probably went overboard with the CTEs that build the sequential number lists 😛 )

    You're right. 🙂 Given the approach used in the initial query, it didn't seem that the OP was familiar with calendar or tally tables. Just wanted to provide further background using the articles cited. You've given the OP a great example of why tally and calendar tables are powerful tools. 🙂

  • In fact, my CTE tally table is a calendar table (note: TALLY(N) AS (SELECT DATEADD(month, N, 0) FROM CTE6) ) of months starting at 1900-01-01 and ending with 9999-12-01 (total is 97,200 rows, so I probably went overboard with the CTEs that build the sequential number lists [Tongue] )

    Thanks once again for an excellent reply. The table worked well with the real data. However there's some weird hiccups with the query. If i execute the query multiple times, it goes into for ever executing mode... i.e cant complete the execution.

    Here's the slightly modified query:

    WITH CTE(N) AS (SELECT 1 FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a(N)),

    CTE2(N) AS (SELECT 1 FROM CTE x CROSS JOIN CTE y),

    CTE3(N) AS (SELECT 1 FROM CTE2 x CROSS JOIN CTE2 y),

    CTE4(N) AS (SELECT 1 FROM CTE3 x CROSS JOIN CTE3 y),

    CTE5(N) AS (SELECT 1 FROM CTE4 x CROSS JOIN CTE4 y),

    CTE6(N) AS (SELECT 0 UNION ALL

    SELECT TOP 97199 ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM CTE5),

    TALLY(N) AS (SELECT DATEADD(month, N, 0)

    FROM CTE6)

    SELECT a.N AS recorddate, ISNULL(b.myValue,0) AS myValue

    FROM TALLY a

    OUTER APPLY (SELECT DATEADD(month, DATEDIFF(month, 0, f.recorddate), 0), SUM(e.numvalue)

    FROM admin_tasks as e INNER JOIN admin_tasks as f ON e.id=f.numvalue

    WHERE DATEADD(month, DATEDIFF(month, 0, f.recorddate), 0) = a.N AND f.actionid=5 AND f.numtype=1

    GROUP BY DATEADD(month, DATEDIFF(month, 0, f.recorddate), 0)

    ) b(recorddate,myValue)

    CROSS APPLY (SELECT MIN(DATEADD(month, DATEDIFF(month, 0, h.recorddate), 0)),

    MAX(DATEADD(month, DATEDIFF(month, 0, h.recorddate), 0))

    FROM admin_tasks as g INNER JOIN admin_tasks as h ON g.id=h.numvalue WHERE h.actionid=5 AND h.numtype=1) d(minDate,maxDate)

    WHERE a.N >= d.minDate AND a.N <= d.maxDate

    ORDER BY recorddate ASC

    And here's the statistics IO & time

    (11 row(s) affected)

    Table 'admin_tasks'. Scan count 17, logical reads 8099, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 11, logical reads 28546, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 187 ms, elapsed time = 257 ms.

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 0 ms.

    Any thought on why the query stagnates?

    UPDATE: Checked some variables and getting 97k rows was a slight overkill. Changing it to 2050 rows made a big difference, and no more hickups. 🙂

  • A followup question: if i only need the years span of 2000-2050, how do i select them instead of

    SELECT TOP 97199 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM CTE5

    Thanks

  • memymasta (12/3/2012)


    A followup question: if i only need the years span of 2000-2050, how do i select them instead of

    SELECT TOP 97199 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM CTE5

    Thanks

    Well, there are 611 months between 2000-01-01 and 2050-12-01, so you'd want to change the code to something like this: -

    CTE6(N) AS (SELECT 0 UNION ALL

    SELECT TOP 611 ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM CTE5),

    TALLY(N) AS (SELECT DATEADD(month, N, CAST('2000' AS DATETIME))

    FROM CTE6)

    Note the change to the "tally" section, which starts the calendar at 2000-01-01 instead of the previous version which started at 1900-01-01.


    --EDIT--

    memymasta (12/3/2012)


    Any thought on why the query stagnates?

    UPDATE: Checked some variables and getting 97k rows was a slight overkill. Changing it to 2050 rows made a big difference, and no more hickups. 🙂

    Without seeing an actual execution plan, no. I'd imagine it'll be an indexing issue. You could also try changing the CTE to a physical table of all of the months, which when properly indexes may give some performance boost.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • A calendar table is not really needed for this:

    select

    YearMonth = dateadd(mm,datediff(mm,0,[recorddate]),0),

    MyValueSum = sum([myvalue])

    from

    #a

    group by

    dateadd(mm,datediff(mm,0,[recorddate]),0)

    order by

    dateadd(mm,datediff(mm,0,[recorddate]),0)

    Results:

    YearMonth MyValueSum

    ----------------------- -----------

    2012-01-01 00:00:00.000 30

    2012-02-01 00:00:00.000 70

    2012-08-01 00:00:00.000 50

    2012-12-01 00:00:00.000 130

  • Michael Valentine Jones (12/3/2012)


    A calendar table is not really needed for this:

    select

    YearMonth = dateadd(mm,datediff(mm,0,[recorddate]),0),

    MyValueSum = sum([myvalue])

    from

    #a

    group by

    dateadd(mm,datediff(mm,0,[recorddate]),0)

    order by

    dateadd(mm,datediff(mm,0,[recorddate]),0)

    Results:

    YearMonth MyValueSum

    ----------------------- -----------

    2012-01-01 00:00:00.000 30

    2012-02-01 00:00:00.000 70

    2012-08-01 00:00:00.000 50

    2012-12-01 00:00:00.000 130

    The op would also like "0" for the intervening months so we have to figure out some way to accomplish that. a calendar table to join to seems to be the easiest from what i can see.


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

  • memymasta (11/30/2012)


    Ive almost got it to work... but i fail on the joining step where i join my CTE with a derived column...

    When you say you fail, what exactly does that mean? In your ON criteria, you've got...

    ON myCTE.c=#a.CAST(YEAR(#a.recorddate)

    change it to (take out the extra '#a.' alias, and leave the rest the same)...

    ON myCTE.c=CAST(YEAR(#a.recorddate)

    And it worked for me. The only differece being there are NULL's where you have zero's in your mock results. It looks like there are some better suggestions any way, but I was just curious if this was the only 'failure' you were experiencing.

    Greg
    _________________________________________________________________________________________________
    The glass is at one half capacity: nothing more, nothing less.

  • @cadavre - Thats that did the trick!

    @celko - Very useful insight, i like it. One thing i don't understand is:

    That is 'yyyy-mm-00' for a month within a year and 'yyyy-00-00' for the whole year.

    Do you store that as date?

    SELECT CAST('2012-01-00' AS DATE)

    This don't seem to work.

    @greg Snidow - Ah yes! Your small change made it work. Updated the code a bit and it looks really good now.

    CREATE TABLE [#a](

    [id] [int] NOT NULL,

    [recorddate] [date] NULL,

    [myvalue] [int] NULL,

    ) ON [PRIMARY]

    GO

    INSERT INTO [#a] (id, [recorddate], [myvalue])

    SELECT 1,'2012-01-02',10 UNION ALL

    SELECT 2,'2012-01-13',20 UNION ALL

    SELECT 3,'2012-02-02',30 UNION ALL

    SELECT 4,'2012-02-24',40 UNION ALL

    SELECT 5,'2012-08-02',50 UNION ALL

    SELECT 6,'2012-12-01',60 UNION ALL

    SELECT 7,'2012-12-28',70

    GO

    WITH myCTE (c)

    AS

    (

    SELECT DATEADD(mm, DATEDIFF(yy, 0, GETDATE())*12,0)

    UNION ALL

    SELECT DATEADD(mm, DATEDIFF(yy, 0, GETDATE())*12+1,0)

    UNION ALL

    SELECT DATEADD(mm, DATEDIFF(yy, 0, GETDATE())*12+2,0)

    UNION ALL

    SELECT DATEADD(mm, DATEDIFF(yy, 0, GETDATE())*12+3,0)

    UNION ALL

    SELECT DATEADD(mm, DATEDIFF(yy, 0, GETDATE())*12+4,0)

    UNION ALL

    SELECT DATEADD(mm, DATEDIFF(yy, 0, GETDATE())*12+5,0)

    UNION ALL

    SELECT DATEADD(mm, DATEDIFF(yy, 0, GETDATE())*12+6,0)

    UNION ALL

    SELECT DATEADD(mm, DATEDIFF(yy, 0, GETDATE())*12+7,0)

    UNION ALL

    SELECT DATEADD(mm, DATEDIFF(yy, 0, GETDATE())*12+8,0)

    UNION ALL

    SELECT DATEADD(mm, DATEDIFF(yy, 0, GETDATE())*12+9,0)

    UNION ALL

    SELECT DATEADD(mm, DATEDIFF(yy, 0, GETDATE())*12+10,0)

    UNION ALL

    SELECT DATEADD(mm, DATEDIFF(yy, 0, GETDATE())*12+11,0)

    )

    SELECT DISTINCT c, ISNULL(SUM(myvalue),0) as myValue

    FROM myCTE LEFT JOIN #a ON myCTE.c=CAST(YEAR(#a.recorddate) as CHAR(4))+'-'+CAST(MONTH(#a.recorddate) as CHAR(2))+'-'+'01'

    GROUP BY c

    GO

    DROP TABLE #a

  • memymasta (12/4/2012)


    @Cadavre - Thats that did the trick!

    @celko - Very useful insight, i like it. One thing i don't understand is:

    That is 'yyyy-mm-00' for a month within a year and 'yyyy-00-00' for the whole year.

    Do you store that as date?

    SELECT CAST('2012-01-00' AS DATE)

    This don't seem to work.

    @greg Snidow - Ah yes! Your small change made it work. Updated the code a bit and it looks really good now.

    CREATE TABLE [#a](

    [id] [int] NOT NULL,

    [recorddate] [date] NULL,

    [myvalue] [int] NULL,

    ) ON [PRIMARY]

    GO

    INSERT INTO [#a] (id, [recorddate], [myvalue])

    SELECT 1,'2012-01-02',10 UNION ALL

    SELECT 2,'2012-01-13',20 UNION ALL

    SELECT 3,'2012-02-02',30 UNION ALL

    SELECT 4,'2012-02-24',40 UNION ALL

    SELECT 5,'2012-08-02',50 UNION ALL

    SELECT 6,'2012-12-01',60 UNION ALL

    SELECT 7,'2012-12-28',70

    GO

    WITH myCTE (c)

    AS

    (

    SELECT DATEADD(mm, DATEDIFF(yy, 0, GETDATE())*12,0)

    UNION ALL

    SELECT DATEADD(mm, DATEDIFF(yy, 0, GETDATE())*12+1,0)

    UNION ALL

    SELECT DATEADD(mm, DATEDIFF(yy, 0, GETDATE())*12+2,0)

    UNION ALL

    SELECT DATEADD(mm, DATEDIFF(yy, 0, GETDATE())*12+3,0)

    UNION ALL

    SELECT DATEADD(mm, DATEDIFF(yy, 0, GETDATE())*12+4,0)

    UNION ALL

    SELECT DATEADD(mm, DATEDIFF(yy, 0, GETDATE())*12+5,0)

    UNION ALL

    SELECT DATEADD(mm, DATEDIFF(yy, 0, GETDATE())*12+6,0)

    UNION ALL

    SELECT DATEADD(mm, DATEDIFF(yy, 0, GETDATE())*12+7,0)

    UNION ALL

    SELECT DATEADD(mm, DATEDIFF(yy, 0, GETDATE())*12+8,0)

    UNION ALL

    SELECT DATEADD(mm, DATEDIFF(yy, 0, GETDATE())*12+9,0)

    UNION ALL

    SELECT DATEADD(mm, DATEDIFF(yy, 0, GETDATE())*12+10,0)

    UNION ALL

    SELECT DATEADD(mm, DATEDIFF(yy, 0, GETDATE())*12+11,0)

    )

    SELECT DISTINCT c, ISNULL(SUM(myvalue),0) as myValue

    FROM myCTE LEFT JOIN #a ON myCTE.c=CAST(YEAR(#a.recorddate) as CHAR(4))+'-'+CAST(MONTH(#a.recorddate) as CHAR(2))+'-'+'01'

    GROUP BY c

    GO

    DROP TABLE #a

    A couple things i may change. first is your calendar table, what happens when you want to start at a different date. the version below to me has a smaller maintenance area when it comes to changing the date range you want in the report. the second change is the ON clause, i replaced your string concatenation to some date math to get the first of the month so we have like data types.

    WITH cteTally(N) AS (SELECT 0 UNION ALL

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))x(N)

    CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))y(N)),

    myCTE (c) AS (SELECT TOP 12 DATEADD(MM,N,DATEADD(YY, DATEDIFF(YY,0,GETDATE()),0))

    FROM cteTally)

    SELECT c, ISNULL(SUM(myvalue),0) as myValue

    FROM myCTE

    LEFT JOIN #a

    ON myCTE.c = DATEADD(MM,DATEDIFF(MM,0,#a.recorddate),0)

    GROUP BY c

    EDIT: had Tally (my persisted tally table) and not the cteTally.

    If you have no idea what a tally table is check out this great article by Jeff Moden http://www.sqlservercentral.com/articles/T-SQL/62867/


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply