Help on aggregation with dates

  • Hi,

    We have a report that returns a set of aggregated data and in some cases i've duplicated records due to some nulls on a date column, is there any way to maintain the aggregation working by escaping the null value?

    Example query:

    select t.col1, t.col2, sum(t.col3) col3, sum(t.col4) col4, col5
    from (
    select 7213 col1, 992 col2, 30 col3, 0 col4, cast(getdate() as date) col5
    union all
    select 7213 col1, 992 col2, 0 col3, 30 col4, null col5
    ) t
    group by t.col1, t.col2, col5

    It returns 2 rows because of the Null value on col5 and what should return was a single record, any ideia on how to escape the null on col5?

    Thanks in advance

  • This is likely to be problematic. Consider the following (I added an extra row of data)

    SELECT t.col1
    ,t.col2
    ,col3 = SUM(t.col3)
    ,col4 = SUM(t.col4)
    ,t.col5
    FROM
    (
    SELECT col1 = 7213
    ,col2 = 992
    ,col3 = 30
    ,col4 = 0
    ,col5 = CAST(GETDATE() AS DATE)
    UNION ALL
    SELECT col1 = 7213
    ,col2 = 992
    ,col3 = 0
    ,col4 = 30
    ,col5 = NULL
    UNION ALL
    SELECT col1 = 7213
    ,col2 = 992
    ,col3 = 0
    ,col4 = 30
    ,col5 = DATEADD(DAY, -1, CAST(GETDATE() AS DATE))
    ) t
    GROUP BY t.col1
    ,t.col2
    ,t.col5;

    Given the above scenario, which row should the NULL aggregate into?

    • This reply was modified 2 years, 10 months ago by  Phil Parkin.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • select t.col1, t.col2, sum(t.col3) col3, sum(t.col4) col4, col5
    from (
    select 7213 col1, 992 col2, 30 col3, 0 col4, cast(getdate() as date) col5
    union all
    select 7213 col1, 992 col2, 0 col3, 30 col4, null col5
    ) t
    where t.col5 is not null
    group by t.col1, t.col2, col5
  • where t.col5 is not null

    Hah! I thought of this too, but assumed that the values in Col3 and Col4 for the NULL date were required to be included somewhere in the SUMs.

    • This reply was modified 2 years, 10 months ago by  Phil Parkin.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • If it would be OK to force the NULL value to be added to lowest date:

    ;WITH test_data AS (
    SELECT col1 = 7213
    ,col2 = 992
    ,col3 = 30
    ,col4 = 0
    ,col5 = CAST(GETDATE() AS DATE)
    UNION ALL
    SELECT col1 = 7213
    ,col2 = 992
    ,col3 = 0
    ,col4 = 30
    ,col5 = NULL
    UNION ALL
    SELECT col1 = 7213
    ,col2 = 992
    ,col3 = 0
    ,col4 = 30
    ,col5 = DATEADD(DAY, -1, CAST(GETDATE() AS DATE))
    )
    SELECT t.col1
    ,t.col2
    ,col3 = SUM(t.col3)
    ,col4 = SUM(t.col4)
    ,col5
    FROM (
    SELECT col1, col2, col3, col4, CASE WHEN col5 IS NULL THEN LEAD(col5, 1) OVER(PARTITION BY col1, col2 ORDER BY col5) ELSE col5 END AS col5
    FROM test_data t
    ) AS t
    GROUP BY t.col1
    ,t.col2
    ,t.col5

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher wrote:

    If it would be OK to force the NULL value to be added to lowest date:

    ;WITH test_data AS (
    SELECT col1 = 7213
    ,col2 = 992
    ,col3 = 30
    ,col4 = 0
    ,col5 = CAST(GETDATE() AS DATE)
    UNION ALL
    SELECT col1 = 7213
    ,col2 = 992
    ,col3 = 0
    ,col4 = 30
    ,col5 = NULL
    UNION ALL
    SELECT col1 = 7213
    ,col2 = 992
    ,col3 = 0
    ,col4 = 30
    ,col5 = DATEADD(DAY, -1, CAST(GETDATE() AS DATE))
    )
    SELECT t.col1
    ,t.col2
    ,col3 = SUM(t.col3)
    ,col4 = SUM(t.col4)
    ,col5
    FROM (
    SELECT col1, col2, col3, col4, CASE WHEN col5 IS NULL THEN LEAD(col5, 1) OVER(PARTITION BY col1, col2 ORDER BY col5) ELSE col5 END AS col5
    FROM test_data t
    ) AS t
    GROUP BY t.col1
    ,t.col2
    ,t.col5

    That will have issues with multiple nulls,

    You could try something like,

    WITH TEMP_CTE AS(
    SELECT *, ISNULL(COL_ONE, MIN(COL_ONE) OVER(PARTITION BY 1)) AS SUM_DATE FROM
    (VALUES (getdate(), 1)
    , (DATEADD(day, -1, getdate()), 2)
    , (DATEADD(day, -1, getdate()), 5)
    , (null, 3)
    , (null, 4)
    , (null, 9)
    , (getdate(), 3)
    , (DATEADD(day, 1, getdate()), 5)) TESTY(COL_ONE, COL_TWO)
    )
    SELECT SUM_DATE, SUM(COL_TWO) FROM TEMP_CTE
    GROUP BY SUM_DATE

     

  • Hi,

    In our report the col5 only returns:

    1. a single date
    2. a null
    3. a single date and a null

    On the 1 and 2 scenarios no issue, but on the rare ocasions that the report returns option 3 i've the problem

    Thanks

  • Instead of using LEAD or LAG you could use MIN/MAX/FIRST_VALUE/LAST_VALUE depending on requirements.  Or - you could define a default value to all null values.

    Which one you use depends on the requirements.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • For over 35 years, SQL forums have asked posters to provide DDL, sample data and their first honest attempt at a solution on their own.

    I had a little something extra and require that you follow ISO and ANSI standard rules, as well as good coding practices. Looking at the number of points you have, you probably have no idea what any of this means. But basically you post a lot of garbage exclamation. For example, why do you think that col "-n" is a meaningful name in a relational model? Sorry that this is what you might have expressed in a spreadsheet 30 years ago. If you're going to do this kind of garbage programming, that at least put the "as" keyword between an expression and its new name.

    Without any DDL, it is impossible to answer your question. We cannot read your mind or see what's on your screen. I would strongly suggest that you start using the ANSI/ISO standard CURRENT_TIMESTAMP instead of the old Sybase getdate(). Writing things with proprietary code simply makes you look at a hillbilly who doesn't yet speak the language.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Thanks for your sharing,That's pretty cool.

    voip phone system service provider

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply