Grouping Data into categories based on dates

  • IF OBJECT_ID('tempdb..#_tempData', 'U') IS NOT NULL
     DROP TABLE #_tempData
    CREATE TABLE #_tempData (
      ID INT
     ,StartDate DATE
     ,EndDate DATE
     ,CATEGORY1 VARCHAR(50)
     ,LOCATION VARCHAR(50)
     ,NUMBER INT
    );

    INSERT INTO #_tempData
    VALUES
        (1,'1950-01-01 00:00:00.000','2017-09-27 00:00:00.000','CAT 1','SOUTH',24)
        ,(2,'2017-09-27 00:00:00.000',NULL,'CAT 1','SOUTH',26)
             ,(3,'2014-04-01 00:00:00.000','2014-05-31 00:00:00.000','CAT 2','SOUTH',4)
                 ,(4,'1950-01-01 00:00:00.000',NULL,'CAT 3','SOUTH',4)

    SELECT * FROM #_tempData;

    I'm looking for some assistance in regard to grouping data based on dates, I have some data that is categorised based on 'to' and 'from' dates which I wish to aggregate. The data can have NULL in the date to column which complicates the issue however the results I would like to see are:

    Hoping to aggregate to 'location' level and provide the figure and the dates at which the aggregated figure is valid, so between '1950-01-01' and '2014-04-01' the location SOUTH would have 28. From '2014-04-01'  until '2014-05-31'  it would have 32 then from '2014-05-31' until ''2017-09-27'  it would be 28 again and then from 2017-09-27' onward it would be 34 .  

    Hope this makes sense, any assistance appreciated. 

  • Would something like this work for you?


    SELECT
        Location,
        DateValue, Sum(Number)
    FROM
        #_tempData TheData
    JOIN
        (
            SELECT     StartDate AS DateValue
            FROM     #_tempData
            UNION
            SELECT     IsNull(ENDDate, '01-01-2099')
            FROM     #_tempData
        ) DateList
    ON
        DateList.DateValue BETWEEN TheData.StartDate AND DateAdd(day, -1, IsNull(TheData.EndDate, '2099-01-01'))
    GROUP BY
        Location, DateValue
    ORDER BY
        Location, DateValue

  • Almost what I'm after, unfortunately I need the 'To' and 'From' dates that the location would be valid. Think the best option might be to write it into another table as I'm going to then join it to transaction data where the transaction occurs between the two dates where the location was valid. Many thanks for the reply much appreciated.

  • Sorry I didn't figure you needed the end date, since you have it as the start date of the next row. 
    If you really need that on the same row, you can get it with a LEAD window function.
    Here I am using LEAD to get the start date of the next row, and subtracting 1 day from it as the end date:


    SELECT
      Location,
      DateValue AS Range_Start_Date,
        DateAdd(day, -1, LEAD(DateValue, 1) OVER(ORDER BY DateValue)) AS Range_End_Date,
        Sum(Number) AS Total
    FROM
      #_tempData TheData
    JOIN
      (
       SELECT  StartDate AS DateValue
       FROM  #_tempData
       UNION
       SELECT  IsNull(ENDDate, '01-01-2099')
       FROM  #_tempData
      ) DateList
    ON
      DateList.DateValue BETWEEN TheData.StartDate AND DateAdd(day, -1, IsNull(TheData.EndDate, '2099-01-01'))
    GROUP BY
      Location, DateValue
    ORDER BY
      Location, DateValue

  • Thanks very much this is exactly what I was after, I was away trying to write a cursor and it was melting my head.

    Is there any simple way to upscale this this type of approach to cope with more data? 

    IF OBJECT_ID('tempdb..#_tempData', 'U') IS NOT NULL
    DROP TABLE #_tempData
    CREATE TABLE #_tempData (
    ID INT
    ,StartDate DATE
    ,EndDate DATE
    ,CATEGORY1 VARCHAR(50)
    ,LOCATION VARCHAR(50)
    ,NUMBER INT
    );

    INSERT INTO #_tempData
    VALUES 
      (1,'1950-01-01 00:00:00.000','2017-09-27 00:00:00.000','CAT 1','SOUTH',24)
      ,(2,'2017-09-27 00:00:00.000',NULL,'CAT 1','SOUTH',26)
        ,(3,'2014-04-01 00:00:00.000','2014-05-31 00:00:00.000','CAT 2','SOUTH',4)
         ,(4,'1950-01-01 00:00:00.000',NULL,'CAT 3','SOUTH',4)

      ,(1,'1950-01-01 00:00:00.000','2014-07-31 00:00:00.000','CAT 1','WEST',23)
      ,(2,'2014-08-01 00:00:00.000','2015-11-11 00:00:00.000','CAT 1','WEST',16)
        ,(3,'2015-11-12 00:00:00.000','2016-03-31 00:00:00.000','CAT 2','WEST',12)
         ,(4,'2015-11-12 00:00:00.000','2016-03-31 00:00:00.000','CAT 3','WEST',4)
         ,(5,'2016-04-01 00:00:00.000',NULL,'CAT 3','WEST',4)
       
    SELECT
    TheData.Location,
    DateValue AS Range_Start_Date,
      DateAdd(day, -1, LEAD(DateValue, 1) OVER(partition by TheData.Location ORDER BY DateValue)) AS Range_End_Date,
      Sum(Number) AS Total
    FROM
    #_tempData TheData
    JOIN
    (
      SELECT Location, StartDate AS DateValue
      FROM #_tempData
      UNION
      SELECT Location, IsNull(ENDDate, '01-01-2099')
      FROM #_tempData
    ) DateList
    ON
    DateList.DateValue BETWEEN TheData.StartDate AND DateAdd(day, -1, IsNull(TheData.EndDate, '2099-01-01')) and DateList.Location = TheData.Location
    GROUP BY
    TheData.Location, DateValue
    ORDER BY
    Location, DateValue

    I added the location to the join condition and partition by to the lead function and it up-scales nicely. Many thanks again for the assistance.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply