Grouping on Date Range / Overlap

  • Given the following sample data, is it possible to produce a set of rows that group on column v where start and stop overlap.

    SET DATEFORMAT DMY

    GO

    WITH testData(Start, Stop , v, ID )

    AS

    (

    select CAST('01/12/08' as DATETIME), CAST('06/12/08' AS DATETIME), CAST(1 AS INT), 1

    UNION

    select CAST('07/12/08' as DATETIME), CAST('13/12/08' AS DATETIME), CAST(1 AS INT), 2

    UNION

    select CAST('14/12/08' as DATETIME), CAST('20/12/08' AS DATETIME), CAST(2 AS INT), 3

    UNION

    select CAST('21/12/08' as DATETIME), CAST('27/12/08' AS DATETIME), CAST(3 AS INT), 4

    UNION

    select CAST('28/12/08' as DATETIME), CAST('03/01/09' AS DATETIME), CAST(3 AS INT), 5

    )

    SELECT * FROM testData

    sample data:

    Start Stop v ID

    ----------------------- ----------------------- ----------- -----------

    2008-12-01 00:00:00.000 2008-12-06 00:00:00.000 1 1

    2008-12-07 00:00:00.000 2008-12-13 00:00:00.000 1 2

    2008-12-14 00:00:00.000 2008-12-20 00:00:00.000 2 3

    2008-12-21 00:00:00.000 2008-12-27 00:00:00.000 3 4

    2008-12-28 00:00:00.000 2009-01-03 00:00:00.000 3 5

    What I want is:

    Start Stop v

    ----------------------- ----------------------- -----------

    2008-12-01 00:00:00.000 2008-12-13 00:00:00.000 1

    2008-12-14 00:00:00.000 2008-12-20 00:00:00.000 2

    2008-12-21 00:00:00.000 2009-01-03 00:00:00.000 3

    I'm trying to do this without using cursors / loops.

    Thanks in advance

  • The first start datetime and the last stop datetime in one row, for each ID?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Are there any gaps between the start and end times for a v? If yes then what would be the expected result?

    Based on the sample data & the required output, what you need is...

    WITH testData(Start, Stop , v, ID )

    AS

    (

    SELECT CAST('01/12/08' as DATETIME), CAST('06/12/08' AS DATETIME), CAST(1 AS INT), 1

    UNION

    SELECT CAST('07/12/08' as DATETIME), CAST('13/12/08' AS DATETIME), CAST(1 AS INT), 2

    UNION

    SELECT CAST('14/12/08' as DATETIME), CAST('20/12/08' AS DATETIME), CAST(2 AS INT), 3

    UNION

    SELECT CAST('21/12/08' as DATETIME), CAST('27/12/08' AS DATETIME), CAST(3 AS INT), 4

    UNION

    SELECT CAST('28/12/08' as DATETIME), CAST('03/01/09' AS DATETIME), CAST(3 AS INT), 5

    )

    SELECT v, MIN(Start) AS Start, MAX(Stop) AS Stop FROM testData GROUP BY v

    --Ramesh


  • for each value, ID is not required in output data

  • Sorry was not clear what I'm looking for. I want to group on the values, and create a new grouping for each date range gap. Values should only group where the date ranges overlap, i.e. there is not gap between the date ranges and value's changing.

    Also there are two values to group on, and either could change. There can also be gaps in date ranges.

    So a better example of the data would be:

    WITH testData(Start, Stop , v1, v2, ID )

    AS

    (

    SELECT CAST('01/01/09' as DATETIME), CAST('31/01/09' AS DATETIME), 1399, 2499, 729275 UNION

    SELECT CAST('18/02/09' as DATETIME), CAST('24/02/09' AS DATETIME), 1449, 2559, 729278 UNION

    SELECT CAST('25/02/09' as DATETIME), CAST('18/03/09' AS DATETIME), 1449, 2559, 729279 UNION

    SELECT CAST('19/03/09' as DATETIME), CAST('31/03/09' AS DATETIME), 1449, 2559, 729280 UNION

    SELECT CAST('15/04/09' as DATETIME), CAST('30/04/09' AS DATETIME), 1349, 2349, 729283 UNION

    SELECT CAST('01/05/09' as DATETIME), CAST('13/05/09' AS DATETIME), 1299, 1799, 729284 UNION

    SELECT CAST('14/05/09' as DATETIME), CAST('26/05/09' AS DATETIME), 1299, 1799, 729285 UNION

    SELECT CAST('27/05/09' as DATETIME), CAST('30/06/09' AS DATETIME), 1299, 1799, 729286 UNION

    SELECT CAST('01/07/09' as DATETIME), CAST('21/07/09' AS DATETIME), 1349, 1899, 729287 UNION

    SELECT CAST('22/07/09' as DATETIME), CAST('28/07/09' AS DATETIME), 1349, 1999, 729288 UNION

    SELECT CAST('30/08/09' as DATETIME), CAST('08/09/09' AS DATETIME), 1299, 1799, 729292 UNION

    SELECT CAST('09/09/09' as DATETIME), CAST('13/10/09' AS DATETIME), 1199, 1799, 729293

    )

    SELECT * from testData

    start stop v1 v2

    ----------------------- ----------------------- ----------- -----------

    2009-01-01 00:00:00.000 2009-01-31 00:00:00.000 1399 2499

    2009-02-18 00:00:00.000 2009-02-24 00:00:00.000 1449 2559

    2009-02-25 00:00:00.000 2009-03-18 00:00:00.000 1449 2559

    2009-03-19 00:00:00.000 2009-03-31 00:00:00.000 1449 2559

    2009-04-15 00:00:00.000 2009-04-30 00:00:00.000 1349 2349

    2009-05-01 00:00:00.000 2009-05-13 00:00:00.000 1299 1799

    2009-05-14 00:00:00.000 2009-05-26 00:00:00.000 1299 1799

    2009-05-27 00:00:00.000 2009-06-30 00:00:00.000 1299 1799

    2009-07-01 00:00:00.000 2009-07-21 00:00:00.000 1349 1899

    2009-07-22 00:00:00.000 2009-07-28 00:00:00.000 1349 1999

    2009-08-30 00:00:00.000 2009-09-08 00:00:00.000 1299 1799

    2009-09-09 00:00:00.000 2009-10-13 00:00:00.000 1199 1799

    Result required should be:

    start stop v1 v2

    ----------------------- ----------------------- ----------- -----------

    2009-01-01 00:00:00.000 2009-01-31 00:00:00.000 1399 2499

    2009-02-18 00:00:00.000 2009-03-31 00:00:00.000 1449 2559

    2009-04-15 00:00:00.000 2009-04-30 00:00:00.000 1349 2349

    2009-05-01 00:00:00.000 2009-06-30 00:00:00.000 1299 1799

    2009-07-01 00:00:00.000 2009-07-21 00:00:00.000 1349 1899

    2009-07-22 00:00:00.000 2009-07-28 00:00:00.000 1349 1999

    2009-08-30 00:00:00.000 2009-09-08 00:00:00.000 1299 1799

    2009-09-09 00:00:00.000 2009-10-13 00:00:00.000 1199 1799

  • Using the "running totals" method to take care of gaps...

    [font="Courier New"]SET DATEFORMAT DMY

    GO

    WITH testData(Start, [Stop] , v1, v2, ID )

    AS

    (

    SELECT CAST('01/01/09' AS DATETIME), CAST('31/01/09' AS DATETIME), 1399, 2499, 729275 UNION

    SELECT CAST('18/02/09' AS DATETIME), CAST('24/02/09' AS DATETIME), 1449, 2559, 729278 UNION

    SELECT CAST('25/02/09' AS DATETIME), CAST('18/03/09' AS DATETIME), 1449, 2559, 729279 UNION

    SELECT CAST('19/03/09' AS DATETIME), CAST('28/03/09' AS DATETIME), 1449, 2559, 729280 UNION -- CHANGED ROW

    SELECT CAST('01/04/09' AS DATETIME), CAST('30/04/09' AS DATETIME), 1449, 2559, 729281 UNION -- NEW ROW WITH DATE GAP

    SELECT CAST('15/04/09' AS DATETIME), CAST('30/04/09' AS DATETIME), 1349, 2349, 729283 UNION

    SELECT CAST('01/05/09' AS DATETIME), CAST('13/05/09' AS DATETIME), 1299, 1799, 729284 UNION

    SELECT CAST('14/05/09' AS DATETIME), CAST('26/05/09' AS DATETIME), 1299, 1799, 729285 UNION

    SELECT CAST('27/05/09' AS DATETIME), CAST('30/06/09' AS DATETIME), 1299, 1799, 729286 UNION

    SELECT CAST('01/07/09' AS DATETIME), CAST('21/07/09' AS DATETIME), 1349, 1899, 729287 UNION

    SELECT CAST('22/07/09' AS DATETIME), CAST('28/07/09' AS DATETIME), 1349, 1999, 729288 UNION

    SELECT CAST('30/08/09' AS DATETIME), CAST('08/09/09' AS DATETIME), 1299, 1799, 729292 UNION

    SELECT CAST('09/09/09' AS DATETIME), CAST('13/10/09' AS DATETIME), 1199, 1799, 729293

    )

    SELECT *, CAST(NULL AS INT) AS [Group]

    INTO #Temp

    FROM testData

    ORDER BY ID

    ALTER TABLE #Temp ADD CONSTRAINT [ID] PRIMARY KEY CLUSTERED (ID)

    DECLARE @Group INT, @Stop DATETIME, @v1 INT, @v2 INT

    SET @Group = 0

    UPDATE #Temp SET

       @Group = [Group] = CASE WHEN v1 = @v1 AND v2 = @v2 AND Start = @Stop+1 THEN @Group ELSE @Group+1 END,

       @Stop = [Stop], @v1 = v1, @v2 = v2

    SELECT MIN(Start) AS Start, MAX([Stop]) AS [Stop], v1, v2

    FROM #Temp

    GROUP BY v1, v2, [Group]

    ORDER BY [Group]

    DROP TABLE #Temp[/font]

    Output:

    Start Stop v1 v2

    2009-01-01 00:00:00.0002009-01-31 00:00:00.00013992499

    2009-02-18 00:00:00.0002009-03-28 00:00:00.00014492559

    2009-04-01 00:00:00.0002009-04-30 00:00:00.00014492559

    2009-04-15 00:00:00.0002009-04-30 00:00:00.00013492349

    2009-05-01 00:00:00.0002009-06-30 00:00:00.00012991799

    2009-07-01 00:00:00.0002009-07-21 00:00:00.00013491899

    2009-07-22 00:00:00.0002009-07-28 00:00:00.00013491999

    2009-08-30 00:00:00.0002009-09-08 00:00:00.00012991799

    2009-09-09 00:00:00.0002009-10-13 00:00:00.00011991799

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Excellent article, thanks Peter.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thanks very much, helped a lot!

  • Thank you!

    We were thinking the same idea, but you were faster with the response..


    N 56°04'39.16"
    E 12°55'05.25"

  • Still you need to investigate why there is no range joining for ranges that deffer only 1 day (end to next start).

    I've messed around this procedural code, and all matching ranges are collapsed.

    drop table #TheData

    go

    drop table #TheRanges

    go

    print convert(char(26), getdate(),121)

    go

    /* determine ranges */

    ;WITH testData(Start, [Stop] , v1, v2 , ID )

    AS

    (

    SELECT convert(DATETIME, '01/01/09' , 3 ), convert(DATETIME, '31/01/09' , 3 ), 1399, 2499, 729275

    UNION

    SELECT convert(DATETIME, '18/02/09' , 3 ), convert(DATETIME, '24/02/09' , 3 ), 1449, 2559, 729278

    UNION

    SELECT convert(DATETIME, '25/02/09' , 3 ), convert(DATETIME, '18/03/09' , 3 ), 1449, 2559, 729279

    UNION

    SELECT convert(DATETIME, '19/03/09' , 3 ), convert(DATETIME, '31/03/09' , 3 ), 1449, 2559, 729280

    UNION

    SELECT convert(DATETIME, '15/04/09' , 3 ), convert(DATETIME, '30/04/09' , 3 ), 1349, 2349, 729283

    UNION

    SELECT convert(DATETIME, '01/05/09' , 3 ), convert(DATETIME, '13/05/09' , 3 ), 1299, 1799, 729284

    UNION

    SELECT convert(DATETIME, '14/05/09' , 3 ), convert(DATETIME, '26/05/09' , 3 ), 1299, 1799, 729285

    UNION

    SELECT convert(DATETIME, '27/05/09' , 3 ), convert(DATETIME, '30/06/09' , 3 ), 1299, 1799, 729286

    UNION

    SELECT convert(DATETIME, '01/07/09' , 3 ), convert(DATETIME, '21/07/09' , 3 ), 1349, 1899, 729287

    UNION

    SELECT convert(DATETIME, '22/07/09' , 3 ), convert(DATETIME, '28/07/09' , 3 ), 1349, 1999, 729288

    UNION

    SELECT convert(DATETIME, '30/08/09' , 3 ), convert(DATETIME, '08/09/09' , 3 ), 1299, 1799, 729292

    UNION

    SELECT convert(DATETIME, '09/09/09' , 3 ), convert(DATETIME, '13/10/09' , 3 ), 1199, 1799, 729293

    )

    Select *

    into #TheData

    from testData

    ;

    With TheRanges (Start, [Stop] , ID, ID2 , Source )

    AS

    (Select T1.Start

    , coalesce(T2.[Stop],T1.[Stop]) as [Stop]

    , T1.ID

    , T2.ID as ID2

    , Case when T2.ID is null then 'Original' else 'Modified' end as Source

    from #TheData T1

    left join #TheData T2

    on T1.Start < T2.Start

    and datediff(d, T1.[Stop], T2.Start) < 2

    )

    Select *

    into #TheRanges

    from TheRanges

    ;

    Declare @RowsUpdated bigint

    Declare @PrevRowsUpdated bigint

    Select @RowsUpdated = -1

    , @PrevRowsUpdated = 0

    While @RowsUpdated <> 0

    begin

    Update R1

    Set [Stop] = case when R2.[Stop] < R1.[Stop] then R1.[Stop] else R2.[Stop] end

    , ID2 = case when R2.[Stop] < R1.[Stop] then R1.ID2 else R2.ID2 end

    -- output deleted.*, inserted.*, R2.*

    from #TheRanges R1

    inner join #TheRanges R2

    on R1.Start < R2.Start

    and ( datediff(d, R1.[Stop], R2.[Start]) < 2

    or R2.start < R1.[Stop] )

    and R1.ID <> R2.ID

    and R1.ID2 <> R2.ID2

    Select @RowsUpdated = @@rowcount

    if @PrevRowsUpdated = @RowsUpdated

    break

    else set @PrevRowsUpdated = @RowsUpdated

    end

    delete R1

    from #TheRanges R1

    where exists ( Select *

    from #TheRanges R2

    where R1.Start >= R2.Start

    and R1.[Stop] <= R2.[Stop]

    and R1.ID <> R2.ID)

    Select *

    from #TheRanges

    order by start, [stop]

    Select R1.Start

    , R1.[Stop]

    , sum(T.v1) as Sum_v1

    , sum(T.v2) as Sum_v2

    from #TheData T

    inner join #TheRanges R1

    on T.Start between R1.Start and R1.[Stop]

    and T.stop between R1.Start and R1.[Stop]

    group by R1.Start

    , R1.[Stop]

    order by R1.Start

    go

    print convert(char(26), getdate(),121)

    go

    Resulting in

    Start Stop Sum_v1 Sum_v2

    ----------------------- ----------------------- ----------- -----------

    2009-01-01 00:00:00.000 2009-01-31 00:00:00.000 1399 2499

    2009-02-18 00:00:00.000 2009-03-31 00:00:00.000 4347 7677

    2009-04-15 00:00:00.000 2009-07-28 00:00:00.000 7944 11644

    2009-08-30 00:00:00.000 2009-10-13 00:00:00.000 2498 3598

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply