count timespans without overlaps

  • I have this test table:

    row 1: start_date='2016-1-1 10:00:00' end_date='2016-1-1 15:00:00'

    row 2: start_date='2016-1-1 13:00:00' end_date='2016-1-1 20:00:00'

    In need to know the SUM of DATEDIFF of all rows, but WITHOUT the overlaps.

    In this case the result should be 10 hours, and not 12 if you should use a SUM+DATEDIFF.

    Anyone know this is possible in SQL?

  • This is based on the packing interval query by Itzik Ben-Gan, I don't have the URL to hand though

    DECLARE @TestTable TABLE(start_date DATETIME, end_date DATETIME);

    INSERT INTO @TestTable(start_date, end_date)

    VALUES('2016-1-1 10:00:00','2016-1-1 15:00:00'),

    ('2016-1-1 13:00:00','2016-1-1 20:00:00');

    --('2016-1-2 13:00:00','2016-1-2 14:00:00'),

    --('2016-1-2 14:00:00','2016-1-2 15:00:00'),

    --('2016-1-2 14:00:00','2016-1-2 20:00:00');

    WITH CTE1 AS (

    SELECT ts=start_date, Type = 1

    ,e = NULL

    ,s = (2*ROW_NUMBER() OVER (ORDER BY start_date))-1

    FROM @TestTable

    UNION ALL

    SELECT ts=end_date, Type = -1

    ,e = (2*ROW_NUMBER() OVER (ORDER BY end_date))

    ,s = NULL

    FROM @TestTable

    ),

    CTE2(minval,maxval) AS (

    SELECT MIN(ts),MAX(ts)

    FROM (

    SELECT ts,

    diff = (ROW_NUMBER() OVER(ORDER BY ts)+1)/2

    FROM (

    SELECT ts, s, e,

    se = ROW_NUMBER() OVER (ORDER BY ts, Type DESC)

    FROM CTE1

    ) c1

    WHERE se IN (e,s)

    ) C3

    GROUP BY diff)

    SELECT totalhours = SUM(DATEDIFF(Hour,minval,maxval))

    FROM CTE2;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • This works just fine, thank you!

  • Here is an updated version that only requires a single scan of the table. There is a complete description of the logic at New Solution to the Packing Intervals Problem.

    WITH C1 AS (

    SELECT start_date, end_date, MAX(end_date) OVER(ORDER BY start_date, end_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_end_date

    FROM @TestTable

    )

    , C2 AS (

    SELECT start_date, end_date, SUM(is_start) OVER(ORDER BY start_date, end_date ROWS UNBOUNDED PRECEDING) AS grp

    FROM C1

    CROSS APPLY ( VALUES( CASE WHEN C1.start_date <= C1.prev_end_date THEN NULL ELSE 1 END )) AS a(is_start)

    )

    SELECT DATEDIFF(HOUR, MIN(C2.start_date), MAX(C2.end_date))

    FROM C2

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (10/7/2016)


    Here is an updated version that only requires a single scan of the table. There is a complete description of the logic at New Solution to the Packing Intervals Problem.

    WITH C1 AS (

    SELECT start_date, end_date, MAX(end_date) OVER(ORDER BY start_date, end_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_end_date

    FROM @TestTable

    )

    , C2 AS (

    SELECT start_date, end_date, SUM(is_start) OVER(ORDER BY start_date, end_date ROWS UNBOUNDED PRECEDING) AS grp

    FROM C1

    CROSS APPLY ( VALUES( CASE WHEN C1.start_date <= C1.prev_end_date THEN NULL ELSE 1 END )) AS a(is_start)

    )

    SELECT DATEDIFF(HOUR, MIN(C2.start_date), MAX(C2.end_date))

    FROM C2

    Drew

    Based on the OP requirements that solution doesn't work. Test it out by including all of the values that were commented out. You'll notice Mark's solution correctly returns 17 hours. while yours returns 34.


    SELECT quote FROM brain WHERE original = 1
    0 rows returned

  • Y.B. (10/7/2016)


    drew.allen (10/7/2016)


    Here is an updated version that only requires a single scan of the table. There is a complete description of the logic at New Solution to the Packing Intervals Problem.

    WITH C1 AS (

    SELECT start_date, end_date, MAX(end_date) OVER(ORDER BY start_date, end_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_end_date

    FROM @TestTable

    )

    , C2 AS (

    SELECT start_date, end_date, SUM(is_start) OVER(ORDER BY start_date, end_date ROWS UNBOUNDED PRECEDING) AS grp

    FROM C1

    CROSS APPLY ( VALUES( CASE WHEN C1.start_date <= C1.prev_end_date THEN NULL ELSE 1 END )) AS a(is_start)

    )

    SELECT DATEDIFF(HOUR, MIN(C2.start_date), MAX(C2.end_date))

    FROM C2

    Drew

    Based on the OP requirements that solution doesn't work. Test it out by including all of the values that were commented out. You'll notice Mark's solution correctly returns 17 hours. while yours returns 34.

    Yeah, I would have caught that if there had been a larger sample size. Here's the fix.

    WITH C1 AS (

    SELECT start_date, end_date, MAX(end_date) OVER(ORDER BY start_date, end_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_end_date

    FROM @TestTable

    )

    , C2 AS (

    SELECT start_date, end_date, SUM(is_start) OVER(ORDER BY start_date, end_date ROWS UNBOUNDED PRECEDING) AS grp

    FROM C1

    CROSS APPLY ( VALUES( CASE WHEN C1.start_date <= C1.prev_end_date THEN NULL ELSE 1 END )) AS a(is_start)

    )

    , C3 AS (

    SELECT DATEDIFF(HOUR, MIN(C2.start_date), MAX(C2.end_date)) AS grp_total

    FROM C2

    GROUP BY grp

    )

    SELECT SUM(grp_total)

    FROM C3

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • The overlap problem can be solved in a much simpler manner, simply check if the current end is greater than the next start and if so use the next start otherwise the current end.

    😎

    USE TEEST;

    GO

    SET NOCOUNT ON;

    DECLARE @TestTable TABLE(start_date DATETIME NOT NULL PRIMARY KEY CLUSTERED, end_date DATETIME);

    INSERT INTO @TestTable(start_date, end_date)

    VALUES('2016-1-1 10:00:00','2016-1-1 15:00:00'),

    ('2016-1-1 13:00:00','2016-1-1 20:00:00'),

    ('2016-1-2 13:00:00','2016-1-2 14:00:00'),

    ('2016-1-2 14:00:00','2016-1-2 15:00:00'),

    ('2016-1-2 15:00:00','2016-1-2 20:00:00');

    ;WITH BASE_DATA AS

    (

    SELECT

    TT.start_date

    ,TT.end_date

    ,LEAD(TT.start_date,1,TT.end_date) OVER

    (

    ORDER BY TT.start_date ASC

    ) AS NEXT_START

    FROM @TestTable TT

    )

    SELECT

    SUM(DATEDIFF(HOUR,BD.start_date

    ,CASE

    WHEN BD.end_date > BD.NEXT_START THEN BD.NEXT_START

    ELSE BD.end_date

    END)) AS SUM_HOUR

    FROM BASE_DATA BD;

    Pretty simple execution plan

  • This is not correct. If I insert the line

    ('2016-1-2 14:10:00','2016-1-2 14:20:00'),

    into your sample rows the result changes, and this should not be the case as this timespan is already in the other rows...

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply