count timespans without overlaps

Question

Post reply

count timespans without overlaps

marc.corbeel

Default port

Points: 1484
More actions
October 7, 2016 at 8:45 am

#330338

I have this test table:
row 1: start_date='2016-1-1 10:00:00' end_date='2016-1-1 15:00:00'
row 2: start_date='2016-1-1 13:00:00' end_date='2016-1-1 20:00:00'
In need to know the SUM of DATEDIFF of all rows, but WITHOUT the overlaps.
In this case the result should be 10 hours, and not 12 if you should use a SUM+DATEDIFF.
Anyone know this is possible in SQL?

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Mark Cowne One Orange Chip Points: 26944 More actions · Answer 1

This is based on the packing interval query by Itzik Ben-Gan, I don't have the URL to hand though

DECLARE @TestTable TABLE(start_date DATETIME, end_date DATETIME);

INSERT INTO @TestTable(start_date, end_date)

VALUES('2016-1-1 10:00:00','2016-1-1 15:00:00'),

('2016-1-1 13:00:00','2016-1-1 20:00:00');

--('2016-1-2 13:00:00','2016-1-2 14:00:00'),

--('2016-1-2 14:00:00','2016-1-2 15:00:00'),

--('2016-1-2 14:00:00','2016-1-2 20:00:00');

WITH CTE1 AS (

SELECT ts=start_date, Type = 1

,e = NULL

,s = (2*ROW_NUMBER() OVER (ORDER BY start_date))-1

FROM @TestTable

UNION ALL

SELECT ts=end_date, Type = -1

,e = (2*ROW_NUMBER() OVER (ORDER BY end_date))

,s = NULL

FROM @TestTable

),

CTE2(minval,maxval) AS (

SELECT MIN(ts),MAX(ts)

FROM (

SELECT ts,

diff = (ROW_NUMBER() OVER(ORDER BY ts)+1)/2

FROM (

SELECT ts, s, e,

se = ROW_NUMBER() OVER (ORDER BY ts, Type DESC)

FROM CTE1

) c1

WHERE se IN (e,s)

) C3

GROUP BY diff)

SELECT totalhours = SUM(DATEDIFF(Hour,minval,maxval))

FROM CTE2;

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

marc.corbeel Default port Points: 1484 More actions · Answer 2

marc.corbeel

Default port

Points: 1484

October 7, 2016 at 9:29 am

#1905413

This works just fine, thank you!

drew.allen SSC Guru Points: 76945 More actions · Answer 3

Here is an updated version that only requires a single scan of the table. There is a complete description of the logic at New Solution to the Packing Intervals Problem.

WITH C1 AS (

SELECT start_date, end_date, MAX(end_date) OVER(ORDER BY start_date, end_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_end_date

FROM @TestTable

)

, C2 AS (

SELECT start_date, end_date, SUM(is_start) OVER(ORDER BY start_date, end_date ROWS UNBOUNDED PRECEDING) AS grp

FROM C1

CROSS APPLY ( VALUES( CASE WHEN C1.start_date <= C1.prev_end_date THEN NULL ELSE 1 END )) AS a(is_start)

)

SELECT DATEDIFF(HOUR, MIN(C2.start_date), MAX(C2.end_date))

FROM C2

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Y.B. SSChampion Points: 11642 More actions · Answer 4

drew.allen (10/7/2016)
Here is an updated version that only requires a single scan of the table. There is a complete description of the logic at New Solution to the Packing Intervals Problem.
WITH C1 AS (
SELECT start_date, end_date, MAX(end_date) OVER(ORDER BY start_date, end_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_end_date
FROM @TestTable
)
, C2 AS (
SELECT start_date, end_date, SUM(is_start) OVER(ORDER BY start_date, end_date ROWS UNBOUNDED PRECEDING) AS grp
FROM C1
CROSS APPLY ( VALUES( CASE WHEN C1.start_date <= C1.prev_end_date THEN NULL ELSE 1 END )) AS a(is_start)
)
SELECT DATEDIFF(HOUR, MIN(C2.start_date), MAX(C2.end_date))
FROM C2
Drew

Based on the OP requirements that solution doesn't work. Test it out by including all of the values that were commented out. You'll notice Mark's solution correctly returns 17 hours. while yours returns 34.

SELECT quote FROM brain WHERE original = 1
0 rows returned

drew.allen SSC Guru Points: 76945 More actions · Answer 5

Y.B. (10/7/2016)
drew.allen (10/7/2016)
Here is an updated version that only requires a single scan of the table. There is a complete description of the logic at New Solution to the Packing Intervals Problem.
WITH C1 AS (
SELECT start_date, end_date, MAX(end_date) OVER(ORDER BY start_date, end_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_end_date
FROM @TestTable
)
, C2 AS (
SELECT start_date, end_date, SUM(is_start) OVER(ORDER BY start_date, end_date ROWS UNBOUNDED PRECEDING) AS grp
FROM C1
CROSS APPLY ( VALUES( CASE WHEN C1.start_date <= C1.prev_end_date THEN NULL ELSE 1 END )) AS a(is_start)
)
SELECT DATEDIFF(HOUR, MIN(C2.start_date), MAX(C2.end_date))
FROM C2
Drew
Based on the OP requirements that solution doesn't work. Test it out by including all of the values that were commented out. You'll notice Mark's solution correctly returns 17 hours. while yours returns 34.

Yeah, I would have caught that if there had been a larger sample size. Here's the fix.

WITH C1 AS (

SELECT start_date, end_date, MAX(end_date) OVER(ORDER BY start_date, end_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_end_date

FROM @TestTable

)

, C2 AS (

SELECT start_date, end_date, SUM(is_start) OVER(ORDER BY start_date, end_date ROWS UNBOUNDED PRECEDING) AS grp

FROM C1

CROSS APPLY ( VALUES( CASE WHEN C1.start_date <= C1.prev_end_date THEN NULL ELSE 1 END )) AS a(is_start)

)

, C3 AS (

SELECT DATEDIFF(HOUR, MIN(C2.start_date), MAX(C2.end_date)) AS grp_total

FROM C2

GROUP BY grp

)

SELECT SUM(grp_total)

FROM C3

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Eirikur Eiriksson SSC Guru Points: 182882 More actions · Answer 6

The overlap problem can be solved in a much simpler manner, simply check if the current end is greater than the next start and if so use the next start otherwise the current end.

😎

USE TEEST;

GO

SET NOCOUNT ON;

DECLARE @TestTable TABLE(start_date DATETIME NOT NULL PRIMARY KEY CLUSTERED, end_date DATETIME);

INSERT INTO @TestTable(start_date, end_date)

VALUES('2016-1-1 10:00:00','2016-1-1 15:00:00'),

('2016-1-1 13:00:00','2016-1-1 20:00:00'),

('2016-1-2 13:00:00','2016-1-2 14:00:00'),

('2016-1-2 14:00:00','2016-1-2 15:00:00'),

('2016-1-2 15:00:00','2016-1-2 20:00:00');

;WITH BASE_DATA AS

(

SELECT

TT.start_date

,TT.end_date

,LEAD(TT.start_date,1,TT.end_date) OVER

(

ORDER BY TT.start_date ASC

) AS NEXT_START

FROM @TestTable TT

)

SELECT

SUM(DATEDIFF(HOUR,BD.start_date

,CASE

WHEN BD.end_date > BD.NEXT_START THEN BD.NEXT_START

ELSE BD.end_date

END)) AS SUM_HOUR

FROM BASE_DATA BD;

Pretty simple execution plan

marc.corbeel Default port Points: 1484 More actions · Answer 7

This is not correct. If I insert the line

('2016-1-2 14:10:00','2016-1-2 14:20:00'),

into your sample rows the result changes, and this should not be the case as this timespan is already in the other rows...