RE: How to extract overlapping date ranges from a table of date ranges

drew.allen

SSC Guru

Points: 76944

February 11, 2016 at 11:39 am

#1858133

I set up some sample data (only one subscriber) and come up with the following:

WITH Subscriptions AS (

SELECT *

FROM (

VALUES

(1, CAST('2014-01-01' AS DATE), CAST('2014-03-01' AS DATE))

,(1, '2014-02-01', '2014-05-01')

,(1, '2014-04-01', '2014-06-01')

,(1, '2014-07-01', '2014-11-01')

,(1, '2014-08-01', '2014-10-01')

,(1, '2014-09-01', '2014-12-01')

,(1, '2015-02-01', '2015-04-01')

,(1, '2015-03-01', '2015-05-01')

,(1, '2015-06-01', '2015-07-01')

) v(subscriber_id, start_dt, end_dt)

)

, subscription_status AS (

SELECT *

, CASE WHEN SUM(v.is_subscribed) OVER(PARTITION BY s.subscriber_id ORDER BY v.dt, v.is_subscribed DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) > 0 THEN 1 ELSE 0 END AS subscribe_status

FROM Subscriptions s

CROSS APPLY (

VALUES

(s.start_dt, 1)

,(s.end_dt, -1)

) v(dt, is_subscribed)

)

, subscription_groups AS (

SELECT ss.subscriber_id, ss.start_dt, ss.end_dt, ss.subscribe_status

, ROW_NUMBER() OVER(PARTITION BY ss.subscriber_id ORDER BY ss.dt, ss.subscribe_status DESC) - ROW_NUMBER() OVER(PARTITION BY ss.subscriber_id, ss.subscribe_status ORDER BY dt) AS subscription_group

FROM subscription_status ss

)

SELECT sg.subscriber_id, MIN(sg.start_dt) AS start_dt, MAX(end_dt) AS end_dt

FROM subscription_groups sg

WHERE sg.subscribe_status = 1

GROUP BY sg.subscriber_id, sg.subscription_group

ORDER BY sg.subscriber_id, start_dt

I haven't done extensive testing on this, so I'm not sure how well it scales.

While the "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" isn't absolutely necessary, it will perform better than the default "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" as long as the number of rows processed is under a certain threshold.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA