Finding gaps within date ranges

Question

Finding gaps within date ranges

kyagi.jo

SSCrazy

Points: 2559
More actions
September 13, 2013 at 9:00 am

#279706

Hi, I have a group of date ranges and wanted to identify all of the date gaps within the ranges, outputting the dates as another date range dataset.
Example dataset SQL below:
CREATE TABLE #test (daterow int identity, obj_id int, datestart DATETIME, dateend DATETIME)
INSERT INTO #test
SELECT 1, '20130428', '20130523'
UNION
SELECT 1, '20130526', '20130823'
UNION
SELECT 1, '20130728', '20130728'
UNION
SELECT 1, '20130826', '20130830'
UNION
SELECT 2, '20130501', '20130515'
UNION
SELECT 2, '20130525', '20130830'
select * from #test
I would expect a dataset to be returned consisting of:
1, 24/05/2013, 25/05/2013
1, 24/08/2013, 25/08/2013
2, 16/05/2013, 24/05/2013
Does anyone have a good way of doing this? I have found a lot of examples of problems where I have just a single date column, and then I find the gaps in between that, but I'm having difficulty finding examples where it works with start and end date columns...
Thanks!

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

busraker Ten Centuries Points: 1116 More actions · Answer 1

I'm not sure where your expected data for "2" is coming from ... but you should be able to expand this if necessary to groups

This is quick and dirty but it shows the logic. A recursive CTE was not used to create the date library due to the possibility of large date spans

/* Create a Table of all Possible Dates fro mthe First Start to the last end */

DECLARE @dateToAdd DATETIME

DECLARE @dateLast DATETIME

DECLARE @tDates TABLE (dateDate DATETIME)

SELECT @dateToAdd = MIN(datestart), @dateLast = MAX(dateEnd) FROM #test

WHILE @dateToAdd <= @dateLast

BEGIN

INSERT INTO @tDates SELECT @dateToAdd

SET @dateToadd = @dateToadd + 1

END

/* Find All Dates Not in a range */

WITH CTEMissingDates AS

(SELECT

dateDate

FROM

@tDates t1

WHERE

NOT EXISTS (SELECT daterow FROM #test WHERE t1.dateDate >= datestart AND t1.dateDate <= dateend)

),

/* The start date of the new range will be any without a date before it */

CTEMissingRange1 AS

(

SELECT

d1.dateDate

FROM

CTEMissingDates d1

LEFT JOIN CTEMissingDates d2 ON DATEADD(dd,-1,d1.dateDate) = d2.dateDate

WHERE

d2.dateDate IS NULL)

/* considering all of the mussing dates after the range start, the end of the range is the day before the first next found date */

SELECT

r1.dateDate dtStart,

DATEADD(d,-1,MIN(d.dateDate)) dtEnd

FROM

CTEMissingRange1 r1

INNER JOIN @tDates d ON r1.dateDate < d.dateDate

WHERE

d.dateDate NOT IN (SELECT dateDate FROM CTEMissingDates)

GROUP BY

r1.dateDate

Hope this helps,

Russ Baker

kyagi.jo SSCrazy Points: 2559 More actions · Answer 2

Sorry, the 2 was to represent that I could potentially have multiple items, and each one would have their own gaps represented, but the updated dataset never left my management studio.

I'll take a look at the example SQL, thanks.

Ron McCullough SSC Guru Points: 63877 More actions · Answer 3

I suggest that you read this article by Jeff Moden

http://www.sqlservercentral.com/articles/T-SQL/71550/

For an excellent article including sample data and the code that will do what you seem to asking how to do.

If everything seems to be going well, you have obviously overlooked something.

Ron

Please help us, help you -before posting a question please read[/url]
Before posting a performance problem please read[/url]

kyagi.jo SSCrazy Points: 2559 More actions · Answer 4

Hi,

I read that article before, but from my understanding, that's more like the inverse of what I was doing, as I want to find the gaps in between the date ranges, while the article was referring to highlighting the "islands" of dates. Is there an easy way to convert one problem into the other?

busraker Ten Centuries Points: 1116 More actions · Answer 5

Start with my code for finding "CTEMissingDates" ... but add a group ID for your groups (1,2, etc)

At that point you can group them into ranges using the technique from that article.

Hunchback SSCrazy Points: 2649 More actions · Answer 6

The approach I would use is first to pack overlapping intervals and then find the gaps.

You can read about both methods in the last book from Itzik Ben-Gan about "High Performance T-SQL using Window Functions".

http://shop.oreilly.com/product/0790145323088.do

SET NOCOUNT ON;

USE tempdb;

GO

CREATE TABLE #test (

daterow int IDENTITY,

obj_id int,

datestart datetime,

dateend datetime

);

INSERT INTO #test

SELECT

1,

'20130428',

'20130523'

UNION

SELECT

1,

'20130526',

'20130823'

UNION

SELECT

1,

'20130728',

'20130728'

UNION

SELECT

1,

'20130826',

'20130830'

UNION

SELECT

2,

'20130501',

'20130515'

UNION

SELECT

2,

'20130525',

'20130830';

-- Itzik's solution for packing intervals previous to SS2012

WITH C1 AS (

SELECT

daterow,

obj_id,

datestart AS dt,

1 AS [type],

NULL AS e,

ROW_NUMBER() OVER(PARTITION BY obj_id ORDER BY datestart, daterow) AS s

FROM

#test

UNION ALL

SELECT

daterow,

obj_id,

dateend AS dt,

-1 AS [type],

ROW_NUMBER() OVER(PARTITION BY obj_id ORDER BY dateend, daterow) AS e,

NULL AS s

FROM

#test

)

, C2 AS (

SELECT

*,

ROW_NUMBER() OVER(PARTITION BY obj_id ORDER BY dt, [type] DESC, daterow) AS se

FROM

C1

)

, C3 AS (

SELECT

*,

((ROW_NUMBER() OVER(PARTITION BY obj_id ORDER BY dt) - 1) / 2) + 1 AS grpnum

FROM

C2

WHERE

COALESCE(s - (se - s) - 1, (se - e) - e) = 0

)

SELECT

obj_id,

MIN(dt) AS datestart,

MAX(dt) AS dateend

INTO

#T

FROM

C3

GROUP BY

obj_id,

grpnum;

-- Finding gaps

SELECT

A.obj_id,

DATEADD([day], 1, A.dateend) AS dtstart,

DATEADD([day], -1, B.datestart) AS dtend

FROM

#T AS A

CROSS APPLY

(

SELECT TOP (1)

T.datestart

FROM

#T AS T

WHERE

T.obj_id = A.obj_id AND T.datestart > A.dateend

ORDER BY

T.datestart

) AS B

WHERE

DATEDIFF([day], A.dateend, B.datestart) > 1

ORDER BY

A.obj_id,

dtstart;

GO

DROP TABLE #test, #T;

GO

Result:

/*

obj_iddtstartdtend

12013-05-24 00:00:00.0002013-05-25 00:00:00.000

12013-08-24 00:00:00.0002013-08-25 00:00:00.000

22013-05-16 00:00:00.0002013-05-24 00:00:00.000

*/

Hunchback SSCrazy Points: 2649 More actions · Answer 7

Sorry for the formatting but I have no idea how to post T-SQL code.

Dwain Camps SSC Guru Points: 86908 More actions · Answer 8

Like hunchback, I'll start with Mr. Ben-Gan's interval packing approach but I believe the last part diverges from what he proposes:

WITH C1 AS (

SELECT obj_id, ts, Type

,e=CASE Type WHEN 1 THEN NULL ELSE ROW_NUMBER() OVER (PARTITION BY obj_id, Type ORDER BY dateend) END

,s=CASE Type WHEN -1 THEN NULL ELSE ROW_NUMBER() OVER (PARTITION BY obj_id, Type ORDER BY datestart) END

FROM #test

CROSS APPLY (

VALUES (1, datestart), (-1, dateend)) a(Type, ts)

),

C2 AS (

SELECT C1.*

,se=ROW_NUMBER() OVER (PARTITION BY obj_id ORDER BY ts, Type DESC)

FROM C1),

C3 AS (

SELECT obj_id, ts

,grpnm=FLOOR((ROW_NUMBER() OVER (PARTITION BY obj_id ORDER BY ts)-1) / 2 + 1)

FROM C2

WHERE COALESCE(s-(se-s)-1, (se-e)-e) = 0),

-- C1, C2, C3, C4 combined remove the overlapping date periods

C4 AS (

SELECT obj_id, datestart=MIN(ts), dateend=MAX(ts)

FROM C3

GROUP BY obj_id, grpnm)

SELECT obj_id, datestart=MIN(newdate), dateend=MAX(newdate)

FROM (

SELECT obj_id, newdate

,rn=ROW_NUMBER() OVER (PARTITION BY obj_id ORDER BY newdate) / 2

FROM C4 a

CROSS APPLY (

VALUES (datestart-1),(dateend+1)) b(newdate)

) a

GROUP BY obj_id, rn

HAVING COUNT(*) = 2

ORDER BY obj_id, datestart;

1. The first process (combination of CTEs C1-C4) is a way of removing the overlapping date intervals, which I first saw here: Interval Packing by Itzik Ben-Gan. You end up with "islands" of date ranges where the overlaps have been removed.

2. The final step is what I call the CROSS APPLY VALUES approach to Gaps from Islands, and that is described here: The SQL of Gaps and Islands in Sequences[/url]. While this article applies the approach to sequence numbers, it is equally applicable to DATETIME data. This differs a bit from what hunchback proposes (eliminates the temp table) but also works based on one pass through the C4 table.

Should be pretty fast performing too.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86908 More actions · Answer 9

bitbucket-25253 (9/13/2013)
I suggest that you read this article by Jeff Moden
http://www.sqlservercentral.com/articles/T-SQL/71550/
For an excellent article including sample data and the code that will do what you seem to asking how to do.

Ron - That was my initial thought also, however Jeff's technique (also described in SQL MVP Deep Dives) doesn't work when there are date duplicates and is predicated on there being a single date column. The overlapping aspect of the intervals nullifies this approach, so it could not be used to generate the islands from which I calculated the gaps.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86908 More actions · Answer 10

hunchback (9/14/2013)
Sorry for the formatting but I have no idea how to post T-SQL code.

When you reply to the post, look on the left at the IFCode Shortcuts. Use the one that says code="sql"

My edits above were to acknowledge the fact the we both used Mr. Ben-Gan's approach to packing intervals, which I didn't notice initially in the code you posted.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Hunchback SSCrazy Points: 2649 More actions · Answer 11

Hunchback

SSCrazy

Points: 2649

September 16, 2013 at 5:33 am

#1650375

Thanks, Dwain!

--

AMB

pero108 Valued Member Points: 68 More actions · Answer 12

To: dwain.c

For this data:

INSERT INTO #test

SELECT

1,

'20130428',

'20130523'

UNION

SELECT

1,

'20130526',

'20130823'

UNION

SELECT

1,

'20130728',

'20130728'

UNION

SELECT

1,

'20130826',

'20130830'

UNION

SELECT

2,

'20130501',

'20130515'

UNION

SELECT

2,

'20130516',

'20130520'

UNION

SELECT

2,

'20130525',

'20130830';

i got wrong result:

12013-05-24 00:00:00.0002013-05-25 00:00:00.000

12013-08-24 00:00:00.0002013-08-25 00:00:00.000

22013-05-15 00:00:00.0002013-05-16 00:00:00.000

22013-05-21 00:00:00.0002013-05-24 00:00:00.000

there should not be gap between 2013-05-15 and 2013-05-16