Please help - Date Overlapping

Question

Please help - Date Overlapping

tofee498

Grasshopper

Points: 18
More actions
November 11, 2014 at 8:49 am

#292377

Hi All,
I need some help in finding the overlaps in the date range using sql. below is the sample data.
ID StartDate EndDate
id1 20110101 20130930
id1 20131001 20141231
id1 20140101
id2 20110101 20130930
id2 20131001
id3 20110101 20130930
id3 20131001 20131231
id3 20140101
id4 20110101 20130930
id4 20131001 20131231
id5 20131001 20141231
id5 20140101
The result would returns the following overlapping rows:
id1 20131001 20141231
id1 20140101
id5 20131001 20141231
id5 20140101
Thanking in advance.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Jack Corbett SSC Guru Points: 184393 More actions · Answer 1

Looks like this works, but be sure to test especially for performance. Please not how I provided the test data in an easily consumable format so others can just grab it and test:

DECLARE @dates TABLE

(

id CHAR(3),

StartDate DATE,

EndDate DATE

);

INSERT INTO @dates

(id, StartDate, EndDate)

VALUES

('id1', '20110101', '20130930'),

('id1', '20131001', '20141231'),

('id1', '20140101', NULL),

('id2', '20110101', '20130930'),

('id2', '20131001', NULL),

('id3', '20110101', '20130930'),

('id3', '20131001', '20131231'),

('id3', '20140101', NULL),

('id4', '20110101', '20130930'),

('id4', '20131001', '20131231'),

('id5', '20131001', '20141231'),

('id5', '20140101', NULL);

WITH dates

AS (

SELECT

*,

ROW_NUMBER() OVER (PARTITION BY id ORDER BY D.StartDate) AS rowNo

FROM

@dates AS D

)

SELECT

*

FROM

dates AS D

JOIN dates AS D2

ON D.id = D2.id AND

/* don't match self */

D.rowNo <> D2.rowNo AND

(

/* later one over laps earlier one */

D2.StartDate BETWEEN D.StartDate AND D.EndDate OR

/* earlier one overlaps later one */

(

D.StartDate BETWEEN D2.StartDate AND ISNULL(D2.EndDate, '20990101') OR

D.EndDate BETWEEN D2.StartDate AND ISNULL(D2.EndDate, '20990101')

)

);

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question

tofee498 Grasshopper Points: 18 More actions · Answer 2

I like to Thank You Mr. Corbett very much for taking the time to read and answer my question. Your help is most appreciated. Your query is too advance for a newbie like me to fully understanding it. I am going to read and study more to fully understand your query.

Thanks again,

Tom.

Eirikur Eiriksson SSC Guru Points: 182882 More actions · Answer 3

Quick solution for educational and entertainment purposes (read "spanner in the works") using slight modification of Jack Corbett's data preparation.

😎

USE tempdb;

GO

SET NOCOUNT ON;

IF OBJECT_ID(N'dbo.TBL_DATES') IS NOT NULL DROP TABLE dbo.TBL_DATES;

CREATE TABLE dbo.TBL_DATES

(

DATE_ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED

,id CHAR(3) NOT NULL

,StartDate DATE NOT NULL

,EndDate DATE

);

INSERT INTO dbo.TBL_DATES

(id, StartDate, EndDate)

VALUES

('id1', '20110101', '20130930'),

('id1', '20131001', '20141231'),

('id1', '20140101', NULL),

('id2', '20110101', '20130930'),

('id2', '20131001', NULL),

('id3', '20110101', '20130930'),

('id3', '20131001', '20131231'),

('id3', '20140101', NULL),

('id4', '20110101', '20130930'),

('id4', '20131001', '20131231'),

('id5', '20131001', '20141231'),

('id5', '20140101', NULL);

;WITH BASE_DATA AS

(

SELECT

ROW_NUMBER() OVER

(

PARTITION BY D.id

ORDER BY ISNULL(D.EndDate,D.StartDate)

) AS END_RID

,ROW_NUMBER() OVER

(

PARTITION BY D.id

ORDER BY D.StartDate

) AS START_RID

,D.id

,D.StartDate

,D.EndDate

FROM dbo.TBL_DATES D

)

,GROUPED_DATA AS

(

SELECT

BD.id

,BD.START_RID + BD.END_RID AS GR_ID

,ROW_NUMBER() OVER

(

PARTITION BY BD.id

ORDER BY BD.START_RID

) AS OL_RID

,BD.StartDate

,BD.EndDate

FROM BASE_DATA BD

WHERE BD.START_RID <> BD.END_RID

)

SELECT

GD.id

,GD.GR_ID

,GD.OL_RID

,GD.StartDate

,GD.EndDate

FROM GROUPED_DATA GD;

Results

id GR_ID OL_RID StartDate EndDate

---- ------ ------- ---------- ----------

id1 5 1 2013-10-01 2014-12-31

id1 5 2 2014-01-01 NULL

id5 3 1 2013-10-01 2014-12-31

id5 3 2 2014-01-01 NULL

Jack Corbett SSC Guru Points: 184393 More actions · Answer 4

Eirikur Eiriksson (11/12/2014)
Quick solution for educational and entertainment purposes (read "spanner in the works") using slight modification of Jack Corbett's data preparation.
😎
USE tempdb;
GO
SET NOCOUNT ON;
IF OBJECT_ID(N'dbo.TBL_DATES') IS NOT NULL DROP TABLE dbo.TBL_DATES;
CREATE TABLE dbo.TBL_DATES
(
DATE_ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED
,id CHAR(3) NOT NULL
,StartDate DATE NOT NULL
,EndDate DATE
);
INSERT INTO dbo.TBL_DATES
(id, StartDate, EndDate)
VALUES
('id1', '20110101', '20130930'),
('id1', '20131001', '20141231'),
('id1', '20140101', NULL),
('id2', '20110101', '20130930'),
('id2', '20131001', NULL),
('id3', '20110101', '20130930'),
('id3', '20131001', '20131231'),
('id3', '20140101', NULL),
('id4', '20110101', '20130930'),
('id4', '20131001', '20131231'),
('id5', '20131001', '20141231'),
('id5', '20140101', NULL);
;WITH BASE_DATA AS
(
SELECT
ROW_NUMBER() OVER
(
PARTITION BY D.id
ORDER BY ISNULL(D.EndDate,D.StartDate)
) AS END_RID
,ROW_NUMBER() OVER
(
PARTITION BY D.id
ORDER BY D.StartDate
) AS START_RID
,D.id
,D.StartDate
,D.EndDate
FROM dbo.TBL_DATES D
)
,GROUPED_DATA AS
(
SELECT
BD.id
,BD.START_RID + BD.END_RID AS GR_ID
,ROW_NUMBER() OVER
(
PARTITION BY BD.id
ORDER BY BD.START_RID
) AS OL_RID
,BD.StartDate
,BD.EndDate
FROM BASE_DATA BD
WHERE BD.START_RID <> BD.END_RID
)
SELECT
GD.id
,GD.GR_ID
,GD.OL_RID
,GD.StartDate
,GD.EndDate
FROM GROUPED_DATA GD;
Results
id GR_ID OL_RID StartDate EndDate
---- ------ ------- ---------- ----------
id1 5 1 2013-10-01 2014-12-31
id1 5 2 2014-01-01 NULL
id5 3 1 2013-10-01 2014-12-31
id5 3 2 2014-01-01 NULL

Defintely a better solution than mine. Thanks Erikur.

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question

kaj SSCommitted Points: 1580 More actions · Answer 5

Hi,

Both of you seem to forget that this is the 2K5 group (SQL Server 2005), which means that fields of type DATE isn't supported (we have to use DATETIME fields). Also the use of VALUES as a means to insert data isn't supported. Not a big issue, but "newbies" may stumble at that point already.

I don't agree that Eirikur's solution is the better one. It only works with the example data supplied by the OP. As soon as one starts to tweek the data a little his solution falls apart.

I tried both your solutions with a slightly altered input dataset:

USE tempdb;

GO

SET NOCOUNT ON;

IF OBJECT_ID(N'dbo.TBL_DATES') IS NOT NULL DROP TABLE dbo.TBL_DATES;

CREATE TABLE dbo.TBL_DATES

(

DATE_ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED

,id CHAR(3) NOT NULL

,StartDate DATETIME NOT NULL

,EndDate DATETIME

);

INSERT INTO dbo.TBL_DATES

(id, StartDate, EndDate)

SELECT

'id1', '20110101', NULL UNION ALL SELECT -- changed enddate to Null, previously '20130930'

'id1', '20131001', '20141231' UNION ALL SELECT

'id1', '20140101', NULL UNION ALL SELECT

'id2', '20110101', '20130930' UNION ALL SELECT

'id2', '20131001', NULL UNION ALL SELECT

'id3', '20110101', '20130930' UNION ALL SELECT

'id3', '20110301', '20131130' UNION ALL SELECT -- Added row

'id3', '20131001', '20131231' UNION ALL SELECT

'id3', '20140101', NULL UNION ALL SELECT

'id4', '20110101', '20130930' UNION ALL SELECT

'id4', '20131001', '20131231' UNION ALL SELECT

'id5', '20131001', '20141231' UNION ALL SELECT

'id5', '20140101', NULL;

I then ran Eirikur's SQL code, which gave me this result dataset:

[font="Courier New"]

id GR_ID OL_RID StartDate EndDate

---- -------------------- -------------------- ----------------------- -----------------------

id1 5 1 2013-10-01 00:00:00.000 2014-12-31 00:00:00.000

id1 5 2 2014-01-01 00:00:00.000 NULL

id5 3 1 2013-10-01 00:00:00.000 2014-12-31 00:00:00.000

id5 3 2 2014-01-01 00:00:00.000 NULL

[/font]

This is obviously wrong! The changes in the input data are not reflected in the result set. None of the newly introduced overlaps have been caught.

I then ran Jack's code and got this resultset:

[font="Courier New"]

DATE_ID id StartDate EndDate rowNo DATE_ID id StartDate EndDate rowNo

----------- ---- ----------------------- ----------------------- -------------------- ----------- ---- ----------------------- ----------------------- --------------------

2 id1 2013-10-01 00:00:00.000 2014-12-31 00:00:00.000 2 1 id1 2011-01-01 00:00:00.000 NULL 1

3 id1 2014-01-01 00:00:00.000 NULL 3 1 id1 2011-01-01 00:00:00.000 NULL 1

3 id1 2014-01-01 00:00:00.000 NULL 3 2 id1 2013-10-01 00:00:00.000 2014-12-31 00:00:00.000 2

2 id1 2013-10-01 00:00:00.000 2014-12-31 00:00:00.000 2 3 id1 2014-01-01 00:00:00.000 NULL 3

7 id3 2011-03-01 00:00:00.000 2013-11-30 00:00:00.000 2 6 id3 2011-01-01 00:00:00.000 2013-09-30 00:00:00.000 1

6 id3 2011-01-01 00:00:00.000 2013-09-30 00:00:00.000 1 7 id3 2011-03-01 00:00:00.000 2013-11-30 00:00:00.000 2

8 id3 2013-10-01 00:00:00.000 2013-12-31 00:00:00.000 3 7 id3 2011-03-01 00:00:00.000 2013-11-30 00:00:00.000 2

7 id3 2011-03-01 00:00:00.000 2013-11-30 00:00:00.000 2 8 id3 2013-10-01 00:00:00.000 2013-12-31 00:00:00.000 3

13 id5 2014-01-01 00:00:00.000 NULL 2 12 id5 2013-10-01 00:00:00.000 2014-12-31 00:00:00.000 1

12 id5 2013-10-01 00:00:00.000 2014-12-31 00:00:00.000 1 13 id5 2014-01-01 00:00:00.000 NULL 2

[/font]

Now this is better! All the overlapping date combinations have been caught.

But the format of the output data isn't all that logical if one only wants to know what rows/dates overlap. To show only input rows that are involved in overlapping date intervals, one could employ a union as a last step:

WITH dates

AS (

SELECT

*,

ROW_NUMBER() OVER (PARTITION BY id ORDER BY D.StartDate) AS rowNo

FROM

dbo.TBL_DATES AS D

),

overlaps

AS (

SELECT

d.DATE_ID, d.id, d.StartDate, d.EndDate, d.rowNo,

d2.DATE_ID DATE_ID2, d2.id id2, d2.StartDate StartDate2, d2.EndDate EndDate2, d2.rowNo rowNo2

FROM

dates AS D

JOIN dates AS D2

ON D.id = D2.id AND

/* don't match self */

D.rowNo <> D2.rowNo AND

(

/* later one over laps earlier one */

D2.StartDate BETWEEN D.StartDate AND D.EndDate OR

/* earlier one overlaps later one */

(

D.StartDate BETWEEN D2.StartDate AND ISNULL(D2.EndDate, '20990101') OR

D.EndDate BETWEEN D2.StartDate AND ISNULL(D2.EndDate, '20990101')

)

SELECT DISTINCT id, StartDate, EndDate

from overlaps

UNION

SELECT DISTINCT id2, StartDate2, EndDate2

FROM overlaps

ORDER BY id, StartDate, EndDate;

... which gives us:

[font="Courier New"]

id StartDate EndDate

---- ----------------------- -----------------------

id1 2011-01-01 00:00:00.000 NULL

id1 2013-10-01 00:00:00.000 2014-12-31 00:00:00.000

id1 2014-01-01 00:00:00.000 NULL

id3 2011-01-01 00:00:00.000 2013-09-30 00:00:00.000

id3 2011-03-01 00:00:00.000 2013-11-30 00:00:00.000

id3 2013-10-01 00:00:00.000 2013-12-31 00:00:00.000

id5 2013-10-01 00:00:00.000 2014-12-31 00:00:00.000

id5 2014-01-01 00:00:00.000 NULL

[/font]

That about covers it, i think... 🙂

Dwain Camps SSC Guru Points: 86908 More actions · Answer 6

Personally I wouldn't go with a home-grown solution for this "packing intervals" problem.

Instead I'd seek out the literature and find that SQL MVP Itzik Ben-Gan has proposed a very fast and elegant solution: Packing Intervals[/url]

For some reason however, that original link is not currently directing to his article (maybe it's gone into archives), but I used his technique (with attribution of course) in this article: Calculating Gaps Between Overlapping Time Intervals in SQL[/url] (see the section on "Packing the Overlapping Login Intervals" sample query #5).

A quick review of that suggests it should server you well in SQL 2005.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St