Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Consolidate Overlapping Date Periods Expand / Collapse
Author
Message
Posted Wednesday, September 26, 2012 11:18 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 8:38 AM
Points: 5,047, Visits: 11,803
Getting late in the day and I could use some support from those who are feeling fresh!

Suppose I have a table of from/to rooms in hotels a bit like this:

  if object_id('tempdb.dbo.#RoomDates', 'U') is not null
drop table #RoomDates

create table #RoomDates (
HotelId int not null
,RoomTypeId int not null
,StartDate date not null
,EndDate date not null
,constraint [PK_tempRoom] primary key clustered (
HotelId
,RoomTypeId
,StartDate
) with (
pad_index = off
,statistics_norecompute = off
,ignore_dup_key = off
,allow_row_locks = on
,allow_page_locks = on
) on [PRIMARY]
) on [PRIMARY]


insert #RoomDates ( HotelId
,RoomTypeId
,StartDate
,EndDate)
values (1,1,'2012-01-01', '2012-02-01'), (1,1,'2012-01-10', '2012-02-05'), (1,1,'2012-02-4', '2012-03-01'),
(2,1,'2012-01-01', '2012-02-01'), (2,1,'2012-01-10', '2012-01-20'),
(3,1,'2012-01-01', '2012-02-01'), (3,1,'2012-03-10', '2012-03-20')

select * from #RoomDates

And I want to consolidate the data in the table so that any overlapping date periods are consolidated into one row. My desired results:

    select HotelId
,RoomTypeId
,StartDate
,EndDate
from (values (1,1,'2012-01-01', '2012-03-01'),
(2,1,'2012-01-01', '2012-02-01'),
(3,1,'2012-01-01', '2012-02-01'),
(3,1,'2012-03-10', '2012-03-20')) DesiredResults (HotelId, RoomTypeId, StartDate, EndDate)


Can anyone write a nice snappy query to do it?

Notes
1) Non-overlapping date periods should not be consolidated.
2) Consolidation is to be performed at HotelId, RoomTypeId level.

Thanks v much!








Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

When you ask a question (and please do ask a question: "My T-SQL does not work" just doesn't cut it), please provide enough information for us to understand its context.
Post #1364849
Posted Wednesday, September 26, 2012 1:55 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Wednesday, December 4, 2013 1:49 PM
Points: 1,104, Visits: 1,174
This solution uses a numbers table in [master].

select	dt.HotelId
, dt.RoomTypeId
, BeginRange = dateadd(dd, min(dt.N), '20000101')
, EndRange = dateadd(dd, max(dt.N), '20000101')
from
(
select distinct
rd.HotelId
, rd.RoomTypeId
, n.N
, grp = n.N - dense_rank() over(order by rd.HotelId, rd.RoomTypeId, n.N)
from #RoomDates as rd
join [master].dbo.Nums as n
on n.N between datediff(dd, '20000101', rd.StartDate) and datediff(dd, '20000101', rd.EndDate)
) as dt
group by dt.HotelId, dt.RoomTypeId, dt.grp
order by dt.HotelId, dt.RoomTypeId, min(dt.N);



└> bt


Forum Etiquette: How to post data/code on a forum to get the best help
Post #1364914
Posted Wednesday, September 26, 2012 3:24 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Tuesday, April 8, 2014 6:13 AM
Points: 1,694, Visits: 19,550

Here's a snappy query for you

WITH StartsAndEnds(StartEnd,StartDate,EndDate,HotelId,RoomTypeId) AS (
SELECT 'S' AS StartEnd,
StartDate,
DATEADD(day,-1,StartDate),
HotelId,
RoomTypeId
FROM #RoomDates
UNION ALL
SELECT 'E' AS StartEnd,
DATEADD(day,1,EndDate),
EndDate,
HotelId,
RoomTypeId
FROM #RoomDates),

OrderedStarts AS (
SELECT StartDate,
HotelId,
RoomTypeId,
ROW_NUMBER() OVER(PARTITION BY HotelId,RoomTypeId ORDER BY StartDate,StartEnd DESC) AS rnBoth,
2*(ROW_NUMBER() OVER(PARTITION BY HotelId,RoomTypeId,StartEnd ORDER BY StartDate))-1 AS rnStartEnd
FROM StartsAndEnds),

OrderedEnds AS (
SELECT EndDate,
HotelId,
RoomTypeId,
ROW_NUMBER() OVER(PARTITION BY HotelId,RoomTypeId ORDER BY EndDate DESC,StartEnd) AS rnBothRev,
2*(ROW_NUMBER() OVER(PARTITION BY HotelId,RoomTypeId,StartEnd ORDER BY EndDate DESC))-1 AS rnStartEndRev
FROM StartsAndEnds),

Starts AS (
SELECT StartDate,
HotelId,
RoomTypeId,
ROW_NUMBER() OVER(PARTITION BY HotelId,RoomTypeId ORDER BY StartDate) AS rn
FROM OrderedStarts
WHERE rnBoth=rnStartEnd),

Ends AS (
SELECT EndDate,
HotelId,
RoomTypeId,
ROW_NUMBER() OVER(PARTITION BY HotelId,RoomTypeId ORDER BY EndDate) AS rn
FROM OrderedEnds
WHERE rnBothRev=rnStartEndRev)

SELECT s.HotelId,s.RoomTypeId,s.StartDate,e.EndDate
FROM Starts s
INNER JOIN Ends e ON e.HotelId=s.HotelId AND e.RoomTypeId=s.RoomTypeId AND e.rn=s.rn AND s.StartDate<=e.EndDate
ORDER BY s.HotelId,s.RoomTypeId,s.StartDate,e.EndDate;



____________________________________________________

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

Never approach a goat from the front, a horse from the rear, or a fool from any direction.
Post #1364950
Posted Wednesday, September 26, 2012 5:07 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Wednesday, December 4, 2013 1:49 PM
Points: 1,104, Visits: 1,174
Yeah, Mark's is much better. I went to lunch and realized that the much cleaner query had eluded me.

└> bt


Forum Etiquette: How to post data/code on a forum to get the best help
Post #1364976
Posted Thursday, September 27, 2012 1:55 PM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Yesterday @ 3:18 PM
Points: 1,006, Visits: 3,031
Hi

I thought that I would try something out of left field with this as a matter of interest. I use Geometry a lot and thought it would be interesting to use it for something different.

Anyway here's what I came up with ... it will not be as quick as others though

  if object_id('tempdb.dbo.#RoomDates', 'U') is not null
drop table #RoomDates

create table #RoomDates (
HotelId int not null
,RoomTypeId int not null
,StartDate date not null
,EndDate date not null
,dateGraph Geometry
,constraint [PK_tempRoom] primary key clustered (
HotelId
,RoomTypeId
,StartDate
) with (
pad_index = off
,statistics_norecompute = off
,ignore_dup_key = off
,allow_row_locks = on
,allow_page_locks = on
) on [PRIMARY]
) on [PRIMARY]

insert #RoomDates ( HotelId
,RoomTypeId
,StartDate
,EndDate)
values (1,1,'2012-01-01', '2012-02-01'),
(1,1,'2012-01-10', '2012-02-05'),
(1,1,'2012-02-4', '2012-03-01'),
(1,1,'2012-03-10', '2012-03-13'),
(1,1,'2012-03-11', '2012-03-20'),
(2,1,'2012-01-01', '2012-02-01'),
(2,1,'2012-01-10', '2012-01-20'),
(3,1,'2012-01-01', '2012-02-01'),
(1,1,'2012-02-27', '2012-03-05'),
(3,1,'2012-03-10', '2012-03-20')

update #RoomDates
set dateGraph = Geometry::STGeomFromText('LINESTRING (' +
cast(cast(cast(startdate as datetime) as int) as varchar) + ' ' + cast(roomtypeid + hotelid as varchar) + ', ' +
cast(cast(cast(enddate as datetime) as int) as varchar) + ' ' + cast(roomtypeid + hotelid as varchar) + ')'
,0)

;with
overlap as (
select a.hotelid, a.roomtypeid, a.startdate, a.enddate, b.enddate maxenddate
from #RoomDates a
left join #RoomDates b on
a.hotelid = b.hotelid and
a.roomtypeid = b.roomtypeid and
a.startdate < b.startdate and
a.dategraph.STIntersects(b.dategraph) = 1
),
grouper as (
select row_number() over (order by hotelid, roomtypeid, startdate) groupid,
hotelid, roomtypeid, startdate, enddate
from overlap a
where maxenddate is null
union all
select g.groupid, g.hotelid, g.roomtypeid, o.startdate, o.enddate
from grouper g
inner join overlap o on g.enddate = o.maxenddate and g.hotelid = o.hotelid and g.roomtypeid = o.hotelid
),
merger as (
select hotelid, roomtypeid, min(startdate) startdate, max(enddate) enddate
from grouper
group by groupid, hotelid, roomtypeid
)
select * from merger

Post #1365489
Posted Thursday, September 27, 2012 6:55 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 12:57 AM
Points: 3,641, Visits: 5,290
Phil,

I think this is a pretty snappy query too.

;WITH Tally (n) AS (
SELECT TOP (SELECT 1+MAX(DATEDIFF(day, StartDate, EndDate)) FROM #RoomDates)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1
FROM sys.all_columns),
GroupDates AS (
SELECT HotelID, RoomTypeID
,[Date]
,DateGroup = DATEADD(day
, -ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY [Date]), [Date])
FROM #RoomDates
CROSS APPLY (
SELECT n, [Date]=DATEADD(day, n, StartDate)
FROM Tally
WHERE DATEADD(day, n, StartDate) BETWEEN StartDate AND EndDate) a
GROUP BY HotelID, RoomTypeID, [Date])
SELECT HotelID, RoomTypeID
,StartDate=MIN([Date])
,EndDate=MAX([Date])
FROM GroupDates


This is the first time I've been able to successfully apply Jeff Moden's method for "Grouping Islands of Contiguous Dates" http://www.sqlservercentral.com/articles/T-SQL/71550/!

Not saying I fully understand it but at least I can go through the motions now.

Edit: Eliminated an unnecessary CROSS APPLY.



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1365568
Posted Thursday, September 27, 2012 7:36 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Yesterday @ 8:08 PM
Points: 1,945, Visits: 2,941
First, let's fix your DDL. There is no such thing a “type_id” in RDBMS; read ISO-11170 or any book on data modeling. Would you say “blood_type_id”? Of course not! A type and an id are totally different kinds of attribute properties. But why put it in the skeleton since we do not need it for the problem?

The occupancy_end_date has to be NULL-able to show the guests who are still here (basic temporal idiom in SQL)



CREATE TABLE Occupancy
(hotel_id INTEGER NOT NULL,
occupancy_start_date DATE NOT NULL,
occupancy_end_date DATE,
CHECK (occupancy_start_date <= occupancy_end_date),
PRIMARY KEY (hotel_id, occupancy_start_date));

INSERT INTO Occupancy
VALUES
(1, '2012-01-01', '2012-02-01'),
(1, '2012-01-10', '2012-02-05'),
(1, '2012-02-04', '2012-03-01'),
(2, '2012-01-01', '2012-02-01'),
(2, '2012-01-10', '2012-01-20'),
(3, '2012-01-01', '2012-02-01'),
(3, '2012-03-10', '2012-03-20');

Can I assume you have a calendar table? There should be a julianized day count starting for the earliest date in the Calender


;WITH X1 (hotel_id, cal_date)
AS
(SELECT DISTINCT O.hotel_id, C.cal_date
FROM Occupancy AS O, Calendar AS C
WHERE C.cal_date
BETWEEN O.occupancy_start_date
AND COALESCE (O.occupancy_end_date, CURRENT_TIMESTAMP)),

C1 (cal_date, julian_date_nbr)
AS
(SELECT cal_date,
ROW_NUMBER() OVER (ORDER BY cal_date)
FROM Calendar),

X2 (hotel_id, cal_date, stay_grp)
AS
(SELECT X1.hotel_id, X1.cal_date,
(C1.julian_date_nbr -
ROW_NUMBER()
OVER (PARTITION BY X1.hotel_id ORDER BY X1.cal_date))
FROM X1, C1
WHERE X1.cal_date = C1.cal_date)

SELECT X2.hotel_id, MIN(cal_date) AS arrival_date, MAX(cal_date) AS departure_date
FROM X2
GROUP BY X2.hotel_id, X2.stay_grp;


Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #1365573
Posted Friday, September 28, 2012 7:21 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 9:49 AM
Points: 7,234, Visits: 13,719
dwain.c (9/27/2012)
Phil,

I think this is a pretty snappy query too.

;WITH Tally (n) AS (
SELECT TOP (SELECT 1+MAX(DATEDIFF(day, StartDate, EndDate)) FROM #RoomDates)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1
FROM sys.all_columns),
GroupDates AS (
SELECT HotelID, RoomTypeID
,[Date]
,DateGroup = DATEADD(day
, -ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY [Date]), [Date])
FROM #RoomDates
CROSS APPLY (
SELECT n, [Date]=DATEADD(day, n, StartDate)
FROM Tally
WHERE DATEADD(day, n, StartDate) BETWEEN StartDate AND EndDate) a
GROUP BY HotelID, RoomTypeID, [Date])
SELECT HotelID, RoomTypeID
,StartDate=MIN([Date])
,EndDate=MAX([Date])
FROM GroupDates


This is the first time I've been able to successfully apply Jeff Moden's method for "Grouping Islands of Contiguous Dates" http://www.sqlservercentral.com/articles/T-SQL/71550/!

Not saying I fully understand it but at least I can go through the motions now.

Edit: Eliminated an unnecessary CROSS APPLY.


Jeff's article was the inspiration for this effort too;

SELECT HotelId, RoomTypeId, startdate = MIN(startdate), enddate = MAX(enddate)
FROM (
SELECT
HotelId, RoomTypeId, startdate, enddate,
Grouper = DATEADD(day,
0-DENSE_RANK() OVER (PARTITION BY HotelId, RoomTypeId ORDER BY InDate),
InDate)
FROM #RoomDates
CROSS APPLY (
SELECT TOP(1+DATEDIFF(DAY,startdate,enddate))
InDate = DATEADD(day,
(ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1),
startdate)
from sys.columns
) x
) c
GROUP BY HotelId, RoomTypeId, Grouper



“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Post #1365856
Posted Friday, September 28, 2012 1:41 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Wednesday, December 4, 2013 1:49 PM
Points: 1,104, Visits: 1,174
The first solution I posted above (the first reply to Phil) is a clean "islands" style approach that I don't believe is improved upon by the other similar approaches. However, after posting, I realized that the traditional islands strategy is not desirable in this scenario because of performance considerations. When I got back from lunch I went to correct my solution and I saw that Mark-101232 had already hit on my line of thinking. I believe his offering is clearly the best so far in this thread.

The problem with any islands approach is that the requirement exists to join each record to N number of records in another table (whether virtual or actual), thus expanding the result set and then performing additional work to aggregate and reduce it back down. Now in this case, it may seem irrelevant because:

a) there is very little data we're dealing with;
b) we are using the date data type, so each N represents a full day; and
c) the date ranges are small.

Suppose these three facts were different. If we were dealing with a million records ... and a precision of a minute rather than a full day ... and if we had larger date ranges, the islands strategy would suffer greatly. And the thing is, any join/apply to any other table is completely unnecessary to solve this problem. A non-join approach will perform better both in small and large scales.

Mark's solution may look confusing, but it's basically this: if you take all the dates (both start and end) and put them together in order of date you have a sequence. Think of a begin as being +1 and an end as a -1. Thus, when you get back to 0 (as many ends as beginnings), there is the end of your island. Now, we're not actually solving it that way because we don't need the mess of a running total, but that same logical approach can be replicated with window functions.

The big advantage is that we don't have to multiple the number of records out to potentially huge counts and then do the sorts (which don't scale well) to aggregate back down to our result. The precision of the date is entirely irrelevant so it will work equally as well with a datetime2 as a date. Below I have included how I would write the query, although it is essentially the same logic employed in Mark's solution.

with cteTemp as
(
select HotelId
, RoomTypeId
, BeginDate = case when row_number() over(order by HotelId, RoomTypeId, theDate) - openCnt = 0 then theDate end
, EndDate = case when row_number() over(order by HotelId, RoomTypeId, theDate) - closeCnt = 0 then theDate end
from
(
select HotelId
, RoomTypeId
, theDate = StartDate
, closeCnt = null
, openCnt = (row_number() over(order by HotelId, RoomTypeId, StartDate) *2) - 1
from #RoomDates
union all
select HotelId
, RoomTypeId
, theDate = EndDate
, closeCnt = row_number() over(order by HotelId, RoomTypeId, EndDate) * 2
, openCnt = null
from #RoomDates
) as dt
)

select dt.HotelId
, dt.RoomTypeId
, BeginDate = min(dt.BeginDate)
, EndDate = min(dt.EndDate)
from
(
select HotelId
, RoomTypeId
, BeginDate
, EndDate
, grpID = case when BeginDate is not null then row_number() over(order by HotelId, RoomTypeId, BeginDate)
else row_number() over(order by HotelId, RoomTypeId, EndDate) end
from cteTemp
where BeginDate is not null or EndDate is not null
) as dt
group by dt.HotelId, dt.RoomTypeId, dt.grpID
order by dt.HotelId, dt.RoomTypeId, dt.grpID;



└> bt


Forum Etiquette: How to post data/code on a forum to get the best help
Post #1366095
Posted Friday, September 28, 2012 7:36 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 12:36 AM
Points: 36,995, Visits: 31,524
Most of you know me. If I thought a Tally Table would be the best thing for this problem, I'd be on it like white on rice. The Tally Table and all of it's wonderful different renditions are not the way to go on something like this, IMHO.

I'll also state that I've not done a performance comparision with any of the code offered, so far. I can only state that I've done previous comparisons and I've found one bit of code that, at the time, blew away the competition. That code comes from Itzik Ben-Gan and may be found at the following URL. Like SSC, it requires simple spam-free membership but it's absolute worth the time to sign up.

Here's the URL for "Packing Date Intervals" such as what is being done on this thread.
http://www.solidq.com/sqj/Pages/2011-March-Issue/Packing-Intervals.aspx


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1366144
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse