RE: Consolidate Overlapping Date Periods

SSChampion

Points: 11845

September 28, 2012 at 1:41 pm

The first solution I posted above (the first reply to Phil) is a clean "islands" style approach that I don't believe is improved upon by the other similar approaches. However, after posting, I realized that the traditional islands strategy is not desirable in this scenario because of performance considerations. When I got back from lunch I went to correct my solution and I saw that Mark-101232 had already hit on my line of thinking. I believe his offering is clearly the best so far in this thread.

The problem with any islands approach is that the requirement exists to join each record to N number of records in another table (whether virtual or actual), thus expanding the result set and then performing additional work to aggregate and reduce it back down. Now in this case, it may seem irrelevant because:

a) there is very little data we're dealing with;

b) we are using the date data type, so each N represents a full day; and

c) the date ranges are small.

Suppose these three facts were different. If we were dealing with a million records ... and a precision of a minute rather than a full day ... and if we had larger date ranges, the islands strategy would suffer greatly. And the thing is, any join/apply to any other table is completely unnecessary to solve this problem. A non-join approach will perform better both in small and large scales.

Mark's solution may look confusing, but it's basically this: if you take all the dates (both start and end) and put them together in order of date you have a sequence. Think of a begin as being +1 and an end as a -1. Thus, when you get back to 0 (as many ends as beginnings), there is the end of your island. Now, we're not actually solving it that way because we don't need the mess of a running total, but that same logical approach can be replicated with window functions.

The big advantage is that we don't have to multiple the number of records out to potentially huge counts and then do the sorts (which don't scale well) to aggregate back down to our result. The precision of the date is entirely irrelevant so it will work equally as well with a datetime2 as a date. Below I have included how I would write the query, although it is essentially the same logic employed in Mark's solution.

with cteTemp as

(

selectHotelId

,RoomTypeId

,BeginDate = case when row_number() over(order by HotelId, RoomTypeId, theDate) - openCnt = 0 then theDate end

,EndDate = case when row_number() over(order by HotelId, RoomTypeId, theDate) - closeCnt = 0 then theDate end

from

(

selectHotelId

,RoomTypeId

,theDate = StartDate

,closeCnt = null

,openCnt = (row_number() over(order by HotelId, RoomTypeId, StartDate) *2) - 1

from #RoomDates

union all

selectHotelId

,RoomTypeId

,theDate = EndDate

,closeCnt = row_number() over(order by HotelId, RoomTypeId, EndDate) * 2

,openCnt = null

from #RoomDates

) as dt

)

selectdt.HotelId

,dt.RoomTypeId

,BeginDate = min(dt.BeginDate)

,EndDate = min(dt.EndDate)

from

(

selectHotelId

,RoomTypeId

,BeginDate

,EndDate

,grpID = case when BeginDate is not null then row_number() over(order by HotelId, RoomTypeId, BeginDate)

else row_number() over(order by HotelId, RoomTypeId, EndDate) end

from cteTemp

where BeginDate is not null or EndDate is not null

) as dt

group by dt.HotelId, dt.RoomTypeId, dt.grpID

order by dt.HotelId, dt.RoomTypeId, dt.grpID;

└> bt

Forum Etiquette: How to post data/code on a forum to get the best help[/url]

Cookies on SQLServerCentral