Jeff Moden (9/28/2012)
Most of you know me. If I thought a Tally Table would be the best thing for this problem, I'd be on it like white on rice. The Tally Table and all of it's wonderful different renditions are not the way to go on something like this, IMHO.I'll also state that I've not done a performance comparision with any of the code offered, so far. I can only state that I've done previous comparisons and I've found one bit of code that, at the time, blew away the competition. That code comes from Itzik Ben-Gan and may be found at the following URL. Like SSC, it requires simple spam-free membership but it's absolute worth the time to sign up.
Here's the URL for "Packing Date Intervals" such as what is being done on this thread.
http://www.solidq.com/sqj/Pages/2011-March-Issue/Packing-Intervals.aspx
I was able to adapt Mr. Ben-Gan's approach to this problem:
-- Method by Itzik Ben-Gan
-- http://www.solidq.com/sqj/Pages/2011-March-Issue/Packing-Intervals.aspx
;WITH C1 AS (
SELECT HotelID, RoomTypeID, ts=StartDate, Type=1, e=NULL
,s=ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY StartDate)
FROM #RoomDates
UNION ALL
SELECT HotelID, RoomTypeID, ts=EndDate, Type=-1
,e=ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY EndDate)
,s=NULL
FROM #RoomDates),
C2 AS (
SELECT C1.*
,se=ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY ts, Type DESC)
FROM C1),
C3 AS (
SELECT HotelID, RoomTypeID, ts
,grpnm=FLOOR((ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY ts)-1) / 2 + 1)
FROM C2
WHERE COALESCE(s-(se-s)-1, (se-e)-e) = 0)
SELECT HotelID, RoomTypeID, StartDate=MIN(ts), EndDate=MAX(ts)
FROM C3
GROUP BY HotelID, RoomTypeID, grpnm
It is very clever for sure.
Testing of the proposed solutions using this test harness:
-- Create a tally table
if object_id('tempdb.dbo.#Nums', 'U') is null
BEGIN
SELECT TOP(1000000) n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1
INTO #Nums
FROM sys.columns a, sys.columns b, sys.columns c
CREATE UNIQUE CLUSTERED INDEX UCX_n ON #Nums (n)
END
if object_id('tempdb.dbo.#RoomDates', 'U') is not null
drop table #RoomDates
create table #RoomDates (
HotelId int not null
,RoomTypeId int not null
,StartDate date not null
,EndDate date not null
,constraint [PK_RD1] primary key clustered (
HotelId
,RoomTypeId
,StartDate
) with (
pad_index = off
,statistics_norecompute = off
,ignore_dup_key = off
,allow_row_locks = on
,allow_page_locks = on
) on [PRIMARY]
) on [PRIMARY]
insert #RoomDates ( HotelId
,RoomTypeId
,StartDate
,EndDate)
values (1,1,'2012-01-01', '2012-02-01'), (1,1,'2012-01-10', '2012-02-05'), (1,1,'2012-02-4', '2012-03-01'),
(2,1,'2012-01-01', '2012-02-01'), (2,1,'2012-01-10', '2012-01-20'),
(3,1,'2012-01-01', '2012-02-01'), (3,1,'2012-03-10', '2012-03-20')
INSERT #RoomDates (HotelID, RoomTypeID, StartDate, EndDate)
SELECT 10+n1, n2
,StartDate, EndDate=DATEADD(day, ABS(CHECKSUM(NEWID())) % 20, StartDate)
FROM (SELECT n1=n FROM #Nums WHERE n BETWEEN 1 AND 50) a
CROSS APPLY (SELECT n2=n FROM #Nums WHERE n BETWEEN 1 AND 60) b
CROSS APPLY (SELECT n3=n FROM #Nums WHERE n BETWEEN 1 AND 80) c
CROSS APPLY (SELECT StartDate=DATEADD(day, n3*6, '2009-01-01')) d
--SELECT COUNT(*) FROM #RoomDates
--select * from #RoomDates
DECLARE @HotelID INT, @RoomTypeID INT, @SD DATE, @ED DATE, @StartDT DATETIME, @EndDT DATETIME
--select HotelId
-- ,RoomTypeId
-- ,StartDate
-- ,EndDate
-- from (values (1,1,'2012-01-01', '2012-03-01'),
-- (2,1,'2012-01-01', '2012-02-01'),
-- (3,1,'2012-01-01', '2012-02-01'),
-- (3,1,'2012-03-10', '2012-03-20')) DesiredResults (HotelId, RoomTypeId, StartDate, EndDate)
SET NOCOUNT ON
PRINT '------ bteraberry''s query'
SET STATISTICS TIME ON
select@HotelID=dt.HotelId
,@RoomTypeID=dt.RoomTypeId
,@StartDT = dateadd(dd, min(dt.N), '20000101')
,@EndDT = dateadd(dd, max(dt.N), '20000101')
from
(
selectdistinct
rd.HotelId
,rd.RoomTypeId
,n.N
,grp = n.N - dense_rank() over(order by rd.HotelId, rd.RoomTypeId, n.N)
from #RoomDates as rd
join #Nums as n
on n.N between datediff(dd, '20000101', rd.StartDate) and datediff(dd, '20000101', rd.EndDate)
) as dt
group by dt.HotelId, dt.RoomTypeId, dt.grp
--order by dt.HotelId, dt.RoomTypeId, min(dt.N);
SET STATISTICS TIME OFF
PRINT '------ Mark''s query'
SET STATISTICS TIME ON
;WITH StartsAndEnds(StartEnd,StartDate,EndDate,HotelId,RoomTypeId) AS (
SELECT 'S' AS StartEnd,
StartDate,
DATEADD(day,-1,StartDate),
HotelId,
RoomTypeId
FROM #RoomDates
UNION ALL
SELECT 'E' AS StartEnd,
DATEADD(day,1,EndDate),
EndDate,
HotelId,
RoomTypeId
FROM #RoomDates),
OrderedStarts AS (
SELECT StartDate,
HotelId,
RoomTypeId,
ROW_NUMBER() OVER(PARTITION BY HotelId,RoomTypeId ORDER BY StartDate,StartEnd DESC) AS rnBoth,
2*(ROW_NUMBER() OVER(PARTITION BY HotelId,RoomTypeId,StartEnd ORDER BY StartDate))-1 AS rnStartEnd
FROM StartsAndEnds),
OrderedEnds AS (
SELECT EndDate,
HotelId,
RoomTypeId,
ROW_NUMBER() OVER(PARTITION BY HotelId,RoomTypeId ORDER BY EndDate DESC,StartEnd) AS rnBothRev,
2*(ROW_NUMBER() OVER(PARTITION BY HotelId,RoomTypeId,StartEnd ORDER BY EndDate DESC))-1 AS rnStartEndRev
FROM StartsAndEnds),
Starts AS (
SELECT StartDate,
HotelId,
RoomTypeId,
ROW_NUMBER() OVER(PARTITION BY HotelId,RoomTypeId ORDER BY StartDate) AS rn
FROM OrderedStarts
WHERE rnBoth=rnStartEnd),
Ends AS (
SELECT EndDate,
HotelId,
RoomTypeId,
ROW_NUMBER() OVER(PARTITION BY HotelId,RoomTypeId ORDER BY EndDate) AS rn
FROM OrderedEnds
WHERE rnBothRev=rnStartEndRev)
SELECT @HotelID=s.HotelId,@RoomTypeID=s.RoomTypeId,@SD=s.StartDate,@ED=e.EndDate
FROM Starts s
INNER JOIN Ends e ON e.HotelId=s.HotelId AND e.RoomTypeId=s.RoomTypeId AND e.rn=s.rn AND s.StartDate<=e.EndDate
--ORDER BY s.HotelId,s.RoomTypeId,s.StartDate,e.EndDate;
SET STATISTICS TIME OFF
PRINT '------ Dwain''s query'
SET STATISTICS TIME ON
;WITH Tally (n) AS (
SELECT TOP (SELECT 1+MAX(DATEDIFF(day, StartDate, EndDate)) FROM #RoomDates)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1
FROM sys.all_columns),
GroupDates AS (
SELECT HotelID, RoomTypeID
,[Date]
,DateGroup = DATEADD(day
, -ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY [Date]), [Date])
FROM #RoomDates
CROSS APPLY (
SELECT [Date]=DATEADD(day, n, StartDate)
FROM Tally
WHERE DATEADD(day, n, StartDate) BETWEEN StartDate AND EndDate
) a
GROUP BY HotelID, RoomTypeID, [Date])
SELECT @HotelID=HotelID, @RoomTypeID=RoomTypeID
,@SD=MIN([Date])
,@ED=MAX([Date])
FROM GroupDates
GROUP BY HotelID, RoomTypeID, DateGroup
SET STATISTICS TIME OFF
PRINT '------ IBG''s query (built by Dwain.C for this case)'
SET STATISTICS TIME ON
;WITH C1 AS (
SELECT HotelID, RoomTypeID, ts=StartDate, Type=1, e=NULL
,s=ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY StartDate)
FROM #RoomDates
UNION ALL
SELECT HotelID, RoomTypeID, ts=EndDate, Type=-1
,e=ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY EndDate)
,s=NULL
FROM #RoomDates),
C2 AS (
SELECT C1.*
,se=ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY ts, Type DESC)
FROM C1),
C3 AS (
SELECT HotelID, RoomTypeID, ts
,grpnm=FLOOR((ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY ts)-1) / 2 + 1)
FROM C2
WHERE COALESCE(s-(se-s)-1, (se-e)-e) = 0)
SELECT @HotelID=HotelID, @RoomTypeID=RoomTypeID, @SD=MIN(ts), @ED=MAX(ts)
FROM C3
GROUP BY HotelID, RoomTypeID, grpnm
SET STATISTICS TIME OFF
PRINT '------ ChrisM''s query'
SET STATISTICS TIME ON
SELECT @HotelID=HotelId, @RoomTypeID=RoomTypeId, @SD = MIN(startdate), @ED = MAX(enddate)
FROM (
SELECT
HotelId, RoomTypeId, startdate, enddate,
Grouper = DATEADD(day,
0-DENSE_RANK() OVER (PARTITION BY HotelId, RoomTypeId ORDER BY InDate),
InDate)
FROM #RoomDates
CROSS APPLY (
SELECT TOP(1+DATEDIFF(DAY,startdate,enddate))
InDate = DATEADD(day,
(ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1),
startdate)
from sys.columns
) x
) c
GROUP BY HotelId, RoomTypeId, Grouper
SET STATISTICS TIME OFF
PRINT '------ bterraberry''s query (2)'
SET STATISTICS TIME ON
;with cteTemp as
(
selectHotelId
,RoomTypeId
,BeginDate = case when row_number() over(order by HotelId, RoomTypeId, theDate) - openCnt = 0 then theDate end
,EndDate = case when row_number() over(order by HotelId, RoomTypeId, theDate) - closeCnt = 0 then theDate end
from
(
selectHotelId
,RoomTypeId
,theDate = StartDate
,closeCnt = null
,openCnt = (row_number() over(order by HotelId, RoomTypeId, StartDate) *2) - 1
from #RoomDates
union all
selectHotelId
,RoomTypeId
,theDate = EndDate
,closeCnt = row_number() over(order by HotelId, RoomTypeId, EndDate) * 2
,openCnt = null
from #RoomDates
) as dt
)
select@HotelID=dt.HotelId
,@RoomTypeID=dt.RoomTypeId
,@SD = min(dt.BeginDate)
,@ED = min(dt.EndDate)
from
(
selectHotelId
,RoomTypeId
,BeginDate
,EndDate
,grpID = case when BeginDate is not null then row_number() over(order by HotelId, RoomTypeId, BeginDate)
else row_number() over(order by HotelId, RoomTypeId, EndDate) end
from cteTemp
where BeginDate is not null or EndDate is not null
) as dt
group by dt.HotelId, dt.RoomTypeId, dt.grpID
order by dt.HotelId, dt.RoomTypeId, dt.grpID;
SET STATISTICS TIME OFF
if object_id('tempdb.dbo.#RoomDates', 'U') is not null
drop table #RoomDates
if object_id('tempdb.dbo.#Nums', 'U') is not null
drop table #Nums
Shows that I must have done something right because this solution seems to win out:
------ bteraberry's query
SQL Server Execution Times:
CPU time = 15834 ms, elapsed time = 10126 ms.
------ Mark's query
SQL Server Execution Times:
CPU time = 7051 ms, elapsed time = 4544 ms.
------ Dwain's query
SQL Server Execution Times:
CPU time = 17207 ms, elapsed time = 22379 ms.
------ IBG's query (built by Dwain.C for this case)
SQL Server Execution Times:
CPU time = 718 ms, elapsed time = 756 ms.
------ ChrisM's query
SQL Server Execution Times:
CPU time = 11169 ms, elapsed time = 16211 ms.
------ bterraberry's query (2)
SQL Server Execution Times:
CPU time = 999 ms, elapsed time = 1019 ms.
Bteraberry's second solution comes a close second and it appears he's using elements of the IBG solution.
Edit: Note that IBG also suggested some indexing to improve his solution's speed, which I did not employ here.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St