Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12

Consolidate Overlapping Date Periods Expand / Collapse
Author
Message
Posted Sunday, September 30, 2012 1:24 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: 2 days ago @ 7:23 AM
Points: 5,074, Visits: 11,852
CELKO (9/27/2012)
First, let's fix your DDL. There is no such thing a “type_id” in RDBMS; read ISO-11170 or any book on data modeling. Would you say “blood_type_id”? Of course not! A type and an id are totally different kinds of attribute properties. But why put it in the skeleton since we do not need it for the problem?

The occupancy_end_date has to be NULL-able to show the guests who are still here (basic temporal idiom in SQL).

--snip


The system from which the data is taken deals with room types (single, double, family with view of wildebeest etc), not individual rooms. These are held in a master table which has an Id column.

The table does not represent room occupancy: it represents room types at hotels which are on, or will be on, stop-sale. This is never an open interval.



Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

When you ask a question (and please do ask a question: "My T-SQL does not work" just doesn't cut it), please provide enough information for us to understand its context.
Post #1366237
Posted Sunday, September 30, 2012 1:30 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: 2 days ago @ 7:23 AM
Points: 5,074, Visits: 11,852
Thanks to all who have responded with their ideas. I've been away for a couple of days & will be coding my solution this coming week.

Jeff, I have worked through the examples in the link you provided - excellent stuff, thank you.

I have a 'RoomType' table which I can use as my base table - just need to create a RoomTypeIntervals function to facilitate the CROSS APPLY & see how it performs when compared with the other suggestions.



Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

When you ask a question (and please do ask a question: "My T-SQL does not work" just doesn't cut it), please provide enough information for us to understand its context.
Post #1366238
Posted Sunday, September 30, 2012 9:14 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 7:20 PM
Points: 37,088, Visits: 31,645
CELKO (9/27/2012)
The occupancy_end_date has to be NULL-able to show the guests who are still here (basic temporal idiom in SQL)


That's certainly one way to do it but that typically results in an OR in the JOIN or WHERE clause which can cause performance problems. With that thought in mind, I'll close the interval with a very high date such as 9999-12-30. Notice that I don't use the last date available (9999-12-31) in SQL Server so that I can use the classic and easily SARGable AND EndDate >= CutoffDate AND EndDate < CutoffDate where "CutoffDate" is an "open" end of the interval to accomodate when EndDate has a non-midnight time involved. It also allows for standard handling whether a time is involved or not.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1366262
Posted Sunday, September 30, 2012 11:36 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Today @ 10:38 AM
Points: 1,945, Visits: 3,004
That's certainly one way to do it but that typically results in an OR in the JOIN or WHERE clause which can cause performance problems. With that thought in mind, I'll close the interval with a very high date such as 9999-12-30. Notice that I don't use the last date available (9999-12-31) in SQL Server so that I can use the classic and easily SARGable ..


I don't think the performance hit is that bad today; I find I usually write "x_ date BETWEEN start_date AND COALESCE (end_date, CURRENT_TIMESTAMP)" since it preserves the truth. I cannot see the future, but I am still alive today

But more than that, we have DATE data types today, so the old Sybase dialect half-open trick is not needed any more. The sensible, easily read BETWEEN is good now, after all the decades of dialect.


Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #1366281
Posted Sunday, September 30, 2012 11:39 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 6:49 PM
Points: 3,648, Visits: 5,326
Jeff Moden (9/28/2012)
Most of you know me. If I thought a Tally Table would be the best thing for this problem, I'd be on it like white on rice. The Tally Table and all of it's wonderful different renditions are not the way to go on something like this, IMHO.

I'll also state that I've not done a performance comparision with any of the code offered, so far. I can only state that I've done previous comparisons and I've found one bit of code that, at the time, blew away the competition. That code comes from Itzik Ben-Gan and may be found at the following URL. Like SSC, it requires simple spam-free membership but it's absolute worth the time to sign up.

Here's the URL for "Packing Date Intervals" such as what is being done on this thread.
http://www.solidq.com/sqj/Pages/2011-March-Issue/Packing-Intervals.aspx


I was able to adapt Mr. Ben-Gan's approach to this problem:

-- Method by Itzik Ben-Gan
-- http://www.solidq.com/sqj/Pages/2011-March-Issue/Packing-Intervals.aspx
;WITH C1 AS (
SELECT HotelID, RoomTypeID, ts=StartDate, Type=1, e=NULL
,s=ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY StartDate)
FROM #RoomDates
UNION ALL
SELECT HotelID, RoomTypeID, ts=EndDate, Type=-1
,e=ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY EndDate)
,s=NULL
FROM #RoomDates),
C2 AS (
SELECT C1.*
,se=ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY ts, Type DESC)
FROM C1),
C3 AS (
SELECT HotelID, RoomTypeID, ts
,grpnm=FLOOR((ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY ts)-1) / 2 + 1)
FROM C2
WHERE COALESCE(s-(se-s)-1, (se-e)-e) = 0)
SELECT HotelID, RoomTypeID, StartDate=MIN(ts), EndDate=MAX(ts)
FROM C3
GROUP BY HotelID, RoomTypeID, grpnm


It is very clever for sure.

Testing of the proposed solutions using this test harness:

-- Create a tally table
if object_id('tempdb.dbo.#Nums', 'U') is null
BEGIN
SELECT TOP(1000000) n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1
INTO #Nums
FROM sys.columns a, sys.columns b, sys.columns c
CREATE UNIQUE CLUSTERED INDEX UCX_n ON #Nums (n)
END

if object_id('tempdb.dbo.#RoomDates', 'U') is not null
drop table #RoomDates

create table #RoomDates (
HotelId int not null
,RoomTypeId int not null
,StartDate date not null
,EndDate date not null
,constraint [PK_RD1] primary key clustered (
HotelId
,RoomTypeId
,StartDate
) with (
pad_index = off
,statistics_norecompute = off
,ignore_dup_key = off
,allow_row_locks = on
,allow_page_locks = on
) on [PRIMARY]
) on [PRIMARY]


insert #RoomDates ( HotelId
,RoomTypeId
,StartDate
,EndDate)
values (1,1,'2012-01-01', '2012-02-01'), (1,1,'2012-01-10', '2012-02-05'), (1,1,'2012-02-4', '2012-03-01'),
(2,1,'2012-01-01', '2012-02-01'), (2,1,'2012-01-10', '2012-01-20'),
(3,1,'2012-01-01', '2012-02-01'), (3,1,'2012-03-10', '2012-03-20')

INSERT #RoomDates (HotelID, RoomTypeID, StartDate, EndDate)
SELECT 10+n1, n2
,StartDate, EndDate=DATEADD(day, ABS(CHECKSUM(NEWID())) % 20, StartDate)
FROM (SELECT n1=n FROM #Nums WHERE n BETWEEN 1 AND 50) a
CROSS APPLY (SELECT n2=n FROM #Nums WHERE n BETWEEN 1 AND 60) b
CROSS APPLY (SELECT n3=n FROM #Nums WHERE n BETWEEN 1 AND 80) c
CROSS APPLY (SELECT StartDate=DATEADD(day, n3*6, '2009-01-01')) d
--SELECT COUNT(*) FROM #RoomDates

--select * from #RoomDates
DECLARE @HotelID INT, @RoomTypeID INT, @SD DATE, @ED DATE, @StartDT DATETIME, @EndDT DATETIME

--select HotelId
-- ,RoomTypeId
-- ,StartDate
-- ,EndDate
-- from (values (1,1,'2012-01-01', '2012-03-01'),
-- (2,1,'2012-01-01', '2012-02-01'),
-- (3,1,'2012-01-01', '2012-02-01'),
-- (3,1,'2012-03-10', '2012-03-20')) DesiredResults (HotelId, RoomTypeId, StartDate, EndDate)


SET NOCOUNT ON

PRINT '------ bteraberry''s query'
SET STATISTICS TIME ON
select @HotelID=dt.HotelId
, @RoomTypeID=dt.RoomTypeId
, @StartDT = dateadd(dd, min(dt.N), '20000101')
, @EndDT = dateadd(dd, max(dt.N), '20000101')
from
(
select distinct
rd.HotelId
, rd.RoomTypeId
, n.N
, grp = n.N - dense_rank() over(order by rd.HotelId, rd.RoomTypeId, n.N)
from #RoomDates as rd
join #Nums as n
on n.N between datediff(dd, '20000101', rd.StartDate) and datediff(dd, '20000101', rd.EndDate)
) as dt
group by dt.HotelId, dt.RoomTypeId, dt.grp
--order by dt.HotelId, dt.RoomTypeId, min(dt.N);
SET STATISTICS TIME OFF

PRINT '------ Mark''s query'
SET STATISTICS TIME ON
;WITH StartsAndEnds(StartEnd,StartDate,EndDate,HotelId,RoomTypeId) AS (
SELECT 'S' AS StartEnd,
StartDate,
DATEADD(day,-1,StartDate),
HotelId,
RoomTypeId
FROM #RoomDates
UNION ALL
SELECT 'E' AS StartEnd,
DATEADD(day,1,EndDate),
EndDate,
HotelId,
RoomTypeId
FROM #RoomDates),

OrderedStarts AS (
SELECT StartDate,
HotelId,
RoomTypeId,
ROW_NUMBER() OVER(PARTITION BY HotelId,RoomTypeId ORDER BY StartDate,StartEnd DESC) AS rnBoth,
2*(ROW_NUMBER() OVER(PARTITION BY HotelId,RoomTypeId,StartEnd ORDER BY StartDate))-1 AS rnStartEnd
FROM StartsAndEnds),

OrderedEnds AS (
SELECT EndDate,
HotelId,
RoomTypeId,
ROW_NUMBER() OVER(PARTITION BY HotelId,RoomTypeId ORDER BY EndDate DESC,StartEnd) AS rnBothRev,
2*(ROW_NUMBER() OVER(PARTITION BY HotelId,RoomTypeId,StartEnd ORDER BY EndDate DESC))-1 AS rnStartEndRev
FROM StartsAndEnds),

Starts AS (
SELECT StartDate,
HotelId,
RoomTypeId,
ROW_NUMBER() OVER(PARTITION BY HotelId,RoomTypeId ORDER BY StartDate) AS rn
FROM OrderedStarts
WHERE rnBoth=rnStartEnd),

Ends AS (
SELECT EndDate,
HotelId,
RoomTypeId,
ROW_NUMBER() OVER(PARTITION BY HotelId,RoomTypeId ORDER BY EndDate) AS rn
FROM OrderedEnds
WHERE rnBothRev=rnStartEndRev)

SELECT @HotelID=s.HotelId,@RoomTypeID=s.RoomTypeId,@SD=s.StartDate,@ED=e.EndDate
FROM Starts s
INNER JOIN Ends e ON e.HotelId=s.HotelId AND e.RoomTypeId=s.RoomTypeId AND e.rn=s.rn AND s.StartDate<=e.EndDate
--ORDER BY s.HotelId,s.RoomTypeId,s.StartDate,e.EndDate;
SET STATISTICS TIME OFF


PRINT '------ Dwain''s query'
SET STATISTICS TIME ON
;WITH Tally (n) AS (
SELECT TOP (SELECT 1+MAX(DATEDIFF(day, StartDate, EndDate)) FROM #RoomDates)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1
FROM sys.all_columns),
GroupDates AS (
SELECT HotelID, RoomTypeID
,[Date]
,DateGroup = DATEADD(day
, -ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY [Date]), [Date])
FROM #RoomDates
CROSS APPLY (
SELECT [Date]=DATEADD(day, n, StartDate)
FROM Tally
WHERE DATEADD(day, n, StartDate) BETWEEN StartDate AND EndDate
) a
GROUP BY HotelID, RoomTypeID, [Date])
SELECT @HotelID=HotelID, @RoomTypeID=RoomTypeID
,@SD=MIN([Date])
,@ED=MAX([Date])
FROM GroupDates
GROUP BY HotelID, RoomTypeID, DateGroup
SET STATISTICS TIME OFF

PRINT '------ IBG''s query (built by Dwain.C for this case)'
SET STATISTICS TIME ON
;WITH C1 AS (
SELECT HotelID, RoomTypeID, ts=StartDate, Type=1, e=NULL
,s=ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY StartDate)
FROM #RoomDates
UNION ALL
SELECT HotelID, RoomTypeID, ts=EndDate, Type=-1
,e=ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY EndDate)
,s=NULL
FROM #RoomDates),
C2 AS (
SELECT C1.*
,se=ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY ts, Type DESC)
FROM C1),
C3 AS (
SELECT HotelID, RoomTypeID, ts
,grpnm=FLOOR((ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY ts)-1) / 2 + 1)
FROM C2
WHERE COALESCE(s-(se-s)-1, (se-e)-e) = 0)
SELECT @HotelID=HotelID, @RoomTypeID=RoomTypeID, @SD=MIN(ts), @ED=MAX(ts)
FROM C3
GROUP BY HotelID, RoomTypeID, grpnm
SET STATISTICS TIME OFF

PRINT '------ ChrisM''s query'
SET STATISTICS TIME ON
SELECT @HotelID=HotelId, @RoomTypeID=RoomTypeId, @SD = MIN(startdate), @ED = MAX(enddate)
FROM (
SELECT
HotelId, RoomTypeId, startdate, enddate,
Grouper = DATEADD(day,
0-DENSE_RANK() OVER (PARTITION BY HotelId, RoomTypeId ORDER BY InDate),
InDate)
FROM #RoomDates
CROSS APPLY (
SELECT TOP(1+DATEDIFF(DAY,startdate,enddate))
InDate = DATEADD(day,
(ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1),
startdate)
from sys.columns
) x
) c
GROUP BY HotelId, RoomTypeId, Grouper
SET STATISTICS TIME OFF

PRINT '------ bterraberry''s query (2)'
SET STATISTICS TIME ON
;with cteTemp as
(
select HotelId
, RoomTypeId
, BeginDate = case when row_number() over(order by HotelId, RoomTypeId, theDate) - openCnt = 0 then theDate end
, EndDate = case when row_number() over(order by HotelId, RoomTypeId, theDate) - closeCnt = 0 then theDate end
from
(
select HotelId
, RoomTypeId
, theDate = StartDate
, closeCnt = null
, openCnt = (row_number() over(order by HotelId, RoomTypeId, StartDate) *2) - 1
from #RoomDates
union all
select HotelId
, RoomTypeId
, theDate = EndDate
, closeCnt = row_number() over(order by HotelId, RoomTypeId, EndDate) * 2
, openCnt = null
from #RoomDates
) as dt
)

select @HotelID=dt.HotelId
, @RoomTypeID=dt.RoomTypeId
, @SD = min(dt.BeginDate)
, @ED = min(dt.EndDate)
from
(
select HotelId
, RoomTypeId
, BeginDate
, EndDate
, grpID = case when BeginDate is not null then row_number() over(order by HotelId, RoomTypeId, BeginDate)
else row_number() over(order by HotelId, RoomTypeId, EndDate) end
from cteTemp
where BeginDate is not null or EndDate is not null
) as dt
group by dt.HotelId, dt.RoomTypeId, dt.grpID
order by dt.HotelId, dt.RoomTypeId, dt.grpID;

SET STATISTICS TIME OFF
if object_id('tempdb.dbo.#RoomDates', 'U') is not null
drop table #RoomDates
if object_id('tempdb.dbo.#Nums', 'U') is not null
drop table #Nums


Shows that I must have done something right because this solution seems to win out:

------ bteraberry's query

SQL Server Execution Times:
CPU time = 15834 ms, elapsed time = 10126 ms.
------ Mark's query

SQL Server Execution Times:
CPU time = 7051 ms, elapsed time = 4544 ms.
------ Dwain's query

SQL Server Execution Times:
CPU time = 17207 ms, elapsed time = 22379 ms.
------ IBG's query (built by Dwain.C for this case)

SQL Server Execution Times:
CPU time = 718 ms, elapsed time = 756 ms.
------ ChrisM's query

SQL Server Execution Times:
CPU time = 11169 ms, elapsed time = 16211 ms.
------ bterraberry's query (2)

SQL Server Execution Times:
CPU time = 999 ms, elapsed time = 1019 ms.



Bteraberry's second solution comes a close second and it appears he's using elements of the IBG solution.

Edit: Note that IBG also suggested some indexing to improve his solution's speed, which I did not employ here.



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1366322
Posted Monday, October 1, 2012 12:00 AM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 6:49 PM
Points: 3,648, Visits: 5,326
ChrisM@Work (9/28/2012)
dwain.c (9/27/2012)
Phil,

I think this is a pretty snappy query too.

;WITH Tally (n) AS (
SELECT TOP (SELECT 1+MAX(DATEDIFF(day, StartDate, EndDate)) FROM #RoomDates)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1
FROM sys.all_columns),
GroupDates AS (
SELECT HotelID, RoomTypeID
,[Date]
,DateGroup = DATEADD(day
, -ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY [Date]), [Date])
FROM #RoomDates
CROSS APPLY (
SELECT n, [Date]=DATEADD(day, n, StartDate)
FROM Tally
WHERE DATEADD(day, n, StartDate) BETWEEN StartDate AND EndDate) a
GROUP BY HotelID, RoomTypeID, [Date])
SELECT HotelID, RoomTypeID
,StartDate=MIN([Date])
,EndDate=MAX([Date])
FROM GroupDates


This is the first time I've been able to successfully apply Jeff Moden's method for "Grouping Islands of Contiguous Dates" http://www.sqlservercentral.com/articles/T-SQL/71550/!

Not saying I fully understand it but at least I can go through the motions now.

Edit: Eliminated an unnecessary CROSS APPLY.


Jeff's article was the inspiration for this effort too;

SELECT HotelId, RoomTypeId, startdate = MIN(startdate), enddate = MAX(enddate)
FROM (
SELECT
HotelId, RoomTypeId, startdate, enddate,
Grouper = DATEADD(day,
0-DENSE_RANK() OVER (PARTITION BY HotelId, RoomTypeId ORDER BY InDate),
InDate)
FROM #RoomDates
CROSS APPLY (
SELECT TOP(1+DATEDIFF(DAY,startdate,enddate))
InDate = DATEADD(day,
(ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1),
startdate)
from sys.columns
) x
) c
GROUP BY HotelId, RoomTypeId, Grouper



Chris - I like what you did with the CROSS APPLY and I was able to do that in mine (to eliminate the Tally CTE) and it sped up quite dramatically.

Why are you using DENSE_RANK instead of ROW_NUMBER?



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1366324
Posted Monday, October 1, 2012 6:48 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 7:20 PM
Points: 37,088, Visits: 31,645
CELKO (9/30/2012)
That's certainly one way to do it but that typically results in an OR in the JOIN or WHERE clause which can cause performance problems. With that thought in mind, I'll close the interval with a very high date such as 9999-12-30. Notice that I don't use the last date available (9999-12-31) in SQL Server so that I can use the classic and easily SARGable ..


I don't think the performance hit is that bad today; I find I usually write "x_ date BETWEEN start_date AND COALESCE (end_date, CURRENT_TIMESTAMP)" since it preserves the truth. I cannot see the future, but I am still alive today

But more than that, we have DATE data types today, so the old Sybase dialect half-open trick is not needed any more. The sensible, easily read BETWEEN is good now, after all the decades of dialect.


Between won't work properly if times are involved. And, although I agree that the relatively new DATE datatype is a Godsend for some, I always enjoy it when someone decides they really do want "time" to be a component of the column and their code starts returning whacky numbers after they change the DATE columns to DATETIME columns. Using the method I suggested, that whacky numbers won't happen even for such a change.

I agree that "x_ date BETWEEN start_date AND COALESCE (end_date, CURRENT_TIMESTAMP)" does a reasonable job (including an index seek for the start_date) for problems up until "now" but it doesn't handle outlying date problems for future reservations and the like.



--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1366532
Posted Monday, October 1, 2012 9:02 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Friday, September 12, 2014 9:43 AM
Points: 7,284, Visits: 13,824
dwain.c (10/1/2012)
[quote]ChrisM@Work (9/28/2012)
[quote]dwain.c (9/27/2012)
Phil,

I think this is a pretty snappy query too.

<<snip>>

Chris - I like what you did with the CROSS APPLY and I was able to do that in mine (to eliminate the Tally CTE) and it sped up quite dramatically.


Thanks Dwain. It took a while to figure out.

Why are you using DENSE_RANK instead of ROW_NUMBER?

Because my first test harness used a PK which excluded the startdate - it was a much more comprehensive test with different types of overlap.



dwain.c (9/30/2012)


<<snip>>

I was able to adapt Mr. Ben-Gan's approach to this problem:

<<snip>>

Shows that I must have done something right because this solution seems to win out:

------ bteraberry's query

SQL Server Execution Times:
CPU time = 15834 ms, elapsed time = 10126 ms.
------ Mark's query

SQL Server Execution Times:
CPU time = 7051 ms, elapsed time = 4544 ms.
------ Dwain's query

SQL Server Execution Times:
CPU time = 17207 ms, elapsed time = 22379 ms.
------ IBG's query (built by Dwain.C for this case)

SQL Server Execution Times:
CPU time = 718 ms, elapsed time = 756 ms.
------ ChrisM's query

SQL Server Execution Times:
CPU time = 11169 ms, elapsed time = 16211 ms.
------ bterraberry's query (2)

SQL Server Execution Times:
CPU time = 999 ms, elapsed time = 1019 ms.







Not for long 

------ bteraberry's query

SQL Server Execution Times:
CPU time = 14914 ms, elapsed time = 11965 ms.

------ Mark's query

SQL Server Execution Times:
CPU time = 6739 ms, elapsed time = 5010 ms.

------ Dwain's query

SQL Server Execution Times:
CPU time = 17534 ms, elapsed time = 23217 ms.

------ IBG's query (built by Dwain.C for this case)

SQL Server Execution Times:
CPU time = 936 ms, elapsed time = 1005 ms.

------ IBG's query (built by ChrisM for this case)

SQL Server Execution Times:
CPU time = 952 ms, elapsed time = 949 ms.

------ ChrisM's query

SQL Server Execution Times:
CPU time = 11903 ms, elapsed time = 13277 ms.

------ bterraberry's query (2)

SQL Server Execution Times:
CPU time = 1326 ms, elapsed time = 1333 ms.


It uses a faster fish:
;WITH CTE AS (
SELECT HotelID, RoomTypeID, ts=StartDate, Type = 1
,e = NULL
,s = (2*ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY StartDate))-1
FROM #RoomDates
UNION ALL
SELECT HotelID, RoomTypeID, ts=EndDate, Type = -1
,e = (2*ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY EndDate))
,s = NULL
FROM #RoomDates
)
SELECT @HotelID=HotelID, @RoomTypeID=RoomTypeID, @SD = MIN(ts), @ED = MAX(ts)
FROM (
SELECT HotelID, RoomTypeID, ts,
Snapper = (ROW_NUMBER() OVER(ORDER BY HotelID, RoomTypeID, ts)+1)/2
FROM (
SELECT HotelID, RoomTypeID, ts, s, e,
se = ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY ts, Type DESC)
FROM CTE
) c1
WHERE se IN (e,s)
) C3
GROUP BY HotelID, RoomTypeID, Snapper
ORDER BY HotelID, RoomTypeID, Snapper



“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Post #1366612
Posted Monday, October 1, 2012 9:12 AM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 6:49 PM
Points: 3,648, Visits: 5,326
Looks darn near a tie to me, but if you're buying the beers on our next fishing trip, I'll concede to your snapper.


My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1366614
« Prev Topic | Next Topic »

Add to briefcase ««12

Permissions Expand / Collapse