• Jeff Moden (9/28/2012)


    Most of you know me. If I thought a Tally Table would be the best thing for this problem, I'd be on it like white on rice. The Tally Table and all of it's wonderful different renditions are not the way to go on something like this, IMHO.

    I'll also state that I've not done a performance comparision with any of the code offered, so far. I can only state that I've done previous comparisons and I've found one bit of code that, at the time, blew away the competition. That code comes from Itzik Ben-Gan and may be found at the following URL. Like SSC, it requires simple spam-free membership but it's absolute worth the time to sign up.

    Here's the URL for "Packing Date Intervals" such as what is being done on this thread.

    http://www.solidq.com/sqj/Pages/2011-March-Issue/Packing-Intervals.aspx

    I was able to adapt Mr. Ben-Gan's approach to this problem:

    -- Method by Itzik Ben-Gan

    -- http://www.solidq.com/sqj/Pages/2011-March-Issue/Packing-Intervals.aspx

    ;WITH C1 AS (

    SELECT HotelID, RoomTypeID, ts=StartDate, Type=1, e=NULL

    ,s=ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY StartDate)

    FROM #RoomDates

    UNION ALL

    SELECT HotelID, RoomTypeID, ts=EndDate, Type=-1

    ,e=ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY EndDate)

    ,s=NULL

    FROM #RoomDates),

    C2 AS (

    SELECT C1.*

    ,se=ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY ts, Type DESC)

    FROM C1),

    C3 AS (

    SELECT HotelID, RoomTypeID, ts

    ,grpnm=FLOOR((ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY ts)-1) / 2 + 1)

    FROM C2

    WHERE COALESCE(s-(se-s)-1, (se-e)-e) = 0)

    SELECT HotelID, RoomTypeID, StartDate=MIN(ts), EndDate=MAX(ts)

    FROM C3

    GROUP BY HotelID, RoomTypeID, grpnm

    It is very clever for sure.

    Testing of the proposed solutions using this test harness:

    -- Create a tally table

    if object_id('tempdb.dbo.#Nums', 'U') is null

    BEGIN

    SELECT TOP(1000000) n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1

    INTO #Nums

    FROM sys.columns a, sys.columns b, sys.columns c

    CREATE UNIQUE CLUSTERED INDEX UCX_n ON #Nums (n)

    END

    if object_id('tempdb.dbo.#RoomDates', 'U') is not null

    drop table #RoomDates

    create table #RoomDates (

    HotelId int not null

    ,RoomTypeId int not null

    ,StartDate date not null

    ,EndDate date not null

    ,constraint [PK_RD1] primary key clustered (

    HotelId

    ,RoomTypeId

    ,StartDate

    ) with (

    pad_index = off

    ,statistics_norecompute = off

    ,ignore_dup_key = off

    ,allow_row_locks = on

    ,allow_page_locks = on

    ) on [PRIMARY]

    ) on [PRIMARY]

    insert #RoomDates ( HotelId

    ,RoomTypeId

    ,StartDate

    ,EndDate)

    values (1,1,'2012-01-01', '2012-02-01'), (1,1,'2012-01-10', '2012-02-05'), (1,1,'2012-02-4', '2012-03-01'),

    (2,1,'2012-01-01', '2012-02-01'), (2,1,'2012-01-10', '2012-01-20'),

    (3,1,'2012-01-01', '2012-02-01'), (3,1,'2012-03-10', '2012-03-20')

    INSERT #RoomDates (HotelID, RoomTypeID, StartDate, EndDate)

    SELECT 10+n1, n2

    ,StartDate, EndDate=DATEADD(day, ABS(CHECKSUM(NEWID())) % 20, StartDate)

    FROM (SELECT n1=n FROM #Nums WHERE n BETWEEN 1 AND 50) a

    CROSS APPLY (SELECT n2=n FROM #Nums WHERE n BETWEEN 1 AND 60) b

    CROSS APPLY (SELECT n3=n FROM #Nums WHERE n BETWEEN 1 AND 80) c

    CROSS APPLY (SELECT StartDate=DATEADD(day, n3*6, '2009-01-01')) d

    --SELECT COUNT(*) FROM #RoomDates

    --select * from #RoomDates

    DECLARE @HotelID INT, @RoomTypeID INT, @SD DATE, @ED DATE, @StartDT DATETIME, @EndDT DATETIME

    --select HotelId

    -- ,RoomTypeId

    -- ,StartDate

    -- ,EndDate

    -- from (values (1,1,'2012-01-01', '2012-03-01'),

    -- (2,1,'2012-01-01', '2012-02-01'),

    -- (3,1,'2012-01-01', '2012-02-01'),

    -- (3,1,'2012-03-10', '2012-03-20')) DesiredResults (HotelId, RoomTypeId, StartDate, EndDate)

    SET NOCOUNT ON

    PRINT '------ bteraberry''s query'

    SET STATISTICS TIME ON

    select@HotelID=dt.HotelId

    ,@RoomTypeID=dt.RoomTypeId

    ,@StartDT = dateadd(dd, min(dt.N), '20000101')

    ,@EndDT = dateadd(dd, max(dt.N), '20000101')

    from

    (

    selectdistinct

    rd.HotelId

    ,rd.RoomTypeId

    ,n.N

    ,grp = n.N - dense_rank() over(order by rd.HotelId, rd.RoomTypeId, n.N)

    from #RoomDates as rd

    join #Nums as n

    on n.N between datediff(dd, '20000101', rd.StartDate) and datediff(dd, '20000101', rd.EndDate)

    ) as dt

    group by dt.HotelId, dt.RoomTypeId, dt.grp

    --order by dt.HotelId, dt.RoomTypeId, min(dt.N);

    SET STATISTICS TIME OFF

    PRINT '------ Mark''s query'

    SET STATISTICS TIME ON

    ;WITH StartsAndEnds(StartEnd,StartDate,EndDate,HotelId,RoomTypeId) AS (

    SELECT 'S' AS StartEnd,

    StartDate,

    DATEADD(day,-1,StartDate),

    HotelId,

    RoomTypeId

    FROM #RoomDates

    UNION ALL

    SELECT 'E' AS StartEnd,

    DATEADD(day,1,EndDate),

    EndDate,

    HotelId,

    RoomTypeId

    FROM #RoomDates),

    OrderedStarts AS (

    SELECT StartDate,

    HotelId,

    RoomTypeId,

    ROW_NUMBER() OVER(PARTITION BY HotelId,RoomTypeId ORDER BY StartDate,StartEnd DESC) AS rnBoth,

    2*(ROW_NUMBER() OVER(PARTITION BY HotelId,RoomTypeId,StartEnd ORDER BY StartDate))-1 AS rnStartEnd

    FROM StartsAndEnds),

    OrderedEnds AS (

    SELECT EndDate,

    HotelId,

    RoomTypeId,

    ROW_NUMBER() OVER(PARTITION BY HotelId,RoomTypeId ORDER BY EndDate DESC,StartEnd) AS rnBothRev,

    2*(ROW_NUMBER() OVER(PARTITION BY HotelId,RoomTypeId,StartEnd ORDER BY EndDate DESC))-1 AS rnStartEndRev

    FROM StartsAndEnds),

    Starts AS (

    SELECT StartDate,

    HotelId,

    RoomTypeId,

    ROW_NUMBER() OVER(PARTITION BY HotelId,RoomTypeId ORDER BY StartDate) AS rn

    FROM OrderedStarts

    WHERE rnBoth=rnStartEnd),

    Ends AS (

    SELECT EndDate,

    HotelId,

    RoomTypeId,

    ROW_NUMBER() OVER(PARTITION BY HotelId,RoomTypeId ORDER BY EndDate) AS rn

    FROM OrderedEnds

    WHERE rnBothRev=rnStartEndRev)

    SELECT @HotelID=s.HotelId,@RoomTypeID=s.RoomTypeId,@SD=s.StartDate,@ED=e.EndDate

    FROM Starts s

    INNER JOIN Ends e ON e.HotelId=s.HotelId AND e.RoomTypeId=s.RoomTypeId AND e.rn=s.rn AND s.StartDate<=e.EndDate

    --ORDER BY s.HotelId,s.RoomTypeId,s.StartDate,e.EndDate;

    SET STATISTICS TIME OFF

    PRINT '------ Dwain''s query'

    SET STATISTICS TIME ON

    ;WITH Tally (n) AS (

    SELECT TOP (SELECT 1+MAX(DATEDIFF(day, StartDate, EndDate)) FROM #RoomDates)

    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1

    FROM sys.all_columns),

    GroupDates AS (

    SELECT HotelID, RoomTypeID

    ,[Date]

    ,DateGroup = DATEADD(day

    , -ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY [Date]), [Date])

    FROM #RoomDates

    CROSS APPLY (

    SELECT [Date]=DATEADD(day, n, StartDate)

    FROM Tally

    WHERE DATEADD(day, n, StartDate) BETWEEN StartDate AND EndDate

    ) a

    GROUP BY HotelID, RoomTypeID, [Date])

    SELECT @HotelID=HotelID, @RoomTypeID=RoomTypeID

    ,@SD=MIN([Date])

    ,@ED=MAX([Date])

    FROM GroupDates

    GROUP BY HotelID, RoomTypeID, DateGroup

    SET STATISTICS TIME OFF

    PRINT '------ IBG''s query (built by Dwain.C for this case)'

    SET STATISTICS TIME ON

    ;WITH C1 AS (

    SELECT HotelID, RoomTypeID, ts=StartDate, Type=1, e=NULL

    ,s=ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY StartDate)

    FROM #RoomDates

    UNION ALL

    SELECT HotelID, RoomTypeID, ts=EndDate, Type=-1

    ,e=ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY EndDate)

    ,s=NULL

    FROM #RoomDates),

    C2 AS (

    SELECT C1.*

    ,se=ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY ts, Type DESC)

    FROM C1),

    C3 AS (

    SELECT HotelID, RoomTypeID, ts

    ,grpnm=FLOOR((ROW_NUMBER() OVER (PARTITION BY HotelID, RoomTypeID ORDER BY ts)-1) / 2 + 1)

    FROM C2

    WHERE COALESCE(s-(se-s)-1, (se-e)-e) = 0)

    SELECT @HotelID=HotelID, @RoomTypeID=RoomTypeID, @SD=MIN(ts), @ED=MAX(ts)

    FROM C3

    GROUP BY HotelID, RoomTypeID, grpnm

    SET STATISTICS TIME OFF

    PRINT '------ ChrisM''s query'

    SET STATISTICS TIME ON

    SELECT @HotelID=HotelId, @RoomTypeID=RoomTypeId, @SD = MIN(startdate), @ED = MAX(enddate)

    FROM (

    SELECT

    HotelId, RoomTypeId, startdate, enddate,

    Grouper = DATEADD(day,

    0-DENSE_RANK() OVER (PARTITION BY HotelId, RoomTypeId ORDER BY InDate),

    InDate)

    FROM #RoomDates

    CROSS APPLY (

    SELECT TOP(1+DATEDIFF(DAY,startdate,enddate))

    InDate = DATEADD(day,

    (ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1),

    startdate)

    from sys.columns

    ) x

    ) c

    GROUP BY HotelId, RoomTypeId, Grouper

    SET STATISTICS TIME OFF

    PRINT '------ bterraberry''s query (2)'

    SET STATISTICS TIME ON

    ;with cteTemp as

    (

    selectHotelId

    ,RoomTypeId

    ,BeginDate = case when row_number() over(order by HotelId, RoomTypeId, theDate) - openCnt = 0 then theDate end

    ,EndDate = case when row_number() over(order by HotelId, RoomTypeId, theDate) - closeCnt = 0 then theDate end

    from

    (

    selectHotelId

    ,RoomTypeId

    ,theDate = StartDate

    ,closeCnt = null

    ,openCnt = (row_number() over(order by HotelId, RoomTypeId, StartDate) *2) - 1

    from #RoomDates

    union all

    selectHotelId

    ,RoomTypeId

    ,theDate = EndDate

    ,closeCnt = row_number() over(order by HotelId, RoomTypeId, EndDate) * 2

    ,openCnt = null

    from #RoomDates

    ) as dt

    )

    select@HotelID=dt.HotelId

    ,@RoomTypeID=dt.RoomTypeId

    ,@SD = min(dt.BeginDate)

    ,@ED = min(dt.EndDate)

    from

    (

    selectHotelId

    ,RoomTypeId

    ,BeginDate

    ,EndDate

    ,grpID = case when BeginDate is not null then row_number() over(order by HotelId, RoomTypeId, BeginDate)

    else row_number() over(order by HotelId, RoomTypeId, EndDate) end

    from cteTemp

    where BeginDate is not null or EndDate is not null

    ) as dt

    group by dt.HotelId, dt.RoomTypeId, dt.grpID

    order by dt.HotelId, dt.RoomTypeId, dt.grpID;

    SET STATISTICS TIME OFF

    if object_id('tempdb.dbo.#RoomDates', 'U') is not null

    drop table #RoomDates

    if object_id('tempdb.dbo.#Nums', 'U') is not null

    drop table #Nums

    Shows that I must have done something right because this solution seems to win out:

    ------ bteraberry's query

    SQL Server Execution Times:

    CPU time = 15834 ms, elapsed time = 10126 ms.

    ------ Mark's query

    SQL Server Execution Times:

    CPU time = 7051 ms, elapsed time = 4544 ms.

    ------ Dwain's query

    SQL Server Execution Times:

    CPU time = 17207 ms, elapsed time = 22379 ms.

    ------ IBG's query (built by Dwain.C for this case)

    SQL Server Execution Times:

    CPU time = 718 ms, elapsed time = 756 ms.

    ------ ChrisM's query

    SQL Server Execution Times:

    CPU time = 11169 ms, elapsed time = 16211 ms.

    ------ bterraberry's query (2)

    SQL Server Execution Times:

    CPU time = 999 ms, elapsed time = 1019 ms.

    Bteraberry's second solution comes a close second and it appears he's using elements of the IBG solution.

    Edit: Note that IBG also suggested some indexing to improve his solution's speed, which I did not employ here.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St