Find overlapping date ranges for more than 2 ranges

  • Hi! Long time reader, but I think this is my first time posting.

    I have a hypothetical scenario that I am struggling with and I'd really appreciate some ideas of what to try next.

    Imagine I have a number of people assigned to a project for a set period of time. Those people get invited to meetings lasting a set period of time.

    I want to work out the date ranges when all the people on a project are invited to meetings at the same time.

    This is modelled like this - along with some sample data (that is unrealistic but tests the scenario):

    create table #EmployeeProject

    (

    EmployeeId int

    ,ProjectId int

    ,ProjectStart datetime

    ,ProjectEnd datetime

    )

    create table #EmployeeMeeting

    (

    EmployeeId int

    ,ProjectId int

    ,MeetingStart datetime

    ,MeetingEnd datetime

    )

    insert into #EmployeeProject values (1, 1, '2014-01-01 09:00', '2014-04-30 17:00')

    insert into #EmployeeProject values (2, 1, '2014-05-01 09:00', '2014-12-31 17:00')

    insert into #EmployeeProject values (3, 1, '2014-05-01 09:00', '2014-12-31 17:00')

    insert into #EmployeeProject values (4, 1, '2014-05-01 09:00', '2014-12-31 17:00')

    insert into #EmployeeMeeting values (1, 1, '2014-02-01 09:00', '2014-03-31 17:00')

    insert into #EmployeeMeeting values (2, 1, '2014-06-01 09:00', '2014-10-31 17:00')

    insert into #EmployeeMeeting values (3, 1, '2014-07-01 09:00', '2014-09-30 17:00')

    insert into #EmployeeMeeting values (4, 1, '2014-08-01 09:00', '2014-08-31 17:00')

    What I'd like to get as a result set is:

    ProjectId, RangeStart, RangeEnd

    1, 2014-02-01 09:00, 2014-03-31 17:00

    1, 2014-08-01 09:00, 2014-08-31 17:00

    There feels like there should be a relatively simple way to do this, but I can't come up with a working solution that isn't either restricted by the number of overlaps it can deal with, or that uses a time of day" dimension table that has a row for each minute of each day. Using my real-world data, rather than the example above, this does not scale well!

    Please help!

  • I'm not sure I really understand your requirement, nor do I see exactly how your input relates to your desired output. Nice job of providing DDL and sample data by the way.

    But your title suggests to me that something here might be of use to you.

    Calculating Gaps Between Overlapping Time Intervals in SQL[/url]

    Note that the really heavy lifting is the code I used from Itzik Ben-Gan in that article. Unfortunately his original article (linked to in mine) has disappeared and along with it his excellent explanation.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • This works with your sample data

    WITH StartsAndEnds(StartEnd,ProjectId) AS (

    -- Start points which overlap the same number of meetings as projects

    SELECT s.MeetingStart,s.ProjectId

    FROM #EmployeeMeeting s

    LEFT OUTER JOIN #EmployeeMeeting m ON m.ProjectId = s.ProjectId

    AND s.MeetingStart BETWEEN m.MeetingStart AND m.MeetingEnd

    LEFT OUTER JOIN #EmployeeProject p ON p.ProjectId = s.ProjectId

    AND s.MeetingStart BETWEEN p.ProjectStart AND p.ProjectEnd

    GROUP BY s.MeetingStart,s.ProjectId

    HAVING COUNT(DISTINCT p.EmployeeId) = COUNT(DISTINCT m.EmployeeId)

    UNION ALL

    -- End points which overlap the same number of meetings as projects

    SELECT s.MeetingEnd,s.ProjectId

    FROM #EmployeeMeeting s

    LEFT OUTER JOIN #EmployeeMeeting m ON m.ProjectId = s.ProjectId

    AND s.MeetingEnd BETWEEN m.MeetingStart AND m.MeetingEnd

    LEFT OUTER JOIN #EmployeeProject p ON p.ProjectId = s.ProjectId

    AND s.MeetingEnd BETWEEN p.ProjectStart AND p.ProjectEnd

    GROUP BY s.MeetingEnd,s.ProjectId

    HAVING COUNT(DISTINCT p.EmployeeId) = COUNT(DISTINCT m.EmployeeId)

    ),

    OrderedStartsAndEnds AS (

    SELECT StartEnd,ProjectId,

    ROW_NUMBER() OVER(PARTITION BY ProjectId ORDER BY StartEnd) AS rn

    FROM StartsAndEnds

    )

    SELECT s.ProjectId,

    s.StartEnd AS RangeStart,

    e.StartEnd AS RangeEnd

    FROM OrderedStartsAndEnds s

    INNER JOIN OrderedStartsAndEnds e ON e.ProjectId = s.ProjectId

    AND e.rn = s.rn + 1

    WHERE EXISTS(SELECT * FROM #EmployeeProject p WHERE p.ProjectId = s.ProjectId

    AND s.StartEnd BETWEEN p.ProjectStart AND p.ProjectEnd

    AND e.StartEnd BETWEEN p.ProjectStart AND p.ProjectEnd);

    Simpler version

    WITH StartsAndEnds(ProjectId,StartEnd,rn) AS (

    SELECT t.ProjectId,

    ca.StartEnd,

    ROW_NUMBER() OVER(PARTITION BY t.ProjectId ORDER BY ca.StartEnd)

    FROM #EmployeeMeeting t

    CROSS APPLY(VALUES(t.MeetingStart),(t.MeetingEnd)) AS ca(StartEnd)

    LEFT OUTER JOIN #EmployeeMeeting m ON m.ProjectId = t.ProjectId

    AND ca.StartEnd BETWEEN m.MeetingStart AND m.MeetingEnd

    LEFT OUTER JOIN #EmployeeProject p ON p.ProjectId = t.ProjectId

    AND ca.StartEnd BETWEEN p.ProjectStart AND p.ProjectEnd

    GROUP BY ca.StartEnd,t.ProjectId

    HAVING COUNT(DISTINCT p.EmployeeId) = COUNT(DISTINCT m.EmployeeId)

    )

    SELECT s.ProjectId,

    s.StartEnd AS RangeStart,

    e.StartEnd AS RangeEnd

    FROM StartsAndEnds s

    INNER JOIN StartsAndEnds e ON e.ProjectId = s.ProjectId

    AND e.rn = s.rn + 1

    WHERE EXISTS(SELECT * FROM #EmployeeProject p WHERE p.ProjectId = s.ProjectId

    AND s.StartEnd BETWEEN p.ProjectStart AND p.ProjectEnd

    AND e.StartEnd BETWEEN p.ProjectStart AND p.ProjectEnd);

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Many thanks for your responses - I've got some things to try now!

    dwain.c, I probably didn't explain it too well, and my attempt to anonymise/simplify what I'm really trying to do into something that made sense probably failed!

    I found Itzik Ben-Gan's "Intervals and Counts" articles that are linked at the bottom of your link after I posted this yesterday, and along with your article, they have helped me along a bit.

    Using that technique, I'm now trying to create a dataset that has the following columns:

    ProjectId, Timestamp, MeetingCounter, EmployeeCounter

    With a row for each time something happens for a project, i.e. a meeting starts/ends, or an employee joins/leaves the project.

    I then want the rows where MeetingCounter = EmployeeCounter to give me the Start Timestamp, and the next row to give me the End Timestamp.

    When I've cracked it, I'll post a follow-up in case it helps anyone else.

    Mark Cowne, Thanks also for your reply.

    Unfortunately, I can't use the "simpler" version as I am using SQL 2005 because of VALUES in the CROSS APPLY (which also prevents me using LEAD in what I describe above).

    As you point out, your solution works with my example data, but I think it may have problems with my real-world data, where employees may join/leave different meetings multiple times. I will try and create some better example data to illustrate/test.

  • coalesceuk (3/10/2015)


    Many thanks for your responses - I've got some things to try now!

    dwain.c, I probably didn't explain it too well, and my attempt to anonymise/simplify what I'm really trying to do into something that made sense probably failed!

    I found Itzik Ben-Gan's "Intervals and Counts" articles that are linked at the bottom of your link after I posted this yesterday, and along with your article, they have helped me along a bit.

    Using that technique, I'm now trying to create a dataset that has the following columns:

    ProjectId, Timestamp, MeetingCounter, EmployeeCounter

    With a row for each time something happens for a project, i.e. a meeting starts/ends, or an employee joins/leaves the project.

    I then want the rows where MeetingCounter = EmployeeCounter to give me the Start Timestamp, and the next row to give me the End Timestamp.

    When I've cracked it, I'll post a follow-up in case it helps anyone else.

    Mark Cowne, Thanks also for your reply.

    Unfortunately, I can't use the "simpler" version as I am using SQL 2005 because of VALUES in the CROSS APPLY (which also prevents me using LEAD in what I describe above).

    As you point out, your solution works with my example data, but I think it may have problems with my real-world data, where employees may join/leave different meetings multiple times. I will try and create some better example data to illustrate/test.

    If you're reading up on IBG's contributions that's quite likely to put you on the right track. He's the master in this domain, with the caveat that he does employ a lot of SQL 2008/2012 stuff in his solutions.

    Definitely post back what you end up with. Amazing how some will take that as bait and try to improve on it.

    My apologies for not having more time yesterday and today to study what you need carefully and offer a direct solution.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • I think I have a solution, and by changing my example data to split one meeting into 2, I can break Mark Cowne's suggestion

    IF OBJECT_ID('tempdb.dbo.#EmployeeProject') IS NOT NULL DROP TABLE #EmployeeProject

    create table #EmployeeProject

    (

    EmployeeId int

    ,ProjectId int

    ,ProjectStart datetime

    ,ProjectEnd datetime

    )

    IF OBJECT_ID('tempdb.dbo.#EmployeeMeeting') IS NOT NULL DROP TABLE #EmployeeMeeting

    create table #EmployeeMeeting

    (

    EmployeeId int

    ,ProjectId int

    ,MeetingStart datetime

    ,MeetingEnd datetime

    )

    insert into #EmployeeProject values (1, 1, '2014-01-01 09:00', '2014-04-30 17:00')

    insert into #EmployeeProject values (2, 1, '2014-05-01 09:00', '2014-12-31 17:00')

    insert into #EmployeeProject values (3, 1, '2014-05-01 09:00', '2014-12-31 17:00')

    insert into #EmployeeProject values (4, 1, '2014-05-01 09:00', '2014-12-31 17:00')

    insert into #EmployeeMeeting values (1, 1, '2014-02-01 09:00', '2014-03-31 17:00')

    insert into #EmployeeMeeting values (2, 1, '2014-06-01 09:00', '2014-10-31 17:00')

    insert into #EmployeeMeeting values (3, 1, '2014-07-01 09:00', '2014-09-30 17:00')

    --insert into #EmployeeMeeting values (4, 1, '2014-08-01 09:00', '2014-08-31 17:00')

    insert into #EmployeeMeeting values (4, 1, '2014-08-01 09:00', '2014-08-21 17:00')

    insert into #EmployeeMeeting values (4, 1, '2014-08-21 17:00', '2014-08-31 17:00')

    ;with c1 as --Get all the events in a row

    (

    select --Start Date of Employees on Projects

    p.ProjectId

    ,p.ProjectStart as t

    ,1 as ProjectIncrement

    ,0 as MeetingIncrement

    from #EmployeeProject p

    union all

    select --End Date of Employees on Projects

    p.ProjectId

    ,p.ProjectEnd

    ,-1

    ,0

    from #EmployeeProject p

    union all

    select --Start Date of Employees in Meetings

    m.ProjectId

    ,m.MeetingStart

    ,0

    ,1

    from #EmployeeMeeting m

    union all

    select --End Date of Employees in Meetings

    m.ProjectId

    ,m.MeetingEnd

    ,0

    ,-1

    from #EmployeeMeeting m

    )

    ,c2 as --Add a row number column for easier joining

    (

    select *

    ,row_number() over (partition by ProjectId ORDER BY t, ProjectIncrement, MeetingIncrement) as rn

    from c1

    )

    ,c3 as --Calculate running totals of Projects and Meetings

    (

    select c2_now.*

    ,sum(c2_prev.ProjectIncrement) as ProjectCount

    ,sum(c2_prev.MeetingIncrement) as MeetingCount

    from c2 c2_now

    left join c2 c2_prev

    on c2_now.ProjectId = c2_prev.ProjectId

    and c2_now.rn >= c2_prev.rn

    group by

    c2_now.ProjectId

    ,c2_now.t

    ,c2_now.ProjectIncrement

    ,c2_now.MeetingIncrement

    ,c2_now.rn

    )

    select

    c3_now.ProjectId

    ,c3_now.t as RangeStart

    ,c3_next.t as RangeEnd

    ,c3_now.ProjectCount

    ,c3_now.MeetingCount

    from c3 c3_now

    left join c3 c3_next

    on c3_now.ProjectId = c3_next.ProjectId

    and c3_now.rn = c3_next.rn - 1

    where c3_now.ProjectCount = c3_now.MeetingCount

    and c3_now.ProjectCount > 0

    order by c3_now.rn

    If I had SQL 2012, this could be simplified greatly with SUM(ProjectCount) OVER(ProjectId)

    I haven't stress-tested this to see how it performs, but it is quicker than the code I wrote that uses a time-of-day dimension.

    Any suggestions for improvements would be very welcome. Thanks again for the pointers - I was stuck before I posted here!

  • You can probably save yourself a couple of scans by changing CTE c1 as follows:

    with c1 as --Get all the events in a row

    (

    select --Start Date of Employees on Projects

    p.ProjectId

    ,t

    ,ProjectIncrement

    ,MeetingIncrement

    from #EmployeeProject p

    CROSS APPLY

    (

    SELECT p.ProjectStart, 1, 0

    UNION ALL

    SELECT p.ProjectEnd, -1, 0

    ) x (t, ProjectIncrement, MeetingIncrement)

    union all

    select --Start Date of Employees in Meetings

    m.ProjectId

    ,t

    ,ProjectIncrement

    ,MeetingIncrement

    from #EmployeeMeeting m

    CROSS APPLY

    (

    SELECT m.MeetingStart, 0, 1

    UNION ALL

    SELECT m.MeetingEnd, 0, -1

    ) x (t, ProjectIncrement, MeetingIncrement)

    )

    Note that this is not tested but I believe CROSS APPLY was introduced in SQL 2005.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • coalesceuk (3/10/2015)


    I think I have a solution, and by changing my example data to split one meeting into 2, I can break Mark Cowne's suggestion

    IF OBJECT_ID('tempdb.dbo.#EmployeeProject') IS NOT NULL DROP TABLE #EmployeeProject

    create table #EmployeeProject

    (

    EmployeeId int

    ,ProjectId int

    ,ProjectStart datetime

    ,ProjectEnd datetime

    )

    IF OBJECT_ID('tempdb.dbo.#EmployeeMeeting') IS NOT NULL DROP TABLE #EmployeeMeeting

    create table #EmployeeMeeting

    (

    EmployeeId int

    ,ProjectId int

    ,MeetingStart datetime

    ,MeetingEnd datetime

    )

    insert into #EmployeeProject values (1, 1, '2014-01-01 09:00', '2014-04-30 17:00')

    insert into #EmployeeProject values (2, 1, '2014-05-01 09:00', '2014-12-31 17:00')

    insert into #EmployeeProject values (3, 1, '2014-05-01 09:00', '2014-12-31 17:00')

    insert into #EmployeeProject values (4, 1, '2014-05-01 09:00', '2014-12-31 17:00')

    insert into #EmployeeMeeting values (1, 1, '2014-02-01 09:00', '2014-03-31 17:00')

    insert into #EmployeeMeeting values (2, 1, '2014-06-01 09:00', '2014-10-31 17:00')

    insert into #EmployeeMeeting values (3, 1, '2014-07-01 09:00', '2014-09-30 17:00')

    --insert into #EmployeeMeeting values (4, 1, '2014-08-01 09:00', '2014-08-31 17:00')

    insert into #EmployeeMeeting values (4, 1, '2014-08-01 09:00', '2014-08-21 17:00')

    insert into #EmployeeMeeting values (4, 1, '2014-08-21 17:00', '2014-08-31 17:00')

    ;with c1 as --Get all the events in a row

    (

    select --Start Date of Employees on Projects

    p.ProjectId

    ,p.ProjectStart as t

    ,1 as ProjectIncrement

    ,0 as MeetingIncrement

    from #EmployeeProject p

    union all

    select --End Date of Employees on Projects

    p.ProjectId

    ,p.ProjectEnd

    ,-1

    ,0

    from #EmployeeProject p

    union all

    select --Start Date of Employees in Meetings

    m.ProjectId

    ,m.MeetingStart

    ,0

    ,1

    from #EmployeeMeeting m

    union all

    select --End Date of Employees in Meetings

    m.ProjectId

    ,m.MeetingEnd

    ,0

    ,-1

    from #EmployeeMeeting m

    )

    ,c2 as --Add a row number column for easier joining

    (

    select *

    ,row_number() over (partition by ProjectId ORDER BY t, ProjectIncrement, MeetingIncrement) as rn

    from c1

    )

    ,c3 as --Calculate running totals of Projects and Meetings

    (

    select c2_now.*

    ,sum(c2_prev.ProjectIncrement) as ProjectCount

    ,sum(c2_prev.MeetingIncrement) as MeetingCount

    from c2 c2_now

    left join c2 c2_prev

    on c2_now.ProjectId = c2_prev.ProjectId

    and c2_now.rn >= c2_prev.rn

    group by

    c2_now.ProjectId

    ,c2_now.t

    ,c2_now.ProjectIncrement

    ,c2_now.MeetingIncrement

    ,c2_now.rn

    )

    select

    c3_now.ProjectId

    ,c3_now.t as RangeStart

    ,c3_next.t as RangeEnd

    ,c3_now.ProjectCount

    ,c3_now.MeetingCount

    from c3 c3_now

    left join c3 c3_next

    on c3_now.ProjectId = c3_next.ProjectId

    and c3_now.rn = c3_next.rn - 1

    where c3_now.ProjectCount = c3_now.MeetingCount

    and c3_now.ProjectCount > 0

    order by c3_now.rn

    If I had SQL 2012, this could be simplified greatly with SUM(ProjectCount) OVER(ProjectId)

    I haven't stress-tested this to see how it performs, but it is quicker than the code I wrote that uses a time-of-day dimension.

    Any suggestions for improvements would be very welcome. Thanks again for the pointers - I was stuck before I posted here!

    Mine breaks due to a zero length interval which can be fixed by simply adding

    AND s.StartEnd < e.StartEnd

    to the end of the query.

    Just for fun, here's a SQL Server 2012 version that runs quicker

    WITH StartsAndEnds(ProjectId,RangeStart,RangeEnd) AS (

    SELECT t.ProjectId,

    ca.StartEnd,

    LEAD(ca.StartEnd) OVER(PARTITION BY t.ProjectId ORDER BY ca.StartEnd)

    FROM #EmployeeMeeting t

    CROSS APPLY(VALUES(t.MeetingStart),(t.MeetingEnd)) AS ca(StartEnd)

    INNER JOIN #EmployeeMeeting m ON m.ProjectId = t.ProjectId

    AND ca.StartEnd BETWEEN m.MeetingStart AND m.MeetingEnd

    INNER JOIN #EmployeeProject p ON p.ProjectId = t.ProjectId

    AND ca.StartEnd BETWEEN p.ProjectStart AND p.ProjectEnd

    GROUP BY ca.StartEnd,t.ProjectId

    HAVING COUNT(DISTINCT p.EmployeeId) = COUNT(DISTINCT m.EmployeeId)

    )

    SELECT s.ProjectId,

    s.RangeStart,

    s.RangeEnd

    FROM StartsAndEnds s

    WHERE EXISTS(SELECT * FROM #EmployeeProject p WHERE p.ProjectId = s.ProjectId

    AND s.RangeStart BETWEEN p.ProjectStart AND p.ProjectEnd

    AND s.RangeEnd BETWEEN p.ProjectStart AND p.ProjectEnd);

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • dwain.c, the cross apply does remove a lot of the table scans and works in 2005. I find it harder to understand/read what a cross apply is doing just by looking at the query, but in this case, I follow what it does and why it is better.

    Mark, I have some other "real" data that I've tried with your 2005 friendly solution.

    insert into #EmployeeProject values (1, 1, '2014-09-26 13:40:08.000', '2015-01-30 20:35:36.000')

    insert into #EmployeeProject values (2, 1, '2014-10-02 16:13:01.000', '2015-02-04 18:52:58.000')

    insert into #EmployeeMeeting values (1, 1, '2014-09-29 15:56:41.000', '2014-09-29 17:08:20.000')

    insert into #EmployeeMeeting values (1, 1, '2014-12-03 09:51:54.000', '2015-01-30 15:43:30.000')

    insert into #EmployeeMeeting values (2, 1, '2014-10-02 21:25:28.000', '2014-10-07 09:21:09.000')

    insert into #EmployeeMeeting values (2, 1, '2014-11-14 12:09:50.000', '2014-12-03 09:45:33.000')

    insert into #EmployeeMeeting values (2, 1, '2014-12-05 09:12:21.000', '2015-01-30 15:34:43.000')

    I am expecting the following results:

    ProjectIdRangeStartRangeEnd

    12014-09-29 15:56:41.0002014-09-29 17:08:20.000

    12014-12-05 09:12:21.0002015-01-30 15:34:43.000

    However, I am getting an extra row returned (that isn't a result of a zero interval). What is going wrong in this case?

    Thanks!

  • coalesceuk (3/11/2015)


    dwain.c, the cross apply does remove a lot of the table scans and works in 2005. I find it harder to understand/read what a cross apply is doing just by looking at the query, but in this case, I follow what it does and why it is better.

    Check my signature links about CROSS APPLY VALUES to UNPIVOT. Using the SELECT/UNION ALL SELECT like I did is the SQL 2005 version of VALUES (table row constructor) shown in that article.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • coalesceuk (3/11/2015)


    dwain.c, the cross apply does remove a lot of the table scans and works in 2005. I find it harder to understand/read what a cross apply is doing just by looking at the query, but in this case, I follow what it does and why it is better.

    Mark, I have some other "real" data that I've tried with your 2005 friendly solution.

    insert into #EmployeeProject values (1, 1, '2014-09-26 13:40:08.000', '2015-01-30 20:35:36.000')

    insert into #EmployeeProject values (2, 1, '2014-10-02 16:13:01.000', '2015-02-04 18:52:58.000')

    insert into #EmployeeMeeting values (1, 1, '2014-09-29 15:56:41.000', '2014-09-29 17:08:20.000')

    insert into #EmployeeMeeting values (1, 1, '2014-12-03 09:51:54.000', '2015-01-30 15:43:30.000')

    insert into #EmployeeMeeting values (2, 1, '2014-10-02 21:25:28.000', '2014-10-07 09:21:09.000')

    insert into #EmployeeMeeting values (2, 1, '2014-11-14 12:09:50.000', '2014-12-03 09:45:33.000')

    insert into #EmployeeMeeting values (2, 1, '2014-12-05 09:12:21.000', '2015-01-30 15:34:43.000')

    I am expecting the following results:

    ProjectIdRangeStartRangeEnd

    12014-09-29 15:56:41.0002014-09-29 17:08:20.000

    12014-12-05 09:12:21.0002015-01-30 15:34:43.000

    However, I am getting an extra row returned (that isn't a result of a zero interval). What is going wrong in this case?

    Thanks!

    My bad, needs this at the end of the query

    AND EXISTS(SELECT * FROM #EmployeeMeeting m WHERE m.ProjectId = s.ProjectId

    AND s.StartEnd BETWEEN m.MeetingStart AND m.MeetingEnd

    AND e.StartEnd BETWEEN m.MeetingStart AND m.MeetingEnd)

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • dwain.c Thanks - I will have a read!

    Mark Thanks for the fix, which works with my example data.

    I've been testing both solutions with my real-world data. Both take about the same time to execute based on about 200000 Projects and 3000 Meetings.

    However, I've found another scenario in that data which breaks Mark's solution (which is probably because I've incompletely explained the scenario in my attempt to "simplify" it!).

    insert into #EmployeeProject values (1, 3, '2011-05-06 12:17:38.000', '2012-05-21 15:06:44.000')

    insert into #EmployeeProject values (2, 3, '2012-05-16 15:55:45.000', '2012-05-16 16:30:25.000')

    insert into #EmployeeMeeting values (1, 3, '2011-05-10 13:59:56.000', '2012-05-16 17:44:12.000'

    In this case I would be expecting the period to be returned where the employee is in a meeting on all the projects they are working on at the time, i.e.

    ProjectIdRangeStartRangeEnd

    32011-05-10 13:59:56.0002012-05-16 15:55:45.000

    32012-05-16 16:30:25.0002012-05-16 17:44:12.000

    What I'm getting with that solution is a missing part in the middle where the employee is allocated to the second project, but not in a meeting for it, i.e.

    ProjectIdRangeStartRangeEnd

    32011-05-10 13:59:56.0002012-05-16 17:44:12.000

    I'm guessing this is because we are purely looking at Start and End points of meetings, and missing what might happen in the middle?

  • coalesceuk (3/12/2015)


    dwain.c Thanks - I will have a read!

    Mark Thanks for the fix, which works with my example data.

    I've been testing both solutions with my real-world data. Both take about the same time to execute based on about 200000 Projects and 3000 Meetings.

    However, I've found another scenario in that data which breaks Mark's solution (which is probably because I've incompletely explained the scenario in my attempt to "simplify" it!).

    insert into #EmployeeProject values (1, 3, '2011-05-06 12:17:38.000', '2012-05-21 15:06:44.000')

    insert into #EmployeeProject values (2, 3, '2012-05-16 15:55:45.000', '2012-05-16 16:30:25.000')

    insert into #EmployeeMeeting values (1, 3, '2011-05-10 13:59:56.000', '2012-05-16 17:44:12.000'

    In this case I would be expecting the period to be returned where the employee is in a meeting on all the projects they are working on at the time, i.e.

    ProjectIdRangeStartRangeEnd

    32011-05-10 13:59:56.0002012-05-16 15:55:45.000

    32012-05-16 16:30:25.0002012-05-16 17:44:12.000

    What I'm getting with that solution is a missing part in the middle where the employee is allocated to the second project, but not in a meeting for it, i.e.

    ProjectIdRangeStartRangeEnd

    32011-05-10 13:59:56.0002012-05-16 17:44:12.000

    I'm guessing this is because we are purely looking at Start and End points of meetings, and missing what might happen in the middle?

    I suspect this means I've misunderstood the requirements, would probably need a re-write.

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply