Find employee with overlapping shifts using cte recursive

  • I have a table of shifts that an employee has worked:

    emp_id: varchar(7)

    start (datetime)

    end (datetime)

    I want to find all employees that have two shifts within the same time span

    ie for employee X, the start time for any shift cannot be between the start and end time

    for any other shift for that employee.

    However, I would like to do this using a common table expression ideally using recursion

    Any ideas?

  • Hello and welcome to ssc. Please read the link in my sig (click on the word "this"), which will show you how to create sample data for folks to code against.

    Your requirement sounds straightforward, using a CTE makes sense but probably not a recursive CTE - they're expensive to run and are not recommended unless necessary.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • DDL and sample data with desired results certainly makes it easier to provide tested code, but I think something like this would work:

    SELECT

    *

    FROM

    shifts AS S1

    JOIN shifts AS S2

    ON S1.emp_id = S2.emp_id AND

    S1.START BETWEEN S2.START AND S2.[END];

    You don't really need a CTE at all. The only issue with what I have provided is that the same row will match itself because you didn't provide anything information about a primary key or unique constraint that would keep that from happening.

    I hope you are looking for this so you can clean up the data and put in some business logic that wouldn't allow this to happen in the first place.

  • This may also work:

    DECLARE @Shifts TABLE

    (emp_id varchar(7), start datetime, [end] datetime)

    INSERT INTO @Shifts

    SELECT 'Dwain', '2012-07-08 08:00', '2012-07-08 17:00'

    UNION ALL SELECT 'Dwain', '2012-07-09 08:00', '2012-07-09 17:00'

    UNION ALL SELECT 'Dwain', '2012-07-09 16:00', '2012-07-09 23:00'

    UNION ALL SELECT 'Sheikh', '2012-07-08 08:00', '2012-07-08 17:00'

    UNION ALL SELECT 'Sheikh', '2012-07-09 08:00', '2012-07-09 17:00'

    UNION ALL SELECT 'Sheikh', '2012-07-09 16:00', '2012-07-09 23:00'

    SELECT emp_id, s1.start

    FROM @Shifts s1

    CROSS APPLY (

    SELECT start, [end]

    FROM @Shifts s2

    WHERE s1.emp_id = s2.emp_id AND

    s1.start <> s2.start AND s1.[end] <> s2.[end]) a

    WHERE s1.start BETWEEN a.start AND a.[end]

    Performance-wise, it's difficult to say whether Jack's or mine is better. Would need to test with lots of test data and you didn't even provide a sample.

    Also, providing expected results along with some sample data, might modify the actual query that ends up working for you.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Curiosity got the best of me, so I decided to build a 1.2M row test harness to compare the 2:

    DECLARE @Shifts TABLE

    (emp_id varchar(7), start datetime, [end] datetime)

    DECLARE @emp_id1 varchar(7), @start1 datetime, @end1 datetime

    ,@emp_id2 varchar(7), @start2 datetime, @end2 datetime

    ;WITH Tally (n) AS (

    SELECT TOP 300000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns c1 CROSS JOIN sys.all_columns c2)

    INSERT INTO @Shifts

    SELECT emp_id, DATEADD(day, n-1, start), DATEADD(day, n-1, [end])

    FROM (

    SELECT 'Dwain', '2012-07-09 08:00', '2012-07-09 17:00'

    UNION ALL SELECT 'Dwain', '2012-07-09 16:00', '2012-07-09 23:00'

    UNION ALL SELECT 'Sheikh', '2012-07-09 08:00', '2012-07-09 17:00'

    UNION ALL SELECT 'Sheikh', '2012-07-09 02:00', '2012-07-09 09:00') a(emp_id, start, [end])

    CROSS APPLY Tally

    PRINT '---- Dwain''s Query --'

    SET STATISTICS TIME ON

    SELECT @emp_id1=emp_id, @start1=s1.start

    FROM @Shifts s1

    CROSS APPLY (

    SELECT start, [end]

    FROM @Shifts s2

    WHERE s1.emp_id = s2.emp_id AND

    s1.start <> s2.start AND s1.[end] <> s2.[end]) a

    WHERE s1.start BETWEEN a.start AND a.[end]

    SET STATISTICS TIME OFF

    PRINT '---- Jack''s Query --'

    SET STATISTICS TIME ON

    SELECT

    @emp_id1=s1.emp_id, @start1=s1.start, @end1=s1.[end]

    ,@emp_id2=s2.emp_id, @start2=s2.start, @end2=s2.[end]

    FROM

    @Shifts AS S1

    JOIN @Shifts AS S2

    ON S1.emp_id = S2.emp_id AND

    S1.START BETWEEN S2.START AND S2.[END];

    SET STATISTICS TIME OFF

    The results were:

    (1200000 row(s) affected)

    ---- Dwain's Query --

    SQL Server Execution Times:

    CPU time = 10858 ms, elapsed time = 11585 ms.

    ---- Jack's Query --

    SQL Server Execution Times:

    CPU time = 10623 ms, elapsed time = 10864 ms.

    Notes:

    1. Jack's query returns a different row set than mine, so I sloughed off the results into a local variable so as not skew the elapsed times.

    2. Jack's seems to have a slight edge over mine.

    3. Different indexing schemes may impact these results.

    Usually my advice is to avoid self-JOINs on large tables if you can, but in this case that may not be the best advice.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Dwain,

    Essentially your query does a self-join just using a different syntax, so I'm not sure why you don't like Self-Join's. In a case like this you have to do a self join and if the table is indexed correctly it should perform as well as a join to any other table. When you look at the execution plans both queries access the shifts table twice and join with a Nested Loops (Inner Join). Your query has an Extra Filter operator from the derived table in the CROSS APPLY, which is the only difference in execution plans. Both queries have the same reads so I'd bet that they would scale similarly.

    Interestingly if you change the Table Variable to a temp table (at least on my laptop) your query returns quickly and mine is still running and is at 30+ minutes. I'm going to let it run to see how long it takes as long as it isn't affecting my ability to work.

  • Jack Corbett (7/12/2012)


    Dwain,

    Essentially your query does a self-join just using a different syntax, so I'm not sure why you don't like Self-Join's. In a case like this you have to do a self join and if the table is indexed correctly it should perform as well as a join to any other table. When you look at the execution plans both queries access the shifts table twice and join with a Nested Loops (Inner Join). Your query has an Extra Filter operator from the derived table in the CROSS APPLY, which is the only difference in execution plans. Both queries have the same reads so I'd bet that they would scale similarly.

    You are correct sir. I didn't think about it at the time as I only had a few minutes to spare on preparing the test harness, so didn't look at the query plans. I guess my prior experience has jaded me against self-JOINs, so any other solution (even a disguised self-JOIN) looks better to me than the self-JOIN.

    Jack Corbett (7/12/2012)


    Interestingly if you change the Table Variable to a temp table (at least on my laptop) your query returns quickly and mine is still running and is at 30+ minutes. I'm going to let it run to see how long it takes as long as it isn't affecting my ability to work.

    That is interesting. I've seen differences in performance tests before when run against table variables vs. temp tables. I had forgotten about it though.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • I never actually had it finish. Went for an hour and was maxing my CPU. My query needs to keep it from joining to itself so if you add:

    s1.start <> s2.start AND s1.[end] <> s2.[end]

    To my query you get the exact same execution plan as Dwain's query and the same performance. The key is having a key and making sure that they are not equal.

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply