Overlapping Date Ranges with Gaps Possible

  • Here is a head scratcher...

    Back Story:

    Operations has come and informed me that they now allow employees to log in and work in two different systems simultaneously. The problem this causes is that our payroll system needs to calculate hourly wages, but we need to be able to identify any overlapping shifts because the employee does not receive a wage for each system they are logged into. They also do not need to be logged into to both systems, they can also log in and out of either multiple times throughout a give day.

    Here is some sample data of my problem and the results I need in the end. I've been plugging away at this but can't seem to generate the results I need. Any help at all is greatly appreciated.

    /* Example Data */

    CREATE TABLE #ShiftData

    (

    emp_id INT,

    shift_id INT IDENTITY(1,1),

    shift_type VARCHAR(20),

    start_time DATETIME,

    end_time DATETIME,

    msgs_handled INT

    )

    INSERT INTO #ShiftData

    (emp_id, shift_type, start_time, end_time, msgs_handled)

    VALUES

    (111,'A',CAST('2010-04-08 12:00:00' AS DATETIME), CAST('2010-04-08 14:00:00' AS DATETIME), 45)

    INSERT INTO #ShiftData

    (emp_id,shift_type,start_time,end_time,msgs_handled)

    VALUES

    (111,'B',CAST('2010-04-08 13:00:00' AS DATETIME), CAST('2010-04-08 14:00:00' AS DATETIME), 20)

    INSERT INTO #ShiftData

    (emp_id,shift_type,start_time,end_time,msgs_handled)

    VALUES

    (111,'A',CAST('2010-04-08 15:00:00' AS DATETIME), CAST('2010-04-08 16:00:00' AS DATETIME), 25)

    INSERT INTO #ShiftData

    (emp_id,shift_type,start_time,end_time,msgs_handled)

    VALUES

    (111,'B',CAST('2010-04-08 15:00:00' AS DATETIME), CAST('2010-04-08 18:00:00' AS DATETIME), 100)

    INSERT INTO #ShiftData

    (emp_id,shift_type,start_time,end_time,msgs_handled)

    VALUES

    (111,'A',CAST('2010-04-08 16:30:00' AS DATETIME), CAST('2010-04-08 17:30:00' AS DATETIME), 30)

    INSERT INTO #ShiftData

    (emp_id,shift_type,start_time,end_time,msgs_handled)

    VALUES

    (222,'B',CAST('2010-04-08 15:00:00' AS DATETIME), CAST('2010-04-08 18:00:00' AS DATETIME), 101)

    INSERT INTO #ShiftData

    (emp_id,shift_type,start_time,end_time,msgs_handled)

    VALUES

    (222,'A',CAST('2010-04-08 17:00:00' AS DATETIME), CAST('2010-04-08 19:00:00' AS DATETIME), 104)

    INSERT INTO #ShiftData

    (emp_id,shift_type,start_time,end_time,msgs_handled)

    VALUES

    (333,'A',CAST('2010-04-08 17:00:00' AS DATETIME), CAST('2010-04-08 19:00:00' AS DATETIME), 50)

    Desired Results:

    emp_idhrs_worked

    ------------------

    1115

    2224

    3332

  • I would use a calendar table (holding the day values), expand it to cover 24hrs per day (or the possible working hours) and join that to the ShiftData table.

    Something like this:

    ;WITH cte AS

    (

    SELECT DATEADD(hh,number,DATE) datehr

    FROM calendar

    CROSS APPLY

    (

    SELECT TOP 24 number FROM master..spt_values

    WHERE TYPE ='P'

    ) hr

    )

    SELECT

    emp_id,

    COUNT(DISTINCT datehr) AS hrs_worked

    FROM cte

    INNER JOIN #ShiftData

    ON datehr>=start_time

    AND datehr<end_time

    GROUP BY emp_id

    Side note: you need to define how you'll handle working periods that do not begin or end at the hour and will be less than one hour. With the code as above a period from 13:00 to 14:59 will be considered as one hr the same like working from 13:59 to 14:01. Please clarify.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Thanks for that. I actually came across a post on an Oracle forum that detailed what you mention above. I got up, took a walk, wished I could down a beer or two and returned to my desk and came up with this:

    SELECT emp_id, CAST(SUM(DATEDIFF(ss, start_time, end_time)/3600.00) AS DECIMAL(5,2))

    FROM

    (

    SELECT s1.emp_id, MIN(s1.start_time) AS start_time, MAX(s1.end_time) AS end_time

    FROM

    #ShiftData s1 LEFT OUTER JOIN #ShiftData t1

    ON s1.start_time <= t1.end_time

    AND NOT EXISTS

    (

    SELECT * FROM #ShiftData t2

    WHERE t1.end_time >= t2.start_time AND t1.end_time < t2.end_time

    )

    WHERE NOT EXISTS

    (

    SELECT * FROM #ShiftData s2

    WHERE s1.start_time > s2.start_time AND s1.start_time <= s2.end_time

    )

    GROUP BY s1.start_time, s1.emp_id

    ) AS T

    GROUP BY emp_id

    It gives me the crucial part of my desired results. I just need to connect it with the shift data for employees without overlapping or multiple shifts (employee 333 in my example data), but that should be easy.

    I am not sure how efficient my code is yet, but at least it is not a cursor like so much of the legacy code I have seen in our systems. I have a to run it through some more testing to ensure it captures all the different scenarios that could occur.

  • Your reply doesn't really answer the question on how to actually calculate the working hours.

    Your sample data do make room for some interpretation... Also, your desired result is an integer value (which would require some rounding routines), whereas the second method calculates the working hours with rounding to two decimal places based on the seconds worked. Again, please clarify.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Sorry for any confusion. What I am inquiring about is a subset of a larger process and my example data is simplified. I need to calculate hours worked to the second, but then round to a precision of two decimal places when converting to hours.

    CAST(SUM(DATEDIFF(ss, start_time, end_time)/3600.00) AS DECIMAL(5,2))

    My query above actually failed to accurately calculate the hours worked for emp_id 222, it was off by one hour. Just got back from a meeting and hope to dig in and find out why.

    My expected results should be:

    emp_id hrs_worked

    ------------------

    111 5.00

    222 4.00

    333 2.00

  • Here's another version.

    Side note: instead of expanding it to 24hrs per day I just used the hours between noon and 10pm. If you need all 24hrs, you'd need to change the hr subquery.

    I'm not sure how it's going to perform compared to the other solution you posted (once it returns the expected result).

    ;WITH cte AS

    (

    SELECT DATEADD(hh,number,DATE) datehr_from,

    DATEADD(hh,number+1,DATE) datehr_to

    FROM calendar

    CROSS APPLY

    (

    SELECT TOP 10 number FROM master..spt_values

    WHERE TYPE ='P'

    and number >11

    ) hr

    where mnth=4

    ),

    cte2 AS

    (

    SELECT

    MAX(

    CASE WHEN start_time>datehr_from AND start_time<datehr_to

    THEN DATEDIFF(ss,start_time,datehr_to)

    WHEN end_time>datehr_from AND end_time<datehr_to

    THEN DATEDIFF(ss,datehr_from,end_time)

    ELSE 3600 END

    ) AS dur,

    datehr_from,

    emp_id

    FROM cte

    INNER JOIN #ShiftData

    ON datehr_to>start_time

    AND datehr_from< end_time

    GROUP BY emp_id,datehr_from

    )

    SELECT emp_id,cast(SUM(dur)/3600.00 as decimal(5,2))

    FROM cte2

    GROUP BY emp_id



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Sorry did not read your last post (about using seconds) but this will compute total hours worked. I also edited / added some additional data to test for start / end time not ending on an hour, but as some minutes after the hour.

    Disregard this solution as pointed out by imu

    lmu92 (4/9/2010)

    Unfortunately, your solution does lead to pay twice for one hour.

    For example emp_id 222:

    The hour between 17:00 18:00 is included twice in the sample data (for shift A and for shift B) but only needs to be counted once. Therefore, the result should be 4.25, not 5.25.

    SELECT emp_id,shift_type,start_time,end_time FROM #ShiftData ORDER BY emp_id

    --==My input

    emp_idshift_typestart_timeend_time

    111A2010-04-08 12:00:00.0002010-04-08 14:00:00.000

    111B2010-04-08 13:00:00.0002010-04-08 14:00:00.000

    111A2010-04-08 15:00:00.0002010-04-08 16:00:00.000

    111B2010-04-08 15:00:00.0002010-04-08 18:00:00.000

    111A2010-04-08 16:30:00.0002010-04-08 17:30:00.000

    222B2010-04-08 15:00:00.0002010-04-08 18:00:00.000

    222A2010-04-08 17:00:00.0002010-04-08 19:00:00.000

    222A2010-04-08 19:05:00.0002010-04-08 19:20:00.000

    333A2010-04-08 17:00:00.0002010-04-08 19:00:00.000

    444A2010-04-08 17:00:00.0002010-04-08 17:59:00.000

    --===

    SELECT emp_id,CAST(SUM(({fn HOUR(end_time)}-{fn HOUR(start_time)})) AS DECIMAL(4,2)) +

    CAST(SUM({fn MINUTE( end_time )}- {fn MINUTE( start_time )})AS DECIMAL(4,2))/60 AS 'Tottal Hrs Worked'

    FROM #ShiftData GROUP BY emp_id

    /* results

    emp_idTottal Hrs Worked

    1118.000000

    2225.250000

    3332.000000

    4440.983333

    */

    Note that this will not handle starting and ending on consecutive days

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • @Ron:

    Unfortunately, your solution does lead to pay twice for one hour.

    For example emp_id 222:

    The hour between 17:00 18:00 is included twice in the sample data (for shift A and for shift B) but only needs to be counted once. Therefore, the result should be 4.25, not 5.25.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • bitbucket,

    Your solution is not correct as far as what I'm attempting to accomplish.

    Shift_Type A and B denote the two different systems an employee can be logged into (which can overlap, but is not always the case). I need to pay the total hours worked, but that needs to take into consideration these overlaps. When an overlap occurs I do not want to pay the agent the hours accrued in both systems, but simply the hours accrued as if it were one system.

    My example data is simplified and this is what the results should be for employee 111:

    5 hours worked

    Following the shift data I created in my original post for emp 111 this is how the logic should flow.

    agent logged into system A (shift_type A) at 12:00:00 and logged off at 14:00:00, however during that time he also logged into system B from 13:00:00 to 14:00:00, therefore he should be paid for two hours.

    Then there is a gap of 1 hour and the agent logs back into both system A and B at 15:00:00, logs off A at 16:00:00, but remains on B until 18:00:00, however logs back into system A at 16:30:00 and works until 17:30:00. Therefore, we must pay this agent for another 3 hours.

    Totaling 5 hours worked for employee 111 for the day.

  • lmu92 (4/9/2010)


    @Ron:

    Unfortunately, your solution does lead to pay twice for one hour.

    For example emp_id 222:

    The hour between 17:00 18:00 is included twice in the sample data (for shift A and for shift B) but only needs to be counted once. Therefore, the result should be 4.25, not 5.25.

    Whoops and I think that is data I fat fingered.

    So back to work for me .... Imu thanks for catching the error

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • Assuming that records with later shift_ids for the same employee never start earlier that earlier records (if you see what I mean)...this is a running total problem, solvable using the quirky update:

    CREATE TABLE #ShiftData

    (

    emp_id INT,

    shift_id INT IDENTITY(1,1),

    shift_type VARCHAR(20),

    start_time DATETIME,

    end_time DATETIME,

    msgs_handled INT

    );

    INSERT #ShiftData

    (emp_id,shift_type,start_time,end_time,msgs_handled)

    VALUES (111,'A',CAST('2010-04-08 11:30:00' AS DATETIME), CAST('2010-04-08 12:30:00' AS DATETIME), 50)

    INSERT #ShiftData

    (emp_id, shift_type, start_time, end_time, msgs_handled)

    VALUES (111,'A',CAST('2010-04-08 12:00:00' AS DATETIME), CAST('2010-04-08 14:00:00' AS DATETIME), 45)

    INSERT #ShiftData

    (emp_id,shift_type,start_time,end_time,msgs_handled)

    VALUES (111,'B',CAST('2010-04-08 13:00:00' AS DATETIME), CAST('2010-04-08 14:00:00' AS DATETIME), 20)

    INSERT #ShiftData

    (emp_id,shift_type,start_time,end_time,msgs_handled)

    VALUES (111,'A',CAST('2010-04-08 15:00:00' AS DATETIME), CAST('2010-04-08 16:00:00' AS DATETIME), 25)

    INSERT #ShiftData

    (emp_id,shift_type,start_time,end_time,msgs_handled)

    VALUES (111,'B',CAST('2010-04-08 15:00:00' AS DATETIME), CAST('2010-04-08 18:00:00' AS DATETIME), 100)

    INSERT #ShiftData

    (emp_id,shift_type,start_time,end_time,msgs_handled)

    VALUES (111,'A',CAST('2010-04-08 16:30:00' AS DATETIME), CAST('2010-04-08 18:30:00' AS DATETIME), 30)

    INSERT #ShiftData

    (emp_id,shift_type,start_time,end_time,msgs_handled)

    VALUES (222,'B',CAST('2010-04-08 15:00:00' AS DATETIME), CAST('2010-04-08 18:00:00' AS DATETIME), 101)

    INSERT #ShiftData

    (emp_id,shift_type,start_time,end_time,msgs_handled)

    VALUES (222,'A',CAST('2010-04-08 17:00:00' AS DATETIME), CAST('2010-04-08 19:00:00' AS DATETIME), 104)

    INSERT #ShiftData

    (emp_id,shift_type,start_time,end_time,msgs_handled)

    VALUES (333,'A',CAST('2010-04-08 17:00:00' AS DATETIME), CAST('2010-04-08 19:00:00' AS DATETIME), 50)

    -- Add a running total column (or transfer the data to a new table with SELECT...INTO)

    ALTER TABLE #ShiftData ADD total_minutes INTEGER NOT NULL DEFAULT 0 WITH VALUES;

    -- Clustered index required for running total update

    CREATE UNIQUE CLUSTERED INDEX c ON #ShiftData (emp_id, shift_id);

    -- Variables used by running total update

    DECLARE @emp_id INTEGER,

    @current_end DATETIME,

    @total_minutes INTEGER;

    -- Initialise

    SET @current_end = '1900-01-01T00:00:00';

    SET @emp_id = -1;

    SET @total_minutes = 0;

    -- Running total update

    UPDATE #ShiftData WITH (TABLOCKX)

    SET @total_minutes =

    total_minutes =

    CASE

    -- New employee: running total of minutes = length of first shift

    WHEN @emp_id <> emp_id THEN DATEDIFF(MINUTE, start_time, end_time)

    ELSE

    -- Same employee

    CASE

    -- Add contribution from this shift

    -- New segment

    WHEN start_time > @current_end THEN @total_minutes + DATEDIFF(MINUTE, start_time, end_time)

    -- Extension of current segment

    WHEN end_time > @current_end THEN @total_minutes + DATEDIFF(MINUTE, @current_end, end_time)

    -- Ignore record completely covered already

    ELSE @total_minutes

    END

    END,

    -- Current end of shift

    @current_end =

    CASE

    -- New employee = new end of shift

    WHEN @emp_id <> emp_id THEN end_time

    -- Later end time

    WHEN end_time > @current_end THEN end_time

    -- Not later = ignore

    ELSE @current_end

    END,

    -- Remember the current employee

    @emp_id = emp_id

    OUTPUT inserted.*

    OPTION (MAXDOP 1);

    -- Final results

    SELECT emp_id,

    minutes_worked = MAX(total_minutes),

    hours_worked = CONVERT(DECIMAL(5,2), MAX(total_minutes) / 60.0)

    FROM #ShiftData

    GROUP BY

    emp_id

    ORDER BY

    emp_id;

    GO

    -- Tidy up

    DROP TABLE #ShiftData

  • Thank you, Paul and everyone for your help. I can ensure that the shift start times are sorted properly so this will work for my needs.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply