Find "Consecutive" records based on available dates in another table

  • I have an activity that happens each Mon, Wed, and Friday, except holidays. I have a table that stores all of the appropriate dates. Since this only happens M-W-F, I consider 05-04-2016, 05-06-2016, and 05-09-2016 "consecutive".

    The appropriate dates are stored in a table just for this purpose. Like this

    ADate Holiday Description

    07-01-2016 0 NULL

    07-04-2016 1 July 4th

    07-06-2016 0 NULL

    I have another table that tracks employee participation. It has a field for date, and a field for the employee ID.

    ADate EmployeeID

    07-01-2016 1234

    07-01-2016 5678

    07-06-2016 1234

    07-08-2016 1234

    At the end of a two-month period, I want to determine which employee participated in the most "consecutive" opportunities. In the above case, employee 5678 = 1 and employee 1234 = 3.

  • If you create a table of dates for the MWF dates, then you can use ROWNUMBER() to assign a sequential value to each date... then you can find a gap whenever the date in the current row and LAG() to look at the previous row are greater than 1.

  • You need to have a RowNumber/Rank in your date table that assigns each date with a sequence no.. Often it can be achieved with an identity column, but then you will have to make sure that inserts into the date table are always sequential (and consecutive in your case)..

    Now you can join the date table & employee table on date to link the sequence no column in date table with employees table and then do a self join on the new employee set to get only those entries where difference in sequence no = 1

    something like:

    select count (e1) +1

    from employee e1

    INNER JOIN employee e2

    ON e1.sequence_no - e2.sequence_no = 1

    group by e1

    By the way, why do you have dates in both tables? does'nt seem like a good design..

    "The price of anything is the amount of life you exchange for it" - Henry David Thoreau
  • Mohit Dhiman (7/8/2016)


    You need to have a RowNumber/Rank in your date table that assigns each date with a sequence no.

    I would generate it on the fly instead of persisting it on the table. For one query, you might want to rank just Mondays, Wednesdays and Fridays; for another, maybe Saturdays and Sundays.

    Now you can join the date table & employee table on date to link the sequence no column in date table with employees table and then do a self join on the new employee set to get only those entries where difference in sequence no = 1

    LEAD and LAG, available in SQL Server 2012 and later, are usually more efficient than self-joins.

    By the way, why do you have dates in both tables? does'nt seem like a good design..

    Seems OK to me, so long as they don't include time portions, and there's a composite primary key on ADate and EmployeeID.

    John

  • Of course. It all seems obvious once someone spells it out for me.

    I like the idea of doing it on the fly. There's no point in having this information just lying about. It will help with clerical errors, aka, not entering dates in order.

  • jtrask (7/13/2016)


    Of course. It all seems obvious once someone spells it out for me.

    I like the idea of doing it on the fly. There's no point in having this information just lying about. It will help with clerical errors, aka, not entering dates in order.

    Here's another way, which is based on date range algorithms so is well-known:

    --The appropriate dates are stored in a table just for this purpose. Like this

    IF OBJECT_ID('tempdb..#Holidays') IS NOT NULL DROP TABLE #Holidays;

    ;WITH Holidays (ADate, Holiday, Description) AS (

    SELECT '07-01-2016', 0, NULL UNION ALL

    SELECT '07-04-2016', 1, 'July 4th' UNION ALL

    SELECT '07-06-2016', 0, NULL UNION ALL

    SELECT '07-08-2016', 0, NULL UNION ALL

    SELECT '07-11-2016', 0, NULL UNION ALL

    SELECT '07-13-2016', 0, NULL UNION ALL

    SELECT '07-15-2016', 0, NULL UNION ALL

    SELECT '07-18-2016', 0, NULL UNION ALL

    SELECT '07-20-2016', 0, NULL UNION ALL

    SELECT '07-22-2016', 0, NULL UNION ALL

    SELECT '07-25-2016', 0, NULL UNION ALL

    SELECT '07-27-2016', 0, NULL UNION ALL

    SELECT '07-29-2016', 0, NULL

    ) SELECT * INTO #Holidays FROM Holidays

    --I have another table that tracks employee participation. It has a field for date, and a field for the employee ID.

    IF OBJECT_ID('tempdb..#employee_participation') IS NOT NULL DROP TABLE #employee_participation;

    ;WITH employee_participation (ADate, EmployeeID) AS (

    SELECT '07-01-2016', 5678 UNION ALL

    SELECT '07-01-2016', 1234 UNION ALL

    SELECT '07-06-2016', 1234 UNION ALL

    SELECT '07-08-2016', 1234 UNION ALL

    SELECT '07-04-2016', 3333 UNION ALL

    SELECT '07-06-2016', 3333 UNION ALL

    SELECT '07-08-2016', 3333 UNION ALL

    SELECT '07-13-2016', 3333 UNION ALL

    SELECT '07-18-2016', 3333 UNION ALL

    SELECT '05-06-2016', 4444 UNION ALL

    SELECT '07-06-2016', 4444 UNION ALL

    SELECT '07-11-2016', 4444 UNION ALL

    SELECT '07-13-2016', 4444 UNION ALL

    SELECT '07-15-2016', 4444 UNION ALL

    SELECT '07-18-2016', 4444 UNION ALL

    SELECT '07-18-2016', 4444 UNION ALL

    SELECT '08-18-2016', 4444

    ) SELECT * INTO #employee_participation FROM employee_participation

    -- Solution

    ;WITH

    Emps AS (SELECT *, rn = ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY ADate) FROM #employee_participation),

    TheDays AS (SELECT *, rn = ROW_NUMBER() OVER (ORDER BY ADate) FROM #Holidays h)

    SELECT EmployeeID, RangeStart, RangeEnd, DaysContiguous

    FROM (

    SELECT e.EmployeeID, RangeStart = MIN(e.ADate), RangeEnd = MAX(e.ADate), DaysContiguous = COUNT(*),

    rn = ROW_NUMBER() OVER (PARTITION BY e.EmployeeID ORDER BY COUNT(*) DESC)

    FROM Emps e

    LEFT JOIN TheDays d ON d.ADate = e.ADate

    GROUP BY e.EmployeeID, d.rn-e.rn

    ) d

    WHERE rn = 1

    -- Run this bit to see how the solution works

    ;WITH

    Emps AS (SELECT *, rn = ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY ADate) FROM #employee_participation),

    TheDays AS (SELECT *, rn = ROW_NUMBER() OVER (ORDER BY ADate) FROM #Holidays h)

    SELECT e.EmployeeID, e.ADate, d.rn, e.rn, d.rn-e.rn

    FROM Emps e

    LEFT JOIN TheDays d ON d.ADate = e.ADate

    ORDER BY e.EmployeeID, e.ADate

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply