SQL to split row by date (split into multiple rows)

  • I am looking for help with splitting a row into multiple rows based on dates overlapping.

    As an example, I have a table with the following data:

    Row ID, Employee, Job, Start Date, End Date, Workload

    1, John Doe, HSBC, 01/01/2013, 31/12/2013, 100

    2, John Doe, Vacation, 17/06/2013, 21/06/2013, 100

    3, John Doe, Vacation, 19/08/2013, 23/08/2013, 100

    4, John Doe, Barclays, 01/01/2014, 31/01/2014, 50

    5, John Doe, Santander, 06/01/2014, 25/01/2014, 50

    6, John Doe, Vacation, 13/01/2014, 17/01/2014, 100

    I am looking to split the banking rows where they overlap with a vacation. So for example, the final result should be:

    Row ID, Employee, Job, Start Date, End Date, Workload

    1, John Doe, HSBC, 01/01/2013, 16/06/2013, 100

    2, John Doe, Vacation, 17/06/2013, 21/06/2013, 100

    3, John Doe, HSBC, 22/06/2013, 18/08/2013, 100

    4, John Doe, Vacation, 19/08/2013, 23/08/2013, 100

    5, John Doe, HSBC, 24/08/2013, 31/12/2013, 100

    6, John Doe, Barclays, 01/01/2014, 12/01/2014, 50

    7, John Doe, Barclays, 18/01/2014, 31/01/2014, 50

    8, John Doe, Santander, 06/01/2014, 12/01/2014, 50

    9, John Doe, Santander, 18/01/2014, 25/01/2014, 50

    10, John Doe, Vacation, 13/01/2014, 17/01/2014, 100

    New rows after split are in bold. The Row ID should be unique although it doesn't need to be sequential.

    Any help or guidance would be appreciated.

  • hi,

    I tried to understand your example but I didn't.

    I have no idea how you get to the date values for the added rows. And what about the first row in your example, the date values have changed there too.

    Please try to explain it again, maybe more detailled.

    By the way, please read the following article and provide your example data a little bit more comfortable πŸ˜‰

    http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • I understand your requirements. This is a bit challenging. Since you are brand new around here I went ahead and put together some ddl and sample data in a consumable format so you can see how you should post this type of thing.

    Here it is:

    set dateformat dmy

    if OBJECT_ID('tempdb..#Something') is not null

    drop table #Something

    create table #Something

    (

    ID int,

    Employee varchar(20),

    Job varchar(20),

    StartDate datetime,

    EndDate datetime,

    Workload int

    )

    insert #Something

    select *

    from (Values

    (1, 'John Doe', 'HSBC', '01/01/2013', '31/12/2013', 100)

    ,(2, 'John Doe', 'Vacation', '17/06/2013', '21/06/2013', 100)

    ,(3, 'John Doe', 'Vacation', '19/08/2013', '23/08/2013', 100)

    ,(4, 'John Doe', 'Barclays', '01/01/2014', '31/01/2014', 50)

    ,(5, 'John Doe', 'Santander', '06/01/2014', '25/01/2014', 50)

    ,(6, 'John Doe', 'Vacation', '13/01/2014', '17/01/2014', 100)

    ) x(a,b,c,d,e,f)

    select *

    from #Something

    So now we all have something to work with. I will see if I can come up with something that will work. Meantime others can work on this too now that they have details to work with.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Hi WolfgangE and Sean Lange,

    Thank you for your responses.

    WolfgangE, sorry should have been more clear in my requirements.

    We have a system where an employee can be booked on a job for long periods, so in my crude example, John Doe is booked to HSBC for all of 2013. However, in between, employees can be on vacation too so John Doe has 2 vacations during the time he is booked on HSBC.

    What we are looking to achieve is to split the job either side of vacation bookings so it all slots in as opposed to being stacked. I attach a screenshot as an example.

    Sean Lange, thanks for the DDL. I should have provided that in the first place.

    I look forward to your guidance and assistance.

  • Tricky.

    Could you clarify your first sample output from above? With HSBC you intersperse the vacation with the new rows of HSBC. With the other banks you show the vacation AFTER the banking records.

    I am pretty close to a solution (its not very pretty), but assuming the output order matters I need to understand the sorting rules.

  • thank goodness someone else tries, I dont' find a solution yet...

  • Here is what I have. Sort order is not the same as the original post suggested, so we could still use a clairification on those rules.

    There is likely a cleaner way to do this, but this is my first pass. Note that it can get tripped up if a job and a vacation start on the same date, so if that is an issue let me know.

    set dateformat dmy

    if OBJECT_ID('tempdb..#Something') is not null

    drop table #Something

    create table #Something

    (

    ID int,

    Employee varchar(20),

    Job varchar(20),

    StartDate datetime,

    EndDate datetime,

    Workload int

    )

    insert #Something

    select *

    from (Values

    (1, 'John Doe', 'HSBC', '01/01/2013', '31/12/2013', 100)

    ,(2, 'John Doe', 'Vacation', '17/06/2013', '21/06/2013', 100)

    ,(3, 'John Doe', 'Vacation', '19/08/2013', '23/08/2013', 100)

    ,(4, 'John Doe', 'Barclays', '01/01/2014', '31/01/2014', 50)

    ,(5, 'John Doe', 'Santander', '06/01/2014', '25/01/2014', 50)

    ,(6, 'John Doe', 'Vacation', '13/01/2014', '17/01/2014', 100)

    ) x(a,b,c,d,e,f)

    ;

    WITH VacationCTE AS (--This CTE grabs just the Vacation rows so we can compare and split dates from them

    SELECT ID,

    Employee,

    Job,

    StartDate,

    EndDate,

    Workload

    FROM #Something

    WHERE Job = 'Vacation'

    ),

    NewRowsCTE AS ( --This CTE creates just new rows starting after the vacations for each banking job

    SELECT a.ID,

    a.Employee,

    a.Job,

    DATEADD (d,1,b.EndDate) AS StartDate,

    a.EndDate,

    a.Workload

    FROM #Something a

    INNER JOIN VacationCTE b

    ON a.StartDate <= b.StartDate

    AND a.EndDate > b.StartDate

    AND a.EndDate > b.EndDate -- This is needed because if the vacation ends when the project does, there is no split row after

    ),

    UnionCTE AS ( -- This CTE merges the new rows with the existing ones

    SELECT ID,

    Employee,

    Job,

    StartDate,

    EndDate,

    Workload

    FROM #Something

    UNION ALL

    SELECT ID,

    Employee,

    Job,

    StartDate,

    EndDate,

    Workload

    FROM NewRowsCTE

    ),

    FixEndDateCTE as (

    SELECT CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.StartDate) AS FixID, min(d.StartDate) as StartDate

    FROM UnionCTE c

    LEFT OUTER JOIN VacationCTE d

    ON c.StartDate < d.StartDate

    AND c.EndDate >= d.StartDate

    WHERE c.Job <> 'Vacation'

    GROUP BY CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.StartDate)

    )

    SELECT row_number() OVER (ORDER BY e.Startdate),

    e.Employee,

    e.Job,

    e.StartDate,

    CASE WHEN f.StartDate IS NULL

    THEN e.EndDate

    ELSE DATEADD (d,-1,f.StartDate)

    END,

    e.Workload

    FROM UnionCTE e

    LEFT OUTER JOIN FixEndDateCTE f

    ON (CONVERT (CHAR,e.ID)+CONVERT (CHAR,e.StartDate)) = f.FixID

    ORDER BY e.StartDate

    It is tricky because you need to separate vacations from other rows ( treated differently), create new rows post vacation, and update both the existing and new rows with corrected end dates. Probably far easier to do within a stored procedure where you can use a temp table and take multiple steps, but I tried to do it within a query.

  • Hi SSC Veteran,

    Thank you very much for your response.

    Your DDL appears to be working. I am testing further.

    The IDs for existing rows can remain the same with new IDs allocated to new rows (after split). No particular order is required.

    I really appreciate your suggestion.

  • Hi All,

    Thanks to SSC Veteran's assistance, I am close to the correct result set.

    During testing, I found that when 2 rows overlap, duplicate rows appear. An example is:

    John Smith, Vacation, 12/02/2014, 25/02/2014

    John Smith, Natwest, 14/02/2014, 28/02/2014

    The expected result should be the Natwest job start date is the next day after vacation:

    John Smith, Vacation, 12/02/2014, 25/02/2014

    John Smith, Natwest, 26/02/2014, 28/02/2014

    However, I see the following:

    John Smith, Vacation, 12/02/2014, 25/02/2014

    John Smith, Natwest, 14/02/2014, 28/02/2014

    John Smith, Natwest, 26/02/2014, 28/02/2014

    Any pointers would be appreciated. Here is the DDL.

    set dateformat dmy

    if OBJECT_ID('tempdb..#Something') is not null

    drop table #Something

    create table #Something

    (

    ID int,

    Employee varchar(20),

    Job varchar(20),

    StartDate datetime,

    EndDate datetime,

    Workload int

    )

    insert #Something

    select *

    from (Values

    (1, 'John Doe', 'HSBC', '01/01/2013', '31/12/2013', 100)

    ,(2, 'John Doe', 'Vacation', '17/06/2013', '21/06/2013', 100)

    ,(3, 'John Doe', 'Vacation', '19/08/2013', '23/08/2013', 100)

    ,(4, 'John Doe', 'Barclays', '01/01/2014', '31/01/2014', 50)

    ,(5, 'John Doe', 'Santander', '06/01/2014', '25/01/2014', 50)

    ,(6, 'John Doe', 'Vacation', '13/01/2014', '17/01/2014', 100)

    ,(7, 'John Doe', 'Natwest', '14/02/2014', '28/02/2014', 100)

    ,(8, 'John Doe', 'Vacation', '12/02/2014', '25/02/2014', 100)

    ) x(a,b,c,d,e,f)

    ;

    WITH VacationCTE AS (--This CTE grabs just the Vacation rows so we can compare and split dates from them

    SELECT ID,

    Employee,

    Job,

    StartDate,

    EndDate,

    Workload

    FROM #Something

    WHERE Job = 'Vacation'

    ),

    NewRowsCTE AS ( --This CTE creates just new rows starting after the vacations for each banking job

    SELECT a.ID,

    a.Employee,

    a.Job,

    DATEADD (d,1,b.EndDate) AS StartDate,

    a.EndDate,

    a.Workload

    FROM #Something a

    INNER JOIN VacationCTE b

    ON a.StartDate <= b.EndDate

    AND a.EndDate > b.StartDate

    AND a.EndDate > b.EndDate -- This is needed because if the vacation ends when the project does, there is no split row after

    ),

    UnionCTE AS ( -- This CTE merges the new rows with the existing ones

    SELECT ID,

    Employee,

    Job,

    StartDate,

    EndDate,

    Workload

    FROM #Something

    UNION ALL

    SELECT ID,

    Employee,

    Job,

    StartDate,

    EndDate,

    Workload

    FROM NewRowsCTE

    ),

    FixEndDateCTE as (

    SELECT CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.StartDate) AS FixID, min(d.StartDate) as StartDate

    FROM UnionCTE c

    LEFT OUTER JOIN VacationCTE d

    ON c.StartDate < d.StartDate

    AND c.EndDate >= d.StartDate

    WHERE c.Job <> 'Vacation'

    GROUP BY CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.StartDate)

    )

    SELECT row_number() OVER (ORDER BY e.Startdate),

    e.Employee,

    e.Job,

    e.StartDate,

    CASE WHEN f.StartDate IS NULL

    THEN e.EndDate

    ELSE DATEADD (d,-1,f.StartDate)

    END,

    e.Workload

    FROM UnionCTE e

    LEFT OUTER JOIN FixEndDateCTE f

    ON (CONVERT (CHAR,e.ID)+CONVERT (CHAR,e.StartDate)) = f.FixID

    ORDER BY e.StartDate

  • This appears to work:

    SELECT

    ID = ROW_NUMBER() OVER(ORDER BY MIN(DateRange)),

    Employee, Job, [Workload],

    StartDate = MIN(DateRange),

    EndDate = MAX(DateRange)

    FROM (

    SELECT ID, Employee, Job, [Workload], DateRange,

    Grouper = DENSE_RANK() over (order by Employee, daterange)

    - DENSE_RANK() over (order by ID, Employee, daterange)

    FROM (

    SELECT

    so.ID,

    so.Employee,

    Job = so.Job,

    so.[Workload],

    c.DateRange,

    rn = DENSE_RANK() OVER(PARTITION BY c.DateRange ORDER BY CASE WHEN so.Job = 'Vacation' THEN 0 ELSE 1 END)

    FROM #Something so

    CROSS APPLY (SELECT RangeStart = MIN(StartDate), RangeEnd = MAX(EndDate) FROM #Something si) r

    CROSS APPLY dbo.IF_Calendar (r.RangeStart,r.RangeEnd,'monday') c

    WHERE c.DateRange BETWEEN so.StartDate AND so.EndDate

    ) d

    WHERE rn = 1

    ) q

    GROUP BY ID, Employee, Job, [Workload], Grouper

    Here's the function definition:

    CREATE FUNCTION [dbo].[IF_Calendar]

    (

    @StartDate DATE,

    @EndDate DATE,

    @FirstWeekDay VARCHAR(10)

    )

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN

    (

    WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b), --10E+2 or 100 rows

    E3(N) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b CROSS JOIN E1 c), --1M rows max

    iTally AS ( -- generate sufficient rows to cover startdate to enddate inclusive

    SELECT TOP(1+DATEDIFF(DAY,@StartDate,@EndDate))

    rn = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1

    FROM E3

    )

    -- Do some date arithmetic

    SELECT

    a.DateRange,

    c.[Year],

    c.[Month],

    c.[DayOfMonth],

    c.AbsWeekno,

    c.[DayName],

    d.Holiday

    FROM iTally

    CROSS APPLY (SELECT DateRange = DATEADD(day,rn,@StartDate)) a

    CROSS APPLY (VALUES ('Tuesday',1),('Wednesday',2),('Thursday',3),('Friday',4),('Saturday',5),('Sunday',6),('Monday',7)

    ) b (FirstWeekDay, FirstWeekdayOffset)

    CROSS APPLY (

    SELECT

    [Year] = YEAR(a.DateRange),

    [Month] = MONTH(a.DateRange),

    [DayOfMonth] = DAY(a.DateRange),

    AbsWeekno= DATEDIFF(day,FirstWeekdayOffset,a.DateRange)/7,

    [DayName]= DATENAME(weekday,a.DateRange)

    ) c

    CROSS APPLY (

    SELECT Holiday = CASE

    WHEN [Month] = 1 AND [DayOfMonth] = 1 THEN 'New Year'

    WHEN [Month] = 5 AND [DayOfMonth] >= 25 AND [DayName] = 'Monday' THEN 'Memorial Day'

    WHEN [Month] = 7 AND [DayOfMonth] = 4 THEN 'Independence Day'

    WHEN [Month] = 9 AND [DayOfMonth] <= 7 AND [DayName] = 'Monday' THEN 'Labor Day'

    WHEN [Month] = 11 AND [DayOfMonth] BETWEEN 22 AND 28 AND [DayName] = 'Thursday' THEN 'Thanksgiving Day'

    WHEN [Month] = 12 AND [DayOfMonth] = 25 THEN 'Christmas Day'

    ELSE NULL END

    ) d

    WHERE b.FirstWeekDay = @FirstWeekDay

    AND @EndDate IS NOT NULL

    )

    GO

    Which you could easily replace with an inline tally table.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hi Chris,

    Thanks for your response. I will test this further.

    How would I include time along with a date in the function as I completely forgot to mentioned the StartDate and EndDate are datetime fields?

  • A slightly related question. If I have the table as per the DDL below, how do I remove rows where the status is 'temp' for overlapping rows by date? In the example below, row with ID 5 will be removed as it overlaps with ID 4 as they are for the same person and job.

    if OBJECT_ID('tempdb..#Something') is not null

    drop table #Something

    GO

    set dateformat dmy

    Go

    create table #Something

    (

    ID int,

    Employee varchar(20),

    Job varchar(20),

    StartDate datetime,

    EndDate datetime,

    [Workload] int,

    [Status] varchar(30)

    )

    insert #Something

    select *

    from (Values

    (1, 'John Doe', 'HSBC', '01/01/2013', '31/12/2013', 100, 'temp')

    ,(2, 'John Doe', 'Vacation', '17/06/2013', '21/06/2013', 100, 'confirmed')

    ,(3, 'John Doe', 'Vacation', '19/08/2013', '23/08/2013', 100, 'temp')

    ,(4, 'John Doe', 'Barclays', '01/01/2014', '31/01/2014', 50, 'confirmed')

    ,(5, 'John Doe', 'Barclays', '06/01/2014', '31/01/2014', 50, 'temp')

    ,(6, 'John Doe', 'Vacation', '13/01/2014', '25/01/2014', 100, 'confirmed')

    ) x(a,b,c,d,e,f,g)

    select *

    from #Something

  • Just wanted to add that the queries from SSC Veteran and ChrisM@Work are the solutions I went with. Thank you.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply