Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


SQL to split row by date (split into multiple rows)


SQL to split row by date (split into multiple rows)

Author
Message
SQLStalker
SQLStalker
Grasshopper
Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)

Group: General Forum Members
Points: 23 Visits: 78
I am looking for help with splitting a row into multiple rows based on dates overlapping.

As an example, I have a table with the following data:

Row ID, Employee, Job, Start Date, End Date, Workload
1, John Doe, HSBC, 01/01/2013, 31/12/2013, 100
2, John Doe, Vacation, 17/06/2013, 21/06/2013, 100
3, John Doe, Vacation, 19/08/2013, 23/08/2013, 100
4, John Doe, Barclays, 01/01/2014, 31/01/2014, 50
5, John Doe, Santander, 06/01/2014, 25/01/2014, 50
6, John Doe, Vacation, 13/01/2014, 17/01/2014, 100

I am looking to split the banking rows where they overlap with a vacation. So for example, the final result should be:

Row ID, Employee, Job, Start Date, End Date, Workload
1, John Doe, HSBC, 01/01/2013, 16/06/2013, 100
2, John Doe, Vacation, 17/06/2013, 21/06/2013, 100
3, John Doe, HSBC, 22/06/2013, 18/08/2013, 100
4, John Doe, Vacation, 19/08/2013, 23/08/2013, 100
5, John Doe, HSBC, 24/08/2013, 31/12/2013, 100
6, John Doe, Barclays, 01/01/2014, 12/01/2014, 50
7, John Doe, Barclays, 18/01/2014, 31/01/2014, 50
8, John Doe, Santander, 06/01/2014, 12/01/2014, 50
9, John Doe, Santander, 18/01/2014, 25/01/2014, 50
10, John Doe, Vacation, 13/01/2014, 17/01/2014, 100

New rows after split are in bold. The Row ID should be unique although it doesn't need to be sequential.

Any help or guidance would be appreciated.
WolfgangE
WolfgangE
SSC Veteran
SSC Veteran (216 reputation)SSC Veteran (216 reputation)SSC Veteran (216 reputation)SSC Veteran (216 reputation)SSC Veteran (216 reputation)SSC Veteran (216 reputation)SSC Veteran (216 reputation)SSC Veteran (216 reputation)

Group: General Forum Members
Points: 216 Visits: 777
hi,

I tried to understand your example but I didn't.
I have no idea how you get to the date values for the added rows. And what about the first row in your example, the date values have changed there too.
Please try to explain it again, maybe more detailled.

By the way, please read the following article and provide your example data a little bit more comfortable ;-)
http://www.sqlservercentral.com/articles/Best+Practices/61537/
Sean Lange
Sean Lange
SSCoach
SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)

Group: General Forum Members
Points: 16540 Visits: 16993
I understand your requirements. This is a bit challenging. Since you are brand new around here I went ahead and put together some ddl and sample data in a consumable format so you can see how you should post this type of thing.

Here it is:


set dateformat dmy

if OBJECT_ID('tempdb..#Something') is not null
   drop table #Something
   
create table #Something
(
   ID int,
   Employee varchar(20),
   Job varchar(20),
   StartDate datetime,
   EndDate datetime,
   Workload int
)

insert #Something
select *
from (Values
(1, 'John Doe', 'HSBC', '01/01/2013', '31/12/2013', 100)
,(2, 'John Doe', 'Vacation', '17/06/2013', '21/06/2013', 100)
,(3, 'John Doe', 'Vacation', '19/08/2013', '23/08/2013', 100)
,(4, 'John Doe', 'Barclays', '01/01/2014', '31/01/2014', 50)
,(5, 'John Doe', 'Santander', '06/01/2014', '25/01/2014', 50)
,(6, 'John Doe', 'Vacation', '13/01/2014', '17/01/2014', 100)
) x(a,b,c,d,e,f)

select *
from #Something



So now we all have something to work with. I will see if I can come up with something that will work. Meantime others can work on this too now that they have details to work with.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
SQLStalker
SQLStalker
Grasshopper
Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)

Group: General Forum Members
Points: 23 Visits: 78
Hi WolfgangE and Sean Lange,

Thank you for your responses.

WolfgangE, sorry should have been more clear in my requirements.

We have a system where an employee can be booked on a job for long periods, so in my crude example, John Doe is booked to HSBC for all of 2013. However, in between, employees can be on vacation too so John Doe has 2 vacations during the time he is booked on HSBC.

What we are looking to achieve is to split the job either side of vacation bookings so it all slots in as opposed to being stacked. I attach a screenshot as an example.

Sean Lange, thanks for the DDL. I should have provided that in the first place.

I look forward to your guidance and assistance.
Nevyn
Nevyn
SSC Eights!
SSC Eights! (960 reputation)SSC Eights! (960 reputation)SSC Eights! (960 reputation)SSC Eights! (960 reputation)SSC Eights! (960 reputation)SSC Eights! (960 reputation)SSC Eights! (960 reputation)SSC Eights! (960 reputation)

Group: General Forum Members
Points: 960 Visits: 3149
Tricky.

Could you clarify your first sample output from above? With HSBC you intersperse the vacation with the new rows of HSBC. With the other banks you show the vacation AFTER the banking records.

I am pretty close to a solution (its not very pretty), but assuming the output order matters I need to understand the sorting rules.
WolfgangE
WolfgangE
SSC Veteran
SSC Veteran (216 reputation)SSC Veteran (216 reputation)SSC Veteran (216 reputation)SSC Veteran (216 reputation)SSC Veteran (216 reputation)SSC Veteran (216 reputation)SSC Veteran (216 reputation)SSC Veteran (216 reputation)

Group: General Forum Members
Points: 216 Visits: 777
thank goodness someone else tries, I dont' find a solution yet...
Nevyn
Nevyn
SSC Eights!
SSC Eights! (960 reputation)SSC Eights! (960 reputation)SSC Eights! (960 reputation)SSC Eights! (960 reputation)SSC Eights! (960 reputation)SSC Eights! (960 reputation)SSC Eights! (960 reputation)SSC Eights! (960 reputation)

Group: General Forum Members
Points: 960 Visits: 3149
Here is what I have. Sort order is not the same as the original post suggested, so we could still use a clairification on those rules.

There is likely a cleaner way to do this, but this is my first pass. Note that it can get tripped up if a job and a vacation start on the same date, so if that is an issue let me know.

set dateformat dmy

if OBJECT_ID('tempdb..#Something') is not null
   drop table #Something
   
create table #Something
(
   ID int,
   Employee varchar(20),
   Job varchar(20),
   StartDate datetime,
   EndDate datetime,
   Workload int
)

insert #Something
select *
from (Values
(1, 'John Doe', 'HSBC', '01/01/2013', '31/12/2013', 100)
,(2, 'John Doe', 'Vacation', '17/06/2013', '21/06/2013', 100)
,(3, 'John Doe', 'Vacation', '19/08/2013', '23/08/2013', 100)
,(4, 'John Doe', 'Barclays', '01/01/2014', '31/01/2014', 50)
,(5, 'John Doe', 'Santander', '06/01/2014', '25/01/2014', 50)
,(6, 'John Doe', 'Vacation', '13/01/2014', '17/01/2014', 100)
) x(a,b,c,d,e,f)
;
WITH VacationCTE AS (--This CTE grabs just the Vacation rows so we can compare and split dates from them
   SELECT ID,
         Employee,
         Job,
         StartDate,
         EndDate,
         Workload
   FROM #Something
   WHERE Job = 'Vacation'
   Wink,
   NewRowsCTE AS ( --This CTE creates just new rows starting after the vacations for each banking job
      SELECT a.ID,
         a.Employee,
         a.Job,
         DATEADD (d,1,b.EndDate) AS StartDate,
         a.EndDate,
         a.Workload
      FROM #Something a
      INNER JOIN VacationCTE b
         ON a.StartDate <= b.StartDate
         AND a.EndDate > b.StartDate
         AND a.EndDate > b.EndDate -- This is needed because if the vacation ends when the project does, there is no split row after
         Wink,
   UnionCTE AS ( -- This CTE merges the new rows with the existing ones
      SELECT ID,
            Employee,
            Job,
            StartDate,
            EndDate,
            Workload
      FROM #Something
      UNION ALL
      SELECT ID,
            Employee,
            Job,
            StartDate,
            EndDate,
            Workload
      FROM NewRowsCTE
      Wink,
   FixEndDateCTE as (
      SELECT CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.StartDate) AS FixID, min(d.StartDate) as StartDate
      FROM UnionCTE c
      LEFT OUTER JOIN VacationCTE d
         ON c.StartDate < d.StartDate
         AND c.EndDate >= d.StartDate
      WHERE c.Job <> 'Vacation'
      GROUP BY CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.StartDate)
      Wink
SELECT row_number() OVER (ORDER BY e.Startdate),
      e.Employee,
      e.Job,
      e.StartDate,
      CASE WHEN f.StartDate IS NULL
         THEN e.EndDate
         ELSE DATEADD (d,-1,f.StartDate)
         END,
      e.Workload
FROM UnionCTE e
LEFT OUTER JOIN FixEndDateCTE f
ON (CONVERT (CHAR,e.ID)+CONVERT (CHAR,e.StartDate)) = f.FixID
ORDER BY e.StartDate



It is tricky because you need to separate vacations from other rows ( treated differently), create new rows post vacation, and update both the existing and new rows with corrected end dates. Probably far easier to do within a stored procedure where you can use a temp table and take multiple steps, but I tried to do it within a query.
SQLStalker
SQLStalker
Grasshopper
Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)

Group: General Forum Members
Points: 23 Visits: 78
Hi SSC Veteran,

Thank you very much for your response.

Your DDL appears to be working. I am testing further.

The IDs for existing rows can remain the same with new IDs allocated to new rows (after split). No particular order is required.

I really appreciate your suggestion.
SQLStalker
SQLStalker
Grasshopper
Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)

Group: General Forum Members
Points: 23 Visits: 78
Hi All,

Thanks to SSC Veteran's assistance, I am close to the correct result set.

During testing, I found that when 2 rows overlap, duplicate rows appear. An example is:

John Smith, Vacation, 12/02/2014, 25/02/2014
John Smith, Natwest, 14/02/2014, 28/02/2014

The expected result should be the Natwest job start date is the next day after vacation:

John Smith, Vacation, 12/02/2014, 25/02/2014
John Smith, Natwest, 26/02/2014, 28/02/2014

However, I see the following:

John Smith, Vacation, 12/02/2014, 25/02/2014
John Smith, Natwest, 14/02/2014, 28/02/2014
John Smith, Natwest, 26/02/2014, 28/02/2014

Any pointers would be appreciated. Here is the DDL.

set dateformat dmy

if OBJECT_ID('tempdb..#Something') is not null
   drop table #Something
   
create table #Something
(
   ID int,
   Employee varchar(20),
   Job varchar(20),
   StartDate datetime,
   EndDate datetime,
   Workload int
)

insert #Something
select *
from (Values
(1, 'John Doe', 'HSBC', '01/01/2013', '31/12/2013', 100)
,(2, 'John Doe', 'Vacation', '17/06/2013', '21/06/2013', 100)
,(3, 'John Doe', 'Vacation', '19/08/2013', '23/08/2013', 100)
,(4, 'John Doe', 'Barclays', '01/01/2014', '31/01/2014', 50)
,(5, 'John Doe', 'Santander', '06/01/2014', '25/01/2014', 50)
,(6, 'John Doe', 'Vacation', '13/01/2014', '17/01/2014', 100)
,(7, 'John Doe', 'Natwest', '14/02/2014', '28/02/2014', 100)
,(8, 'John Doe', 'Vacation', '12/02/2014', '25/02/2014', 100)
) x(a,b,c,d,e,f)
;
WITH VacationCTE AS (--This CTE grabs just the Vacation rows so we can compare and split dates from them
   SELECT ID,
         Employee,
         Job,
         StartDate,
         EndDate,
         Workload
   FROM #Something
   WHERE Job = 'Vacation'
   Wink,
   NewRowsCTE AS ( --This CTE creates just new rows starting after the vacations for each banking job
      SELECT a.ID,
         a.Employee,
         a.Job,
         DATEADD (d,1,b.EndDate) AS StartDate,
         a.EndDate,
         a.Workload
      FROM #Something a
      INNER JOIN VacationCTE b
         ON a.StartDate <= b.EndDate
         AND a.EndDate > b.StartDate
         AND a.EndDate > b.EndDate -- This is needed because if the vacation ends when the project does, there is no split row after
         Wink,
   UnionCTE AS ( -- This CTE merges the new rows with the existing ones
      SELECT ID,
            Employee,
            Job,
            StartDate,
            EndDate,
            Workload
      FROM #Something
      UNION ALL
      SELECT ID,
            Employee,
            Job,
            StartDate,
            EndDate,
            Workload
      FROM NewRowsCTE
      Wink,
   FixEndDateCTE as (
      SELECT CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.StartDate) AS FixID, min(d.StartDate) as StartDate
      FROM UnionCTE c
      LEFT OUTER JOIN VacationCTE d
         ON c.StartDate < d.StartDate
         AND c.EndDate >= d.StartDate
      WHERE c.Job <> 'Vacation'
      GROUP BY CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.StartDate)
      Wink
SELECT row_number() OVER (ORDER BY e.Startdate),
      e.Employee,
      e.Job,
      e.StartDate,
      CASE WHEN f.StartDate IS NULL
         THEN e.EndDate
         ELSE DATEADD (d,-1,f.StartDate)
         END,
      e.Workload
FROM UnionCTE e
LEFT OUTER JOIN FixEndDateCTE f
ON (CONVERT (CHAR,e.ID)+CONVERT (CHAR,e.StartDate)) = f.FixID
ORDER BY e.StartDate


ChrisM@Work
ChrisM@Work
SSCrazy Eights
SSCrazy Eights (9K reputation)SSCrazy Eights (9K reputation)SSCrazy Eights (9K reputation)SSCrazy Eights (9K reputation)SSCrazy Eights (9K reputation)SSCrazy Eights (9K reputation)SSCrazy Eights (9K reputation)SSCrazy Eights (9K reputation)

Group: General Forum Members
Points: 8957 Visits: 19014
This appears to work:

SELECT 
   ID = ROW_NUMBER() OVER(ORDER BY MIN(DateRange)),
   Employee, Job, [Workload],
   StartDate = MIN(DateRange),
   EndDate = MAX(DateRange)
FROM (
   SELECT ID, Employee, Job, [Workload], DateRange,
      Grouper = DENSE_RANK() over (order by Employee, daterange)
      - DENSE_RANK() over (order by ID, Employee, daterange)

   FROM (
      SELECT
         so.ID,
         so.Employee,
         Job = so.Job,
         so.[Workload],
         c.DateRange,
         rn = DENSE_RANK() OVER(PARTITION BY c.DateRange ORDER BY CASE WHEN so.Job = 'Vacation' THEN 0 ELSE 1 END)
      FROM #Something so
      CROSS APPLY (SELECT RangeStart = MIN(StartDate), RangeEnd = MAX(EndDate) FROM #Something si) r
      CROSS APPLY dbo.IF_Calendar (r.RangeStart,r.RangeEnd,'monday') c
      WHERE c.DateRange BETWEEN so.StartDate AND so.EndDate
   Wink d
   WHERE rn = 1
) q
GROUP BY ID, Employee, Job, [Workload], Grouper



Here's the function definition:
CREATE FUNCTION [dbo].[IF_Calendar] 
(   
   @StartDate DATE,
   @EndDate DATE,
   @FirstWeekDay VARCHAR(10)
)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
(

   WITH E1(N) AS (
      SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
      SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
      SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
   Wink, --10E+1 or 10 rows
   E2(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b), --10E+2 or 100 rows
   E3(N) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b CROSS JOIN E1 c), --1M rows max

   iTally AS ( -- generate sufficient rows to cover startdate to enddate inclusive
      SELECT TOP(1+DATEDIFF(DAY,@StartDate,@EndDate))
         rn = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1
      FROM E3
   Wink

   -- Do some date arithmetic
   SELECT
      a.DateRange,
      c.[Year],
      c.[Month],
      c.[DayOfMonth],
      c.AbsWeekno,
      c.[DayName],
      d.Holiday
   FROM iTally
   CROSS APPLY (SELECT DateRange = DATEADD(day,rn,@StartDate)) a
   CROSS APPLY (VALUES ('Tuesday',1),('Wednesday',2),('Thursday',3),('Friday',4),('Saturday',5),('Sunday',6),('Monday',7)
   Wink b (FirstWeekDay, FirstWeekdayOffset)
   CROSS APPLY (
      SELECT
         [Year] = YEAR(a.DateRange),
         [Month] = MONTH(a.DateRange),
         [DayOfMonth] = DAY(a.DateRange),
         AbsWeekno   = DATEDIFF(day,FirstWeekdayOffset,a.DateRange)/7,
         [DayName]   = DATENAME(weekday,a.DateRange)
   Wink c
   CROSS APPLY (
      SELECT Holiday = CASE
         WHEN [Month] = 1 AND [DayOfMonth] = 1 THEN 'New Year'
         WHEN [Month] = 5 AND [DayOfMonth] >= 25 AND [DayName] = 'Monday' THEN 'Memorial Day'
         WHEN [Month] = 7 AND [DayOfMonth] = 4 THEN 'Independence Day'
         WHEN [Month] = 9 AND [DayOfMonth] <= 7 AND [DayName] = 'Monday' THEN 'Labor Day'
         WHEN [Month] = 11 AND [DayOfMonth] BETWEEN 22 AND 28 AND [DayName] = 'Thursday' THEN 'Thanksgiving Day'
         WHEN [Month] = 12 AND [DayOfMonth] = 25 THEN 'Christmas Day'
         ELSE NULL END
   Wink d
   WHERE b.FirstWeekDay = @FirstWeekDay
      AND @EndDate IS NOT NULL

)


GO



Which you could easily replace with an inline tally table.

“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search