Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

SQL to split row by date (split into multiple rows) Expand / Collapse
Author
Message
Posted Wednesday, June 12, 2013 3:42 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Sunday, October 6, 2013 9:53 AM
Points: 9, Visits: 78
I am looking for help with splitting a row into multiple rows based on dates overlapping.

As an example, I have a table with the following data:

Row ID, Employee, Job, Start Date, End Date, Workload
1, John Doe, HSBC, 01/01/2013, 31/12/2013, 100
2, John Doe, Vacation, 17/06/2013, 21/06/2013, 100
3, John Doe, Vacation, 19/08/2013, 23/08/2013, 100
4, John Doe, Barclays, 01/01/2014, 31/01/2014, 50
5, John Doe, Santander, 06/01/2014, 25/01/2014, 50
6, John Doe, Vacation, 13/01/2014, 17/01/2014, 100

I am looking to split the banking rows where they overlap with a vacation. So for example, the final result should be:

Row ID, Employee, Job, Start Date, End Date, Workload
1, John Doe, HSBC, 01/01/2013, 16/06/2013, 100
2, John Doe, Vacation, 17/06/2013, 21/06/2013, 100
3, John Doe, HSBC, 22/06/2013, 18/08/2013, 100
4, John Doe, Vacation, 19/08/2013, 23/08/2013, 100
5, John Doe, HSBC, 24/08/2013, 31/12/2013, 100
6, John Doe, Barclays, 01/01/2014, 12/01/2014, 50
7, John Doe, Barclays, 18/01/2014, 31/01/2014, 50
8, John Doe, Santander, 06/01/2014, 12/01/2014, 50
9, John Doe, Santander, 18/01/2014, 25/01/2014, 50
10, John Doe, Vacation, 13/01/2014, 17/01/2014, 100

New rows after split are in bold. The Row ID should be unique although it doesn't need to be sequential.

Any help or guidance would be appreciated.
Post #1462522
Posted Wednesday, June 12, 2013 9:42 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, August 29, 2014 5:21 AM
Points: 197, Visits: 730
hi,

I tried to understand your example but I didn't.
I have no idea how you get to the date values for the added rows. And what about the first row in your example, the date values have changed there too.
Please try to explain it again, maybe more detailled.

By the way, please read the following article and provide your example data a little bit more comfortable
http://www.sqlservercentral.com/articles/Best+Practices/61537/
Post #1462704
Posted Wednesday, June 12, 2013 10:05 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 3:41 PM
Points: 13,315, Visits: 12,182
I understand your requirements. This is a bit challenging. Since you are brand new around here I went ahead and put together some ddl and sample data in a consumable format so you can see how you should post this type of thing.

Here it is:

set dateformat dmy

if OBJECT_ID('tempdb..#Something') is not null
drop table #Something

create table #Something
(
ID int,
Employee varchar(20),
Job varchar(20),
StartDate datetime,
EndDate datetime,
Workload int
)

insert #Something
select *
from (Values
(1, 'John Doe', 'HSBC', '01/01/2013', '31/12/2013', 100)
,(2, 'John Doe', 'Vacation', '17/06/2013', '21/06/2013', 100)
,(3, 'John Doe', 'Vacation', '19/08/2013', '23/08/2013', 100)
,(4, 'John Doe', 'Barclays', '01/01/2014', '31/01/2014', 50)
,(5, 'John Doe', 'Santander', '06/01/2014', '25/01/2014', 50)
,(6, 'John Doe', 'Vacation', '13/01/2014', '17/01/2014', 100)
) x(a,b,c,d,e,f)

select *
from #Something

So now we all have something to work with. I will see if I can come up with something that will work. Meantime others can work on this too now that they have details to work with.


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1462708
Posted Wednesday, June 12, 2013 10:37 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Sunday, October 6, 2013 9:53 AM
Points: 9, Visits: 78
Hi WolfgangE and Sean Lange,

Thank you for your responses.

WolfgangE, sorry should have been more clear in my requirements.

We have a system where an employee can be booked on a job for long periods, so in my crude example, John Doe is booked to HSBC for all of 2013. However, in between, employees can be on vacation too so John Doe has 2 vacations during the time he is booked on HSBC.

What we are looking to achieve is to split the job either side of vacation bookings so it all slots in as opposed to being stacked. I attach a screenshot as an example.

Sean Lange, thanks for the DDL. I should have provided that in the first place.

I look forward to your guidance and assistance.
Post #1462724
Posted Wednesday, June 12, 2013 11:47 AM
SSChasing Mays

SSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing Mays

Group: General Forum Members
Last Login: Wednesday, August 27, 2014 12:03 PM
Points: 621, Visits: 2,127
Tricky.

Could you clarify your first sample output from above? With HSBC you intersperse the vacation with the new rows of HSBC. With the other banks you show the vacation AFTER the banking records.

I am pretty close to a solution (its not very pretty), but assuming the output order matters I need to understand the sorting rules.
Post #1462757
Posted Wednesday, June 12, 2013 11:59 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, August 29, 2014 5:21 AM
Points: 197, Visits: 730
thank goodness someone else tries, I dont' find a solution yet...
Post #1462768
Posted Wednesday, June 12, 2013 12:34 PM
SSChasing Mays

SSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing Mays

Group: General Forum Members
Last Login: Wednesday, August 27, 2014 12:03 PM
Points: 621, Visits: 2,127
Here is what I have. Sort order is not the same as the original post suggested, so we could still use a clairification on those rules.

There is likely a cleaner way to do this, but this is my first pass. Note that it can get tripped up if a job and a vacation start on the same date, so if that is an issue let me know.

set dateformat dmy

if OBJECT_ID('tempdb..#Something') is not null
drop table #Something

create table #Something
(
ID int,
Employee varchar(20),
Job varchar(20),
StartDate datetime,
EndDate datetime,
Workload int
)

insert #Something
select *
from (Values
(1, 'John Doe', 'HSBC', '01/01/2013', '31/12/2013', 100)
,(2, 'John Doe', 'Vacation', '17/06/2013', '21/06/2013', 100)
,(3, 'John Doe', 'Vacation', '19/08/2013', '23/08/2013', 100)
,(4, 'John Doe', 'Barclays', '01/01/2014', '31/01/2014', 50)
,(5, 'John Doe', 'Santander', '06/01/2014', '25/01/2014', 50)
,(6, 'John Doe', 'Vacation', '13/01/2014', '17/01/2014', 100)
) x(a,b,c,d,e,f)
;
WITH VacationCTE AS (--This CTE grabs just the Vacation rows so we can compare and split dates from them
SELECT ID,
Employee,
Job,
StartDate,
EndDate,
Workload
FROM #Something
WHERE Job = 'Vacation'
),
NewRowsCTE AS ( --This CTE creates just new rows starting after the vacations for each banking job
SELECT a.ID,
a.Employee,
a.Job,
DATEADD (d,1,b.EndDate) AS StartDate,
a.EndDate,
a.Workload
FROM #Something a
INNER JOIN VacationCTE b
ON a.StartDate <= b.StartDate
AND a.EndDate > b.StartDate
AND a.EndDate > b.EndDate -- This is needed because if the vacation ends when the project does, there is no split row after
),
UnionCTE AS ( -- This CTE merges the new rows with the existing ones
SELECT ID,
Employee,
Job,
StartDate,
EndDate,
Workload
FROM #Something
UNION ALL
SELECT ID,
Employee,
Job,
StartDate,
EndDate,
Workload
FROM NewRowsCTE
),
FixEndDateCTE as (
SELECT CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.StartDate) AS FixID, min(d.StartDate) as StartDate
FROM UnionCTE c
LEFT OUTER JOIN VacationCTE d
ON c.StartDate < d.StartDate
AND c.EndDate >= d.StartDate
WHERE c.Job <> 'Vacation'
GROUP BY CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.StartDate)
)
SELECT row_number() OVER (ORDER BY e.Startdate),
e.Employee,
e.Job,
e.StartDate,
CASE WHEN f.StartDate IS NULL
THEN e.EndDate
ELSE DATEADD (d,-1,f.StartDate)
END,
e.Workload
FROM UnionCTE e
LEFT OUTER JOIN FixEndDateCTE f
ON (CONVERT (CHAR,e.ID)+CONVERT (CHAR,e.StartDate)) = f.FixID
ORDER BY e.StartDate

It is tricky because you need to separate vacations from other rows ( treated differently), create new rows post vacation, and update both the existing and new rows with corrected end dates. Probably far easier to do within a stored procedure where you can use a temp table and take multiple steps, but I tried to do it within a query.
Post #1462779
Posted Thursday, June 13, 2013 8:39 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Sunday, October 6, 2013 9:53 AM
Points: 9, Visits: 78
Hi SSC Veteran,

Thank you very much for your response.

Your DDL appears to be working. I am testing further.

The IDs for existing rows can remain the same with new IDs allocated to new rows (after split). No particular order is required.

I really appreciate your suggestion.
Post #1463129
Posted Tuesday, June 18, 2013 2:49 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Sunday, October 6, 2013 9:53 AM
Points: 9, Visits: 78
Hi All,

Thanks to SSC Veteran's assistance, I am close to the correct result set.

During testing, I found that when 2 rows overlap, duplicate rows appear. An example is:

John Smith, Vacation, 12/02/2014, 25/02/2014
John Smith, Natwest, 14/02/2014, 28/02/2014

The expected result should be the Natwest job start date is the next day after vacation:

John Smith, Vacation, 12/02/2014, 25/02/2014
John Smith, Natwest, 26/02/2014, 28/02/2014

However, I see the following:

John Smith, Vacation, 12/02/2014, 25/02/2014
John Smith, Natwest, 14/02/2014, 28/02/2014
John Smith, Natwest, 26/02/2014, 28/02/2014

Any pointers would be appreciated. Here is the DDL.

set dateformat dmy

if OBJECT_ID('tempdb..#Something') is not null
drop table #Something

create table #Something
(
ID int,
Employee varchar(20),
Job varchar(20),
StartDate datetime,
EndDate datetime,
Workload int
)

insert #Something
select *
from (Values
(1, 'John Doe', 'HSBC', '01/01/2013', '31/12/2013', 100)
,(2, 'John Doe', 'Vacation', '17/06/2013', '21/06/2013', 100)
,(3, 'John Doe', 'Vacation', '19/08/2013', '23/08/2013', 100)
,(4, 'John Doe', 'Barclays', '01/01/2014', '31/01/2014', 50)
,(5, 'John Doe', 'Santander', '06/01/2014', '25/01/2014', 50)
,(6, 'John Doe', 'Vacation', '13/01/2014', '17/01/2014', 100)
,(7, 'John Doe', 'Natwest', '14/02/2014', '28/02/2014', 100)
,(8, 'John Doe', 'Vacation', '12/02/2014', '25/02/2014', 100)
) x(a,b,c,d,e,f)
;
WITH VacationCTE AS (--This CTE grabs just the Vacation rows so we can compare and split dates from them
SELECT ID,
Employee,
Job,
StartDate,
EndDate,
Workload
FROM #Something
WHERE Job = 'Vacation'
),
NewRowsCTE AS ( --This CTE creates just new rows starting after the vacations for each banking job
SELECT a.ID,
a.Employee,
a.Job,
DATEADD (d,1,b.EndDate) AS StartDate,
a.EndDate,
a.Workload
FROM #Something a
INNER JOIN VacationCTE b
ON a.StartDate <= b.EndDate
AND a.EndDate > b.StartDate
AND a.EndDate > b.EndDate -- This is needed because if the vacation ends when the project does, there is no split row after
),
UnionCTE AS ( -- This CTE merges the new rows with the existing ones
SELECT ID,
Employee,
Job,
StartDate,
EndDate,
Workload
FROM #Something
UNION ALL
SELECT ID,
Employee,
Job,
StartDate,
EndDate,
Workload
FROM NewRowsCTE
),
FixEndDateCTE as (
SELECT CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.StartDate) AS FixID, min(d.StartDate) as StartDate
FROM UnionCTE c
LEFT OUTER JOIN VacationCTE d
ON c.StartDate < d.StartDate
AND c.EndDate >= d.StartDate
WHERE c.Job <> 'Vacation'
GROUP BY CONVERT (CHAR,c.ID)+CONVERT (CHAR,c.StartDate)
)
SELECT row_number() OVER (ORDER BY e.Startdate),
e.Employee,
e.Job,
e.StartDate,
CASE WHEN f.StartDate IS NULL
THEN e.EndDate
ELSE DATEADD (d,-1,f.StartDate)
END,
e.Workload
FROM UnionCTE e
LEFT OUTER JOIN FixEndDateCTE f
ON (CONVERT (CHAR,e.ID)+CONVERT (CHAR,e.StartDate)) = f.FixID
ORDER BY e.StartDate

Post #1464508
Posted Tuesday, June 18, 2013 6:17 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 9:49 AM
Points: 7,234, Visits: 13,719
This appears to work:

SELECT 
ID = ROW_NUMBER() OVER(ORDER BY MIN(DateRange)),
Employee, Job, [Workload],
StartDate = MIN(DateRange),
EndDate = MAX(DateRange)
FROM (
SELECT ID, Employee, Job, [Workload], DateRange,
Grouper = DENSE_RANK() over (order by Employee, daterange)
- DENSE_RANK() over (order by ID, Employee, daterange)

FROM (
SELECT
so.ID,
so.Employee,
Job = so.Job,
so.[Workload],
c.DateRange,
rn = DENSE_RANK() OVER(PARTITION BY c.DateRange ORDER BY CASE WHEN so.Job = 'Vacation' THEN 0 ELSE 1 END)
FROM #Something so
CROSS APPLY (SELECT RangeStart = MIN(StartDate), RangeEnd = MAX(EndDate) FROM #Something si) r
CROSS APPLY dbo.IF_Calendar (r.RangeStart,r.RangeEnd,'monday') c
WHERE c.DateRange BETWEEN so.StartDate AND so.EndDate
) d
WHERE rn = 1
) q
GROUP BY ID, Employee, Job, [Workload], Grouper

Here's the function definition:
CREATE FUNCTION [dbo].[IF_Calendar] 
(
@StartDate DATE,
@EndDate DATE,
@FirstWeekDay VARCHAR(10)
)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
(

WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b), --10E+2 or 100 rows
E3(N) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b CROSS JOIN E1 c), --1M rows max

iTally AS ( -- generate sufficient rows to cover startdate to enddate inclusive
SELECT TOP(1+DATEDIFF(DAY,@StartDate,@EndDate))
rn = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1
FROM E3
)

-- Do some date arithmetic
SELECT
a.DateRange,
c.[Year],
c.[Month],
c.[DayOfMonth],
c.AbsWeekno,
c.[DayName],
d.Holiday
FROM iTally
CROSS APPLY (SELECT DateRange = DATEADD(day,rn,@StartDate)) a
CROSS APPLY (VALUES ('Tuesday',1),('Wednesday',2),('Thursday',3),('Friday',4),('Saturday',5),('Sunday',6),('Monday',7)
) b (FirstWeekDay, FirstWeekdayOffset)
CROSS APPLY (
SELECT
[Year] = YEAR(a.DateRange),
[Month] = MONTH(a.DateRange),
[DayOfMonth] = DAY(a.DateRange),
AbsWeekno = DATEDIFF(day,FirstWeekdayOffset,a.DateRange)/7,
[DayName] = DATENAME(weekday,a.DateRange)
) c
CROSS APPLY (
SELECT Holiday = CASE
WHEN [Month] = 1 AND [DayOfMonth] = 1 THEN 'New Year'
WHEN [Month] = 5 AND [DayOfMonth] >= 25 AND [DayName] = 'Monday' THEN 'Memorial Day'
WHEN [Month] = 7 AND [DayOfMonth] = 4 THEN 'Independence Day'
WHEN [Month] = 9 AND [DayOfMonth] <= 7 AND [DayName] = 'Monday' THEN 'Labor Day'
WHEN [Month] = 11 AND [DayOfMonth] BETWEEN 22 AND 28 AND [DayName] = 'Thursday' THEN 'Thanksgiving Day'
WHEN [Month] = 12 AND [DayOfMonth] = 25 THEN 'Christmas Day'
ELSE NULL END
) d
WHERE b.FirstWeekDay = @FirstWeekDay
AND @EndDate IS NOT NULL

)


GO

Which you could easily replace with an inline tally table.


“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Post #1464605
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse