December 20, 2016 at 12:51 pm
I have a data table and a calendar table. I need to fill in the gaps to have a row for each day. I need the dates to be filled in as well as to use the previous valid value for sequence and action. NULL is a valid action type. I've gotten close but the NULLS have been screwing with my results.
I'm using SQL Server 2014. Your help is much appreciated.
Data:
+---------------------------------------------+
| ID | Sequence| Date| Action| |
+---------------------------------------------+
| ---|---------|--------------------|-------| |
| A | 1 | 4/6/2016 12:09:23 | yield| |
| A | 2 | 4/6/2016 12:25:16 | stop| |
| A | 3 | 4/12/2016 11:25:42| NULL| |
| A | 4 | 4/18/2016 11:25:42| go| |
| B | 1 | 2/17/2016 14:15:10| yield| |
| B | 2 | 3/1/2016 7:56:37 | stop| |
| B | 3 | 4/1/2016 9:24:46 | go| |
| B | 4 | 5/4/2016 12:25:16 | exit| |
+---------------------------------------------+
Calendar:
+---------+-------------------------+
| Key | Date |
+---------+-------------------------+
| 2838 | 2016-04-06 00:00:00.000 |
| 2839 | 2016-04-07 00:00:00.000 |
| 2840 | 2016-04-08 00:00:00.000 |
| 2841 | 2016-04-09 00:00:00.000 |
| 2842 | 2016-04-10 00:00:00.000 |
| 2843 | 2016-04-11 00:00:00.000 |
| 2844 | 2016-04-12 00:00:00.000 |
| 2845 | 2016-04-13 00:00:00.000 |
| 2846 | 2016-04-14 00:00:00.000 |
| 2847 | 2016-04-15 00:00:00.000 |
| 2848 | 2016-04-16 00:00:00.000 |
| 2849 | 2016-04-17 00:00:00.000 |
| 2850 | 2016-04-18 00:00:00.000 |
+---------+-------------------------+
Desired results:
ID | Sequence| Date| Action|
----|---------|--------------------|----------|
A | 1| 4/6/2016 12:09:23| yield|
A | 2| 4/6/2016 12:25:16| stop|
A | 2| 4/7/2016 0:00:00| stop|
A | 2| 4/8/2016 0:00:00| stop|
A | 2| 4/9/2016 0:00:00| stop|
A | 2| 4/10/2016 0:00:00| stop|
A | 2| 4/11/2016 0:00:00| stop|
A | 3| 4/12/2016 10:35:34| NULL|
A | 3| 4/13/2016 0:00:00| NULL|
A | 3| 4/14/2016 0:00:00| NULL|
A | 3| 4/15/2016 0:00:00| NULL|
A | 3| 4/16/2016 0:00:00| NULL|
A | 3| 4/17/2016 0:00:00| NULL|
A | 4| 4/18/2016 11:25:4| go|
TSQL to get test tables up:
IF EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'Data')
DROP TABLE Data;
CREATE TABLE Data (ID varchar(20), SEQ INT, Date datetime, Action varchar(20));
INSERT INTO data (ID, SEQ, Date, Action) VALUES
('A', 1, '4/5/2016 12:09:23','yield' ),
('A', 2, '4/7/2016 12:25:16','stop' ),
('A', 3, '4/12/2016 11:25:42',NULL ),
('A', 4, '4/18/2016 11:25:42','go' )
IF EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'Calendar')
DROP TABLE Calendar;
CREATE TABLE Calendar (DateID int, Date datetime);
INSERT INTO Calendar (DateID, Date) VALUES
('2838','4/6/2016 0:00:00'),
('2839','4/7/2016 0:00:00' ),
('2840', '4/8/2016 0:00:00' ),
('2841', '4/9/2016 0:00:00' ),
('2842', '4/10/2016 0:00:00'),
('2843', '4/11/2016 0:00:00' ),
('2844', '4/12/2016 0:00:00' ),
('2845', '4/13/2016 0:00:00' ),
('2846', '4/14/2016 0:00:00' ),
('2847', '4/15/2016 0:00:00'),
('2848', '4/16/2016 0:00:00' ),
('2849', '4/17/2016 0:00:00' ),
('2850', '4/18/2016 0:00:00' )
December 20, 2016 at 1:22 pm
Pretty sure you want NULL and not the string 'NULL'. Not quite seeing what the issue is you are having here though. It is probably right in front me I am just not seeing it.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
December 20, 2016 at 2:03 pm
Your DateKey is defined as INT. Don't provide your sample data as CHAR, because it forces an implicit conversion.
You say that your data contains NULLs which are causing you problems, but your sample data contains 'NULL' rather than NULL. Which does your data actually contain?
Your calendar table start with 2016-04-06 and ends with 2016-04-18, but your data starts with 2016-04-05 12:09:23 and ends with 2016-04-18 11:25:42.
You should provide dates in locale neutral formats (YYYY-MM-DD).
;
WITH data_cte AS (
SELECT *, LEAD(d.Date, 1, '9999-12-30') OVER(PARTITION BY d.ID ORDER BY d.Date) AS next_dt
FROM Data d
)
SELECT d.ID, d.Seq, c.Date, d.Action -- I used c.Date instead of d.Date
FROM data_cte d
INNER JOIN Calendar c
ON CAST(d.Date AS DATE) <= c.Date
AND d.next_dt > c.Date
I used LEAD to find the next date specified and used 9999-12-30 to represent +infinity.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
December 20, 2016 at 2:57 pm
This is based on an old(ish) Itzik Ben-Gan solution... The Last non NULL Puzzle
IF OBJECT_ID('tempdb..#Data', 'U') IS NOT NULL
DROP TABLE #Data;
CREATE TABLE #Data (
ID VARCHAR(20),
SEQ INT,
Date DATETIME,
Action VARCHAR(20)
);
INSERT INTO #Data (ID, SEQ, Date, Action) VALUES
('A', 1, '4/5/2016 12:09:23','yield' ),
('A', 2, '4/7/2016 12:25:16','stop' ),
('A', 3, '4/12/2016 11:25:42',NULL ), -- made actually null.
('A', 4, '4/18/2016 11:25:42','go' );
IF OBJECT_ID('tempdb..#Calendar', 'U') IS NOT NULL
DROP TABLE #Calendar;
CREATE TABLE #Calendar (DateKey int, Date datetime);
INSERT INTO #Calendar (DateKey, Date) VALUES
('2838','4/6/2016 0:00:00'),
('2839','4/7/2016 0:00:00' ),
('2840', '4/8/2016 0:00:00' ),
('2841', '4/9/2016 0:00:00' ),
('2842', '4/10/2016 0:00:00'),
('2843', '4/11/2016 0:00:00' ),
('2844', '4/12/2016 0:00:00' ),
('2845', '4/13/2016 0:00:00' ),
('2846', '4/14/2016 0:00:00' ),
('2847', '4/15/2016 0:00:00'),
('2848', '4/16/2016 0:00:00' ),
('2849', '4/17/2016 0:00:00' ),
('2850', '4/18/2016 0:00:00' );
--=============================================
WITH
cte_AddCalendar AS (
SELECT
d.ID,
d.SEQ,
Date = ISNULL(d.Date, c.Date),
Action = CASE WHEN d.ID IS NOT NULL AND d.Action IS NULL THEN 'XXX' ELSE d.Action end
FROM
#Data d
FULL JOIN #Calendar c
ON CAST(d.Date AS DATE) = c.Date
AND d.ID = 'A'
)
SELECT
ac.ID,
ac.SEQ,
ac.Date,
Action = NULLIF(ac.Action, 'XXX'),
SmearedValue = CAST(
NULLIF(
SUBSTRING(
MAX(CAST(ac.Date AS BINARY(8)) + CAST(ac.Action AS BINARY(20))) OVER (ORDER BY ac.Date ROWS UNBOUNDED PRECEDING)
, 9, 10)
, 0x5858580000000000000000000000000000000000) -- CAST('XXX' AS BINARY(20))
AS VARCHAR(20)
)
FROM
cte_AddCalendar ac;
Results...
ID SEQ Date Action SmearedValue
-------------------- ----------- ----------------------- -------------------- ------------
A 1 2016-04-05 12:09:23.000 yield yield
NULL NULL 2016-04-06 00:00:00.000 NULL yield
A 2 2016-04-07 12:25:16.000 stop stop
NULL NULL 2016-04-08 00:00:00.000 NULL stop
NULL NULL 2016-04-09 00:00:00.000 NULL stop
NULL NULL 2016-04-10 00:00:00.000 NULL stop
NULL NULL 2016-04-11 00:00:00.000 NULL stop
A 3 2016-04-12 11:25:42.000 NULL NULL
NULL NULL 2016-04-13 00:00:00.000 NULL NULL
NULL NULL 2016-04-14 00:00:00.000 NULL NULL
NULL NULL 2016-04-15 00:00:00.000 NULL NULL
NULL NULL 2016-04-16 00:00:00.000 NULL NULL
NULL NULL 2016-04-17 00:00:00.000 NULL NULL
A 4 2016-04-18 11:25:42.000 go go
December 20, 2016 at 2:59 pm
Thanks for the quick reply Drew. I've updated the post. How would I go about it with multiple actions in the same day?
December 20, 2016 at 3:25 pm
Just another way to tackle it, using CROSS APPLY:
IF EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'Data')
DROP TABLE Data;
CREATE TABLE Data (ID varchar(20), SEQ INT, Date datetime, Action varchar(20));
INSERT INTO Data (ID, SEQ, [Date], [Action])
VALUES('A', 1, '4/5/2016 12:09:23','yield'),
('A', 2, '4/7/2016 12:25:16','stop'),
('A', 3, '4/12/2016 11:25:42','NULL'),
('A', 4, '4/18/2016 11:25:42','go');
IF EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'Calendar')
DROP TABLE Calendar;
CREATE TABLE Calendar (DateKey int, [Date] datetime);
INSERT INTO Calendar (DateKey, [Date])
VALUES(2837, '04/05/2016'),
(2838, '04/06/2016'),
(2839, '04/07/2016'),
(2840, '04/08/2016'),
(2841, '04/09/2016'),
(2842, '04/10/2016'),
(2843, '04/11/2016'),
(2844, '04/12/2016'),
(2845, '04/13/2016'),
(2846, '04/14/2016'),
(2847, '04/15/2016'),
(2848, '04/16/2016'),
(2849, '04/17/2016'),
(2850, '04/18/2016');
WITH LEAD_DATA AS (
SELECT D.*,
CAST(D.[Date] AS date) AS DATE_ONLY,
CASE
WHEN CAST(LEAD(D.[Date], 1, '9999-12-31') OVER(ORDER BY D.[Date]) AS date) = '9999-12-31' THEN '9999-12-31'
ELSE DATEADD(day, -1, CAST(LEAD(D.[Date], 1, '9999-12-31') OVER(ORDER BY D.[Date]) AS date))
END AS LEAD_DATE,
ROW_NUMBER() OVER(ORDER BY D.[Date]) AS RN
FROM Data AS D
)
SELECT D.ID,
D.SEQ,
CASE
WHEN C.[Date] = D.DATE_ONLY THEN D.[Date]
ELSE C.[Date]
END AS [Date],
D.[Action]
FROM LEAD_DATA AS D
CROSS APPLY (
SELECT *
FROM Calendar AS CI
WHERE CI.[Date] >= D.DATE_ONLY
AND CI.[Date] <= D.LEAD_DATE
) AS C
ORDER BY D.RN, C.[Date]
DROP TABLE Calendar;
DROP TABLE Data;
I haven't looked at performance on this, so you'll want to test it for that.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
December 20, 2016 at 3:32 pm
Jason A. Long (12/20/2016)
This is based on an old(ish) Itzik Ben-Gan solution... The Last non NULL Puzzle
IF OBJECT_ID('tempdb..#Data', 'U') IS NOT NULL
DROP TABLE #Data;
CREATE TABLE #Data (
ID VARCHAR(20),
SEQ INT,
Date DATETIME,
Action VARCHAR(20)
);
INSERT INTO #Data (ID, SEQ, Date, Action) VALUES
('A', 1, '4/5/2016 12:09:23','yield' ),
('A', 2, '4/7/2016 12:25:16','stop' ),
('A', 3, '4/12/2016 11:25:42',NULL ), -- made actually null.
('A', 4, '4/18/2016 11:25:42','go' );
IF OBJECT_ID('tempdb..#Calendar', 'U') IS NOT NULL
DROP TABLE #Calendar;
CREATE TABLE #Calendar (DateKey int, Date datetime);
INSERT INTO #Calendar (DateKey, Date) VALUES
('2838','4/6/2016 0:00:00'),
('2839','4/7/2016 0:00:00' ),
('2840', '4/8/2016 0:00:00' ),
('2841', '4/9/2016 0:00:00' ),
('2842', '4/10/2016 0:00:00'),
('2843', '4/11/2016 0:00:00' ),
('2844', '4/12/2016 0:00:00' ),
('2845', '4/13/2016 0:00:00' ),
('2846', '4/14/2016 0:00:00' ),
('2847', '4/15/2016 0:00:00'),
('2848', '4/16/2016 0:00:00' ),
('2849', '4/17/2016 0:00:00' ),
('2850', '4/18/2016 0:00:00' );
--=============================================
WITH
cte_AddCalendar AS (
SELECT
d.ID,
d.SEQ,
Date = ISNULL(d.Date, c.Date),
Action = CASE WHEN d.ID IS NOT NULL AND d.Action IS NULL THEN 'XXX' ELSE d.Action end
FROM
#Data d
FULL JOIN #Calendar c
ON CAST(d.Date AS DATE) = c.Date
AND d.ID = 'A'
)
SELECT
ac.ID,
ac.SEQ,
ac.Date,
Action = NULLIF(ac.Action, 'XXX'),
SmearedValue = CAST(
NULLIF(
SUBSTRING(
MAX(CAST(ac.Date AS BINARY(8)) + CAST(ac.Action AS BINARY(20))) OVER (ORDER BY ac.Date ROWS UNBOUNDED PRECEDING)
, 9, 10)
, 0x5858580000000000000000000000000000000000) -- CAST('XXX' AS BINARY(20))
AS VARCHAR(20)
)
FROM
cte_AddCalendar ac;
Results...
ID SEQ Date Action SmearedValue
-------------------- ----------- ----------------------- -------------------- ------------
A 1 2016-04-05 12:09:23.000 yield yield
NULL NULL 2016-04-06 00:00:00.000 NULL yield
A 2 2016-04-07 12:25:16.000 stop stop
NULL NULL 2016-04-08 00:00:00.000 NULL stop
NULL NULL 2016-04-09 00:00:00.000 NULL stop
NULL NULL 2016-04-10 00:00:00.000 NULL stop
NULL NULL 2016-04-11 00:00:00.000 NULL stop
A 3 2016-04-12 11:25:42.000 NULL NULL
NULL NULL 2016-04-13 00:00:00.000 NULL NULL
NULL NULL 2016-04-14 00:00:00.000 NULL NULL
NULL NULL 2016-04-15 00:00:00.000 NULL NULL
NULL NULL 2016-04-16 00:00:00.000 NULL NULL
NULL NULL 2016-04-17 00:00:00.000 NULL NULL
A 4 2016-04-18 11:25:42.000 go go
I started using a similar approach, but the ID isn't smeared and you would run into problems when there is more than one ID if you try to smear the ID as well.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
December 20, 2016 at 3:53 pm
sgmunson (12/20/2016)
Just another way to tackle it, using CROSS APPLY:
-- code snipped to pinpoint this section
SELECT D.*,
CAST(D.[Date] AS date) AS DATE_ONLY,
CASE
WHEN CAST(LEAD(D.[Date], 1, '9999-12-31') OVER(ORDER BY D.[Date]) AS date) = '9999-12-31' THEN '9999-12-31'
ELSE DATEADD(day, -1, CAST(LEAD(D.[Date], 1, '9999-12-31') OVER(ORDER BY D.[Date]) AS date))
END AS LEAD_DATE,
ROW_NUMBER() OVER(ORDER BY D.[Date]) AS RN
FROM Data AS D
You don't need that complicated CASE expression.
SELECT D.*,
CAST(D.[Date] AS date) AS DATE_ONLY,
-- Replace the CASE here
LEAD(DATEADD(DAY, -1, CAST(D.[Date] AS DATE)), 1, '9999-12-31') OVER(ORDER BY D.[Date]) AS LEAD_DATE,
ROW_NUMBER() OVER(ORDER BY D.[Date]) AS RN
FROM #DATA AS D
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
December 20, 2016 at 4:24 pm
drew.allen (12/20/2016)
I started using a similar approach, but the ID isn't smeared and you would run into problems when there is more than one ID if you try to smear the ID as well.Drew
All valid points... All solvable with a slight modification though.
This should do the trick...
IF OBJECT_ID('tempdb..#Data', 'U') IS NOT NULL
DROP TABLE #Data;
CREATE TABLE #Data (
ID VARCHAR(20),
SEQ INT,
Date DATETIME,
Action VARCHAR(20)
);
INSERT INTO #Data (ID, SEQ, Date, Action) VALUES
('A', 1, '4/5/2016 12:09:23','yield' ),
('A', 2, '4/7/2016 12:25:16','stop' ),
('A', 3, '4/12/2016 11:25:42',NULL ), -- made actually null.
('A', 4, '4/18/2016 11:25:42','go' ),
('B', 1, '4/5/2016 12:09:23','yield' ),
('B', 2, '4/7/2016 12:25:16','stop' ),
('B', 3, '4/12/2016 11:25:42',NULL ), -- made actually null.
('B', 4, '4/18/2016 11:25:42','go' );
IF OBJECT_ID('tempdb..#Calendar', 'U') IS NOT NULL
DROP TABLE #Calendar;
CREATE TABLE #Calendar (DateKey int, Date datetime);
INSERT INTO #Calendar (DateKey, Date) VALUES
('1234','4/5/2016 0:00:00'),
('2838','4/6/2016 0:00:00'),
('2839','4/7/2016 0:00:00' ),
('2840', '4/8/2016 0:00:00' ),
('2841', '4/9/2016 0:00:00' ),
('2842', '4/10/2016 0:00:00'),
('2843', '4/11/2016 0:00:00' ),
('2844', '4/12/2016 0:00:00' ),
('2845', '4/13/2016 0:00:00' ),
('2846', '4/14/2016 0:00:00' ),
('2847', '4/15/2016 0:00:00'),
('2848', '4/16/2016 0:00:00' ),
('2849', '4/17/2016 0:00:00' ),
('2850', '4/18/2016 0:00:00' );
--=============================================
WITH
cte_DistinctID AS ( SELECT DISTINCT d.ID FROM #Data d ),
cte_ApplyCalendarToID AS (
SELECT
di.ID,
c.Date
FROM
cte_DistinctID di
CROSS JOIN #Calendar c
),
cte_AddCalendar AS (
SELECT
aci.ID,
d.SEQ,
Date = ISNULL(d.Date, aci.Date),
Action = CASE WHEN d.ID IS NOT NULL AND d.Action IS NULL THEN 'XXX' ELSE d.Action end
FROM
#Data d
FULL JOIN cte_ApplyCalendarToID aci
ON d.ID = aci.ID
AND CAST(d.Date AS DATE) = aci.Date
)
SELECT
ac.ID,
ac.SEQ,
ac.Date,
Action = NULLIF(ac.Action, 'XXX'),
SmearedValue = CAST(
NULLIF(
SUBSTRING(
MAX(CAST(ac.Date AS BINARY(8)) + CAST(ac.Action AS BINARY(20))) OVER (PARTITION BY ac.ID ORDER BY ac.Date ROWS UNBOUNDED PRECEDING)
, 9, 10)
, 0x5858580000000000000000000000000000000000) -- CAST('XXX' AS BINARY(20))
AS VARCHAR(20)
)
FROM
cte_AddCalendar ac;
Results...
ID SEQ Date Action SmearedValue
-------------------- ----------- ----------------------- -------------------- --------------------
A 1 2016-04-05 12:09:23.000 yield yield
A NULL 2016-04-06 00:00:00.000 NULL yield
A 2 2016-04-07 12:25:16.000 stop stop
A NULL 2016-04-08 00:00:00.000 NULL stop
A NULL 2016-04-09 00:00:00.000 NULL stop
A NULL 2016-04-10 00:00:00.000 NULL stop
A NULL 2016-04-11 00:00:00.000 NULL stop
A 3 2016-04-12 11:25:42.000 NULL NULL
A NULL 2016-04-13 00:00:00.000 NULL NULL
A NULL 2016-04-14 00:00:00.000 NULL NULL
A NULL 2016-04-15 00:00:00.000 NULL NULL
A NULL 2016-04-16 00:00:00.000 NULL NULL
A NULL 2016-04-17 00:00:00.000 NULL NULL
A 4 2016-04-18 11:25:42.000 go go
B 1 2016-04-05 12:09:23.000 yield yield
B NULL 2016-04-06 00:00:00.000 NULL yield
B 2 2016-04-07 12:25:16.000 stop stop
B NULL 2016-04-08 00:00:00.000 NULL stop
B NULL 2016-04-09 00:00:00.000 NULL stop
B NULL 2016-04-10 00:00:00.000 NULL stop
B NULL 2016-04-11 00:00:00.000 NULL stop
B 3 2016-04-12 11:25:42.000 NULL NULL
B NULL 2016-04-13 00:00:00.000 NULL NULL
B NULL 2016-04-14 00:00:00.000 NULL NULL
B NULL 2016-04-15 00:00:00.000 NULL NULL
B NULL 2016-04-16 00:00:00.000 NULL NULL
B NULL 2016-04-17 00:00:00.000 NULL NULL
B 4 2016-04-18 11:25:42.000 go go
December 21, 2016 at 7:05 am
DK13 (12/20/2016)
NULL is a valid action type.
I'll save Celko the effort here. NULL is not a valid action type. It is messing up your results because you are assigning meaning to it.
Implying meaning with null is a slippery slope and it will affect your data quality and performance later. If you're intending it to mean nothing happened, then use 'None' instead. Null means you don't know what happened or even if anything happened.
Improper Null handling is a big source of bugs and will make your job (or those after you) harder. As an example of that, if you didn't use Null as an action type, you likely would not have needed to ask your question here.
Save yourself while there is still time, replace the null with a meaningful value.
Wes
(A solid design is always preferable to a creative workaround)
December 21, 2016 at 7:22 am
Okay folks, having realized that my query was only going to handle one ID value, I began to look a little deeper into this. I won't necessarily worry about whenriksen's point about NULL values, as there's no guarantee that assigning a meaning to NULL is automatically a bad thing. It's not necessarily best, but not always bad either. In any case, having set myself on the path to solving for multiple ID values, I took the poster's original data and made a slight modification of my query to deal with it. Take a look and feel free to critique:
IF EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'Data')
DROP TABLE Data;
CREATE TABLE dbo.Data (ID varchar(20), SEQ INT, [Date] datetime, [Action] varchar(20));
INSERT INTO Data (ID, SEQ, [Date], [Action])
VALUES('A', 1, '4/5/2016 12:09:23','yield'),
('A', 2, '4/7/2016 12:25:16','stop'),
('A', 3, '4/12/2016 11:25:42',NULL),
('A', 4, '4/18/2016 11:25:42','go'),
('B', 1, '2/17/2016 14:15:10','yield'),
('B', 2, '3/1/2016 07:56:37', 'stop'),
('B', 3, '4/1/2016 09:24:46', 'go'),
('B', 4, '5/4/2016 12:25:16', 'exit');
WITH DATA_MAX AS (
SELECT D.ID, MIN(CAST(D.[Date] AS date)) AS MIN_DATE, MAX(CAST(D.[Date] AS date)) AS MAX_DATE
FROM dbo.Data AS D
GROUP BY D.ID
),
TALLY AS (
SELECT TOP (1000) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) - 1 AS RN
FROM (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS X(N),
(VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS Y(N),
(VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS Z(N)
),
DATE_RANGES AS (
SELECT TOP 100 PERCENT DM.ID, DATEADD(day, T.RN, DM.MIN_DATE) AS THE_DATE
FROM DATA_MAX AS DM
CROSS APPLY TALLY AS T
WHERE DATEADD(day, T.RN, DM.MIN_DATE) <= DM.MAX_DATE
ORDER BY DM.ID, THE_DATE
),
LEAD_DATA AS (
SELECT D.*,
CAST(D.[Date] AS date) AS DATE_ONLY,
CASE
WHEN CAST(D.[Date] AS date) = DM.MAX_DATE THEN DM.MAX_DATE
ELSE DATEADD(day, -1, CAST(LEAD(D.[Date], 1, DM.MAX_DATE) OVER(PARTITION BY D.ID ORDER BY D.[Date]) AS date))
END AS LEAD_DATE,
ROW_NUMBER() OVER(PARTITION BY D.ID ORDER BY D.[Date]) AS RN
FROM Data AS D
INNER JOIN DATA_MAX AS DM
ON D.ID = DM.ID
)
SELECT LD.ID,
LD.SEQ,
CASE
WHEN CAST(LD.[Date] AS date) = DR.THE_DATE THEN LD.[Date]
ELSE DR.THE_DATE
END AS [Date],
LD.[Action]
FROM LEAD_DATA AS LD
CROSS APPLY DATE_RANGES AS DR
WHERE LD.ID = DR.ID
AND DR.THE_DATE BETWEEN LD.DATE_ONLY AND LD.LEAD_DATE
ORDER BY LD.ID, LD.SEQ, DR.THE_DATE;
DROP TABLE Data;
EDIT: TOP (100) changed to TOP (1000).
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
December 21, 2016 at 7:40 am
And now that I've seen the post about needing to deal with the possibility of more than one action in a given day, I've decided to tackle that one. Here's the code:
IF EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'Data')
DROP TABLE Data;
CREATE TABLE dbo.Data (ID varchar(20), SEQ INT, [Date] datetime, [Action] varchar(20));
INSERT INTO Data (ID, SEQ, [Date], [Action])
VALUES('A', 1, '4/5/2016 12:09:23', 'yield'),
('A', 2, '4/7/2016 12:25:16', 'stop'),
('A', 3, '4/12/2016 11:25:42', NULL),
('A', 4, '4/18/2016 11:25:42', 'go'),
('B', 1, '2/17/2016 14:15:10', 'yield'),
('B', 2, '3/1/2016 07:56:37', 'stop'),
('B', 3, '4/1/2016 09:24:46', 'go'),
('B', 4, '5/4/2016 12:25:16', 'exit'),
('C', 1, '3/1/2016 04:15:10', 'yield'),
('C', 2, '3/1/2016 07:56:37', 'stop'),
('C', 3, '4/1/2016 09:24:46', 'go'),
('C', 4, '4/1/2016 19:22:05', 'walk'),
('C', 5, '5/4/2016 12:25:16', 'exit');
WITH DATA_MAX AS (
SELECT D.ID, MIN(CAST(D.[Date] AS date)) AS MIN_DATE, MAX(CAST(D.[Date] AS date)) AS MAX_DATE
FROM dbo.Data AS D
GROUP BY D.ID
),
TALLY AS (
SELECT TOP (1000) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) - 1 AS RN
FROM (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS X(N),
(VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS Y(N),
(VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS Z(N)
),
DATE_RANGES AS (
SELECT TOP 100 PERCENT DM.ID, DATEADD(day, T.RN, DM.MIN_DATE) AS THE_DATE
FROM DATA_MAX AS DM
CROSS APPLY TALLY AS T
WHERE DATEADD(day, T.RN, DM.MIN_DATE) <= DM.MAX_DATE
ORDER BY DM.ID, THE_DATE
),
LEAD_DATA AS (
SELECT D.*,
CAST(D.[Date] AS date) AS DATE_ONLY,
CASE
WHEN CAST(D.[Date] AS date) = DM.MIN_DATE THEN DM.MIN_DATE
WHEN CAST(D.[Date] AS date) = CAST(LEAD(D.[Date], 1, DM.MAX_DATE) OVER(PARTITION BY D.ID ORDER BY D.[Date]) AS date) THEN CAST(D.[Date] AS date)
WHEN CAST(D.[Date] AS date) = DM.MAX_DATE THEN DM.MAX_DATE
ELSE DATEADD(day, -1, CAST(LEAD(D.[Date], 1, DM.MAX_DATE) OVER(PARTITION BY D.ID ORDER BY D.[Date]) AS date))
END AS LEAD_DATE,
ROW_NUMBER() OVER(PARTITION BY D.ID ORDER BY D.[Date]) AS RN
FROM Data AS D
INNER JOIN DATA_MAX AS DM
ON D.ID = DM.ID
)
SELECT LD.ID,
LD.SEQ,
CASE
WHEN CAST(LD.[Date] AS date) = DR.THE_DATE THEN LD.[Date]
ELSE DR.THE_DATE
END AS [Date],
LD.[Action]
FROM LEAD_DATA AS LD
CROSS APPLY DATE_RANGES AS DR
WHERE LD.ID = DR.ID
AND DR.THE_DATE BETWEEN LD.DATE_ONLY AND LD.LEAD_DATE
ORDER BY LD.ID, LD.SEQ, DR.THE_DATE;
DROP TABLE Data;
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
December 21, 2016 at 10:39 am
whenriksen (12/21/2016)
DK13 (12/20/2016)
NULL is a valid action type.I'll save Celko the effort here. NULL is not a valid action type. It is messing up your results because you are assigning meaning to it.
Implying meaning with null is a slippery slope and it will affect your data quality and performance later. If you're intending it to mean nothing happened, then use 'None' instead. Null means you don't know what happened or even if anything happened.
Improper Null handling is a big source of bugs and will make your job (or those after you) harder. As an example of that, if you didn't use Null as an action type, you likely would not have needed to ask your question here.
Save yourself while there is still time, replace the null with a meaningful value.
Just to follow up on this, relational theory actually specifies two different types of missing data, unknown but applicable and unknown and inapplicable. Unfortunately, SQL Server only implements one, so when you need to distinguish between the two, the standard practice is to use an "impossible" value for unknown but applicable. That is essentially what whenriksen is suggesting here.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
December 22, 2016 at 4:41 pm
Thank you everyone! The scope of the responses were a bit above my experience but I managed to use them to get a working solution. And needless to say I learned a lot here :satisfied:
December 22, 2016 at 4:46 pm
DK13 (12/22/2016)
Thank you everyone! The scope of the responses were a bit above my experience but I managed to use them to get a working solution. And needless to say I learned a lot here :satisfied:
Glad to help out. It's a good idea to look over the various responses and mark those that contributed to your solution as "This worked". It helps the rest of us know how we did. You can designate more than one post.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply