Select continual date ranges from a list that overlaps

Question

Select continual date ranges from a list that overlaps

wjh_uk

Old Hand

Points: 354
More actions
September 26, 2016 at 6:47 pm

#317735

Hi There, need some help.
I have list of prices and the dates they are valid from and to. For Example:
Date FromDate ToPrice
20/06/201326/08/2013199.99
27/08/201321/01/2014199.99
3/10/201331/10/2013149.99
2/11/20137/11/2013149.99
25/11/201326/11/2013149.99
27/11/201329/11/2013140
30/11/20135/12/2013149.99
6/12/20136/12/2013140
7/12/201325/12/2013149.99
14/01/201414/01/2014119.99
Some Prices overlap so the actual price that is valid is the lowest price, so for example while the entry for 199.99 is from the 27/08/2013-21/01/2014, when the lower price of 149.99 comes into effect on the 3/10/2013 then 149.99 is the actual price.
What I want to do is to break this down to actual active periods and if possible combine periods where the value is the same, so for the example above the result would be :
Date FromDate ToPrice
20/06/20132/10/2013199.99
3/10/201331/10/2013149.99
1/11/20131/11/2013199.99
2/11/20137/11/2013149.99
8/11/201324/11/2013199.99
25/11/201326/11/2013149.99
27/11/201329/11/2013140
30/11/20135/12/2013149.99
6/12/20136/12/2013140
7/12/201325/12/2013149.99
26/12/201313/01/2014199
14/01/201414/01/2014119.99
15/01/201421/01/2014199.99
I am struggling to come up with a solution after trying for a number of days.
Any suggestions welcomed.
Thanks

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 1

To close to bedtime to code and test, but you should just be able to use the LAG and LEAD operations in a cte to look at the previous [price] and the next [date from] values. Use OVER(ORDER BY [Date From])

Filter out rows where price = lagprice. They don't change the price so they are irrelevant.

[Date To] is recalculated by subtracting one day from the LEAD(Date From]) That way there are no gaps in the date ranges.

Hope this helps.

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 2

will you ever have the case where there are "gaps" in your date ranges?

ie...there are dates where you have no price

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

Ken McKelvey SSCoach Points: 18767 More actions · Answer 3

One way is to expand the dates with a number/tally table and then roll them up again:

-- *** Consumable Test Data ***

-- Please supply in future with dates in ISO format.

CREATE TABLE #t

(

DateFrom date NOT NULL

,DateTo date NOT NULL

,Price money NOT NULL

);

INSERT INTO #t

VALUES ('20130620', '20130826', 199.99)

,('20130827', '20140121', 199.99)

,('20131003', '20131031', 149.99)

,('20131102', '20131107', 149.99)

,('20131125', '20131126', 149.99)

,('20131127', '20131129', 140.00)

,('20131130', '20131205', 149.99)

,('20131206', '20131206', 140.00)

,('20131207', '20131225', 149.99)

,('20140114', '20140114', 119.99);

-- *** End Test Data ***

WITH Numbers(n)

AS

(

-- 10000 rows

SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(n)

CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)

CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)

CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)

)

,MinPrices

AS

(

SELECT A.PriceDate, MIN(Price) AS Price

FROM #t T

JOIN Numbers N

ON N.N <= DATEDIFF(day, T.DateFrom, T.DateTo) + 1

CROSS APPLY (VALUES( DATEADD(day, N -1, T.DateFrom)) ) A(PriceDate)

GROUP BY A.PriceDate

)

,Grps

AS

(

SELECT PriceDate, Price

,ROW_NUMBER() OVER (ORDER BY PriceDate)

- ROW_NUMBER() OVER (PARTITION BY Price ORDER BY PriceDate) AS Grp

FROM MinPrices

)

,Results

AS

(

SELECT MIN(PriceDate) AS DateFrom

,MAX(PriceDate) AS DateTo

,Price

,Grp

FROM Grps

GROUP BY Price, Grp

)

SELECT DateFrom, DateTo, Price

FROM Results

ORDER BY DateFrom;

wjh_uk Old Hand Points: 354 More actions · Answer 4

Hi There

No there won't be any gaps, they must be continuous even if they overlap.

Thanks

wjh_uk Old Hand Points: 354 More actions · Answer 5

This works great, thank you so much

Ken McKelvey (9/27/2016)
One way is to expand the dates with a number/tally table and then roll them up again:
-- *** Consumable Test Data ***
-- Please supply in future with dates in ISO format.
CREATE TABLE #t
(
DateFrom date NOT NULL
,DateTo date NOT NULL
,Price money NOT NULL
);
INSERT INTO #t
VALUES ('20130620', '20130826', 199.99)
,('20130827', '20140121', 199.99)
,('20131003', '20131031', 149.99)
,('20131102', '20131107', 149.99)
,('20131125', '20131126', 149.99)
,('20131127', '20131129', 140.00)
,('20131130', '20131205', 149.99)
,('20131206', '20131206', 140.00)
,('20131207', '20131225', 149.99)
,('20140114', '20140114', 119.99);
-- *** End Test Data ***
WITH Numbers(n)
AS
(
-- 10000 rows
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)
)
,MinPrices
AS
(
SELECT A.PriceDate, MIN(Price) AS Price
FROM #t T
JOIN Numbers N
ON N.N <= DATEDIFF(day, T.DateFrom, T.DateTo) + 1
CROSS APPLY (VALUES( DATEADD(day, N -1, T.DateFrom)) ) A(PriceDate)
GROUP BY A.PriceDate
)
,Grps
AS
(
SELECT PriceDate, Price
,ROW_NUMBER() OVER (ORDER BY PriceDate)
- ROW_NUMBER() OVER (PARTITION BY Price ORDER BY PriceDate) AS Grp
FROM MinPrices
)
,Results
AS
(
SELECT MIN(PriceDate) AS DateFrom
,MAX(PriceDate) AS DateTo
,Price
,Grp
FROM Grps
GROUP BY Price, Grp
)
SELECT DateFrom, DateTo, Price
FROM Results
ORDER BY DateFrom;

drew.allen SSC Guru Points: 76944 More actions · Answer 6

Ken McKelvey (9/27/2016)
One way is to expand the dates with a number/tally table and then roll them up again:

The problem with this approach is that it can be very expensive, because you're creating a lot of unnecessary records. The following gives the same results, but is much more efficient.

;

WITH packed_dates AS (

SELECT p.price, d.dt, d.is_start,

CASE

WHEN LEAD(d.dt, 1, '9999') OVER(PARTITION BY p.price ORDER BY d.dt, d.is_start) = d.dt THEN 1

WHEN LAG(d.dt, 1, '1900') OVER(PARTITION BY p.price ORDER BY d.dt, d.is_start) = d.dt THEN 1

END AS exclusion_flag

FROM #prices p

CROSS APPLY (VALUES(p.date_from, 1), (DATEADD(DAY, 1, p.date_to), 0)) d(dt, is_start)

)

, date_ranges AS (

SELECT pd.dt AS date_from, LEAD(pd.dt) OVER(ORDER BY pd.dt) AS date_to

FROM packed_dates pd

WHERE pd.exclusion_flag IS NULL

)

SELECT dr.date_from, DATEADD(DAY, -1, dr.date_to) AS date_to, MIN(p.price) AS price

FROM date_ranges dr

INNER JOIN #prices p

ON p.date_from < dr.date_to

AND p.date_to >= dr.date_from

WHERE dr.date_from < dr.date_to

GROUP BY dr.date_from, dr.date_to

;

Here is a comparison of times:

----- Expand the dates -----

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#t__________________________________________________________________________________________________________________0000000E5AA8'. Scan count 1, logical reads 10000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:

CPU time = 47 ms, elapsed time = 53 ms.

----- "Unpivot"/Lead/Lag -----

Table '#prices_____________________________________________________________________________________________________________0000000E56A2'. Scan count 2, logical reads 14, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 1 ms.

As you can see, the "expand the dates" approach has only one scan of the table, but 10000 logical reads, whereas mine has two scans, but only 14 logical reads. This manifests in the large difference in the CPU and elapsed times. And that's on a very small subset of data.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Joe Celko SSCertifiable Points: 5749 More actions · Answer 7

No there won't be any gaps, they must be continuous even if they overlap.

Instead of trying to repair the damage after it has been done, you can add constraints and prevent it from happening in the first place. Read this article:

https://www.simple-talk.com/sql/t-sql-programming/contiguous-time-periods/

After this, download a PDF of the Rick Snodgrass book on temporal queries in SQL. It is free from the University of Arizona.

Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL

wjh_uk Old Hand Points: 354 More actions · Answer 8

Thank you so much for code, it is a lot faster but it has some issues that I am struggling to figure out.

With the following data the faster version gets it wrong, but the slower version gets it right, any idea why or how to fix as I have come to rely on the performance:

DateFromDateToPrice

2016-06-272016-11-2849.99

2016-06-279999-12-3049.99

2016-11-299999-12-3039.99

The valid From dates are the same and for some reason both of the 49.99 prices are having the exclusion flag set.

I would appreciate any help

Thanks

drew.allen SSC Guru Points: 76944 More actions · Answer 9

wjh_uk (11/28/2016)
Thank you so much for code, it is a lot faster but it has some issues that I am struggling to figure out.
With the following data the faster version gets it wrong, but the slower version gets it right, any idea why or how to fix as I have come to rely on the performance:
DateFromDateToPrice
2016-06-272016-11-2849.99
2016-06-279999-12-3049.99
2016-11-299999-12-3039.99
The valid From dates are the same and for some reason both of the 49.99 prices are having the exclusion flag set.
I would appreciate any help
Thanks

The algorithm assumes that the combination of Price, dt, and is_start is unique. You need to make sure that this is the case before applying the exclusions. Given that, see if you can modify the query to produce unique combinations of those fields.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

sgmunson SSC Guru Points: 110639 More actions · Answer 10

This isn't going to be as fast as Drew's code, but it should avoid the problems, and be slightly faster than Ken's once you get some volume of data considerably larger than the test data provided:

CREATE TABLE #PRICE_VALID_DATES

(

DateFrom date NOT NULL

,DateTo date NOT NULL

,Price money NOT NULL

);

CREATE NONCLUSTERED INDEX IX_PRICE_VALID_DATES_DateFrom_INCLUDES_REMAINDER ON #PRICE_VALID_DATES

(

DateFrom ASC

)

INCLUDE

(

DateTo, Price

);

CREATE NONCLUSTERED INDEX IX_PRICE_VALID_DATES_DateTo_INCLUDES_REMAINDER ON #PRICE_VALID_DATES

(

DateTo ASC

)

INCLUDE

(

DateFrom, Price

);

CREATE NONCLUSTERED INDEX IX_PRICE_VALID_DATES_Price_INCLUDES_REMAINDER ON #PRICE_VALID_DATES

(

Price ASC

)

INCLUDE

(

DateFrom, DateTo

);

SET NOCOUNT ON;

INSERT INTO #PRICE_VALID_DATES

VALUES ('20130620', '20130826', 199.99)

,('20130827', '20140121', 199.99)

,('20131003', '20131031', 149.99)

,('20131102', '20131107', 149.99)

,('20131125', '20131126', 149.99)

,('20131127', '20131129', 140.00)

,('20131130', '20131205', 149.99)

,('20131206', '20131206', 140.00)

,('20131207', '20131225', 149.99)

,('20140114', '20140114', 119.99);

DECLARE @MIN_DATE AS date = (SELECT MIN(DateFrom) FROM #PRICE_VALID_DATES;

DECLARE @MAX_DATE AS date = (SELECT MAX(DateTo) FROM #PRICE_VALID_DATES;

DECLARE @NUM_DAYS AS bigint = DATEDIFF(day, @MIN_DATE, @MAX_DATE) + 1;

WITH E1 AS (

SELECT 1 AS N UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

),

ALL_DATES AS (

-- ONLY GET AS MANY DATES AS YOU NEED TO COVER THE ENTIRE POSSIBLE RANGE OF VALID DATES

SELECT TOP (@NUM_DAYS) DATEADD(day, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) - 1, @MIN_DATE) AS DAY_DATE

FROM E1 AS a, E1 AS b, E1 AS c, E1 AS d

),

DAILY_PRICES AS (

SELECT D.DAY_DATE, MIN(P.Price) AS Price,

CASE

--DETECT WHEN THE PRICE CHANGES

WHEN LAG(MIN(P.Price), 1, -0.01) OVER(ORDER BY DAY_DATE) <> MIN(P.Price) THEN 1

ELSE 0

END AS IS_CHANGED

FROM #PRICE_VALID_DATES AS P

INNER JOIN ALL_DATES AS D-- THE JOIN CONDITION APPLIES THE PRICE TO ALL THE DATES IN THE GIVEN RANGE

ON D.DAY_DATE BETWEEN P.DateFrom AND P.DateTo

GROUP BY D.DAY_DATE

)

SELECT PCD.DAY_DATE AS DATE_FROM,

--AS RANGES ARE GUARANTEED TO BE CONTINUOUS, YOU JUST NEED THE DAY BEFORE THE NEXT PRICE CHANGE, OR THE MAX DATE VALUE, AS THE END OF A GIVEN RANGE

ISNULL(DATEADD(day, -1, LEAD(PCD.DAY_DATE, 1) OVER(ORDER BY PCD.DAY_DATE)), @MAX_DATE) AS DATE_TO,

PCD.Price

FROM DAILY_PRICES AS PCD

WHERE IS_CHANGED = 1

OR PCD.DAY_DATE = @MIN_DATE

ORDER BY DAY_DATE;

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

drew.allen SSC Guru Points: 76944 More actions · Answer 11

Here is the updated code:

;

WITH unique_dates AS (

SELECT DISTINCT p.Price, d.dt, d.is_start

FROM #prices p

CROSS APPLY (VALUES(p.date_from, 1), (DATEADD(DAY, 1, p.date_to), 0)) d(dt, is_start)

)

, packed_dates AS (

SELECT ud.price, ud.dt, ud.is_start,

CASE

WHEN LEAD(ud.dt, 1, '9999') OVER(PARTITION BY ud.price ORDER BY ud.dt, ud.is_start) = ud.dt THEN 1

WHEN LAG(ud.dt, 1, '1900') OVER(PARTITION BY ud.price ORDER BY ud.dt, ud.is_start) = ud.dt THEN 1

END AS exclusion_flag

FROM unique_dates ud

)

, date_ranges AS (

SELECT pd.dt AS date_from, LEAD(pd.dt) OVER(ORDER BY pd.dt) AS date_to

FROM packed_dates pd

WHERE pd.exclusion_flag IS NULL

)

SELECT dr.date_from, DATEADD(DAY, -1, dr.date_to) AS date_to, MIN(p.price) AS price

FROM date_ranges dr

INNER JOIN #prices p

ON p.date_from < dr.date_to

AND p.date_to >= dr.date_from

WHERE dr.date_from < dr.date_to

GROUP BY dr.date_from, dr.date_to

;

I've split the first CTE into two parts. The first one finds the unique values that I mentioned, the second one sets the exclusion flags.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA