Select continual date ranges from a list that overlaps

  • Hi There, need some help.

    I have list of prices and the dates they are valid from and to. For Example:

    Date FromDate ToPrice

    20/06/201326/08/2013199.99

    27/08/201321/01/2014199.99

    3/10/201331/10/2013149.99

    2/11/20137/11/2013149.99

    25/11/201326/11/2013149.99

    27/11/201329/11/2013140

    30/11/20135/12/2013149.99

    6/12/20136/12/2013140

    7/12/201325/12/2013149.99

    14/01/201414/01/2014119.99

    Some Prices overlap so the actual price that is valid is the lowest price, so for example while the entry for 199.99 is from the 27/08/2013-21/01/2014, when the lower price of 149.99 comes into effect on the 3/10/2013 then 149.99 is the actual price.

    What I want to do is to break this down to actual active periods and if possible combine periods where the value is the same, so for the example above the result would be :

    Date FromDate ToPrice

    20/06/20132/10/2013199.99

    3/10/201331/10/2013149.99

    1/11/20131/11/2013199.99

    2/11/20137/11/2013149.99

    8/11/201324/11/2013199.99

    25/11/201326/11/2013149.99

    27/11/201329/11/2013140

    30/11/20135/12/2013149.99

    6/12/20136/12/2013140

    7/12/201325/12/2013149.99

    26/12/201313/01/2014199

    14/01/201414/01/2014119.99

    15/01/201421/01/2014199.99

    I am struggling to come up with a solution after trying for a number of days.

    Any suggestions welcomed.

    Thanks

  • To close to bedtime to code and test, but you should just be able to use the LAG and LEAD operations in a cte to look at the previous [price] and the next [date from] values. Use OVER(ORDER BY [Date From])

    Filter out rows where price = lagprice. They don't change the price so they are irrelevant.

    [Date To] is recalculated by subtracting one day from the LEAD(Date From]) That way there are no gaps in the date ranges.

    Hope this helps.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • will you ever have the case where there are "gaps" in your date ranges?

    ie...there are dates where you have no price

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • One way is to expand the dates with a number/tally table and then roll them up again:

    -- *** Consumable Test Data ***

    -- Please supply in future with dates in ISO format.

    CREATE TABLE #t

    (

    DateFrom date NOT NULL

    ,DateTo date NOT NULL

    ,Price money NOT NULL

    );

    INSERT INTO #t

    VALUES ('20130620', '20130826', 199.99)

    ,('20130827', '20140121', 199.99)

    ,('20131003', '20131031', 149.99)

    ,('20131102', '20131107', 149.99)

    ,('20131125', '20131126', 149.99)

    ,('20131127', '20131129', 140.00)

    ,('20131130', '20131205', 149.99)

    ,('20131206', '20131206', 140.00)

    ,('20131207', '20131225', 149.99)

    ,('20140114', '20140114', 119.99);

    -- *** End Test Data ***

    WITH Numbers(n)

    AS

    (

    -- 10000 rows

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(n)

    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)

    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)

    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)

    )

    ,MinPrices

    AS

    (

    SELECT A.PriceDate, MIN(Price) AS Price

    FROM #t T

    JOIN Numbers N

    ON N.N <= DATEDIFF(day, T.DateFrom, T.DateTo) + 1

    CROSS APPLY (VALUES( DATEADD(day, N -1, T.DateFrom)) ) A(PriceDate)

    GROUP BY A.PriceDate

    )

    ,Grps

    AS

    (

    SELECT PriceDate, Price

    ,ROW_NUMBER() OVER (ORDER BY PriceDate)

    - ROW_NUMBER() OVER (PARTITION BY Price ORDER BY PriceDate) AS Grp

    FROM MinPrices

    )

    ,Results

    AS

    (

    SELECT MIN(PriceDate) AS DateFrom

    ,MAX(PriceDate) AS DateTo

    ,Price

    ,Grp

    FROM Grps

    GROUP BY Price, Grp

    )

    SELECT DateFrom, DateTo, Price

    FROM Results

    ORDER BY DateFrom;

  • Hi There

    No there won't be any gaps, they must be continuous even if they overlap.

    Thanks

  • This works great, thank you so much

    Ken McKelvey (9/27/2016)


    One way is to expand the dates with a number/tally table and then roll them up again:

    -- *** Consumable Test Data ***

    -- Please supply in future with dates in ISO format.

    CREATE TABLE #t

    (

    DateFrom date NOT NULL

    ,DateTo date NOT NULL

    ,Price money NOT NULL

    );

    INSERT INTO #t

    VALUES ('20130620', '20130826', 199.99)

    ,('20130827', '20140121', 199.99)

    ,('20131003', '20131031', 149.99)

    ,('20131102', '20131107', 149.99)

    ,('20131125', '20131126', 149.99)

    ,('20131127', '20131129', 140.00)

    ,('20131130', '20131205', 149.99)

    ,('20131206', '20131206', 140.00)

    ,('20131207', '20131225', 149.99)

    ,('20140114', '20140114', 119.99);

    -- *** End Test Data ***

    WITH Numbers(n)

    AS

    (

    -- 10000 rows

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(n)

    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)

    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)

    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)

    )

    ,MinPrices

    AS

    (

    SELECT A.PriceDate, MIN(Price) AS Price

    FROM #t T

    JOIN Numbers N

    ON N.N <= DATEDIFF(day, T.DateFrom, T.DateTo) + 1

    CROSS APPLY (VALUES( DATEADD(day, N -1, T.DateFrom)) ) A(PriceDate)

    GROUP BY A.PriceDate

    )

    ,Grps

    AS

    (

    SELECT PriceDate, Price

    ,ROW_NUMBER() OVER (ORDER BY PriceDate)

    - ROW_NUMBER() OVER (PARTITION BY Price ORDER BY PriceDate) AS Grp

    FROM MinPrices

    )

    ,Results

    AS

    (

    SELECT MIN(PriceDate) AS DateFrom

    ,MAX(PriceDate) AS DateTo

    ,Price

    ,Grp

    FROM Grps

    GROUP BY Price, Grp

    )

    SELECT DateFrom, DateTo, Price

    FROM Results

    ORDER BY DateFrom;

  • Ken McKelvey (9/27/2016)


    One way is to expand the dates with a number/tally table and then roll them up again:

    The problem with this approach is that it can be very expensive, because you're creating a lot of unnecessary records. The following gives the same results, but is much more efficient.

    ;

    WITH packed_dates AS (

    SELECT p.price, d.dt, d.is_start,

    CASE

    WHEN LEAD(d.dt, 1, '9999') OVER(PARTITION BY p.price ORDER BY d.dt, d.is_start) = d.dt THEN 1

    WHEN LAG(d.dt, 1, '1900') OVER(PARTITION BY p.price ORDER BY d.dt, d.is_start) = d.dt THEN 1

    END AS exclusion_flag

    FROM #prices p

    CROSS APPLY (VALUES(p.date_from, 1), (DATEADD(DAY, 1, p.date_to), 0)) d(dt, is_start)

    )

    , date_ranges AS (

    SELECT pd.dt AS date_from, LEAD(pd.dt) OVER(ORDER BY pd.dt) AS date_to

    FROM packed_dates pd

    WHERE pd.exclusion_flag IS NULL

    )

    SELECT dr.date_from, DATEADD(DAY, -1, dr.date_to) AS date_to, MIN(p.price) AS price

    FROM date_ranges dr

    INNER JOIN #prices p

    ON p.date_from < dr.date_to

    AND p.date_to >= dr.date_from

    WHERE dr.date_from < dr.date_to

    GROUP BY dr.date_from, dr.date_to

    ;

    Here is a comparison of times:

    ----- Expand the dates -----

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#t__________________________________________________________________________________________________________________0000000E5AA8'. Scan count 1, logical reads 10000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 47 ms, elapsed time = 53 ms.

    ----- "Unpivot"/Lead/Lag -----

    Table '#prices_____________________________________________________________________________________________________________0000000E56A2'. Scan count 2, logical reads 14, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    As you can see, the "expand the dates" approach has only one scan of the table, but 10000 logical reads, whereas mine has two scans, but only 14 logical reads. This manifests in the large difference in the CPU and elapsed times. And that's on a very small subset of data.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • No there won't be any gaps, they must be continuous even if they overlap.

    Instead of trying to repair the damage after it has been done, you can add constraints and prevent it from happening in the first place. Read this article:

    https://www.simple-talk.com/sql/t-sql-programming/contiguous-time-periods/

    After this, download a PDF of the Rick Snodgrass book on temporal queries in SQL. It is free from the University of Arizona.

    Books in Celko Series for Morgan-Kaufmann Publishing
    Analytics and OLAP in SQL
    Data and Databases: Concepts in Practice
    Data, Measurements and Standards in SQL
    SQL for Smarties
    SQL Programming Style
    SQL Puzzles and Answers
    Thinking in Sets
    Trees and Hierarchies in SQL

  • Thank you so much for code, it is a lot faster but it has some issues that I am struggling to figure out.

    With the following data the faster version gets it wrong, but the slower version gets it right, any idea why or how to fix as I have come to rely on the performance:

    DateFromDateToPrice

    2016-06-272016-11-2849.99

    2016-06-279999-12-3049.99

    2016-11-299999-12-3039.99

    The valid From dates are the same and for some reason both of the 49.99 prices are having the exclusion flag set.

    I would appreciate any help

    Thanks

  • wjh_uk (11/28/2016)


    Thank you so much for code, it is a lot faster but it has some issues that I am struggling to figure out.

    With the following data the faster version gets it wrong, but the slower version gets it right, any idea why or how to fix as I have come to rely on the performance:

    DateFromDateToPrice

    2016-06-272016-11-2849.99

    2016-06-279999-12-3049.99

    2016-11-299999-12-3039.99

    The valid From dates are the same and for some reason both of the 49.99 prices are having the exclusion flag set.

    I would appreciate any help

    Thanks

    The algorithm assumes that the combination of Price, dt, and is_start is unique. You need to make sure that this is the case before applying the exclusions. Given that, see if you can modify the query to produce unique combinations of those fields.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • This isn't going to be as fast as Drew's code, but it should avoid the problems, and be slightly faster than Ken's once you get some volume of data considerably larger than the test data provided:

    CREATE TABLE #PRICE_VALID_DATES

    (

    DateFrom date NOT NULL

    ,DateTo date NOT NULL

    ,Price money NOT NULL

    );

    CREATE NONCLUSTERED INDEX IX_PRICE_VALID_DATES_DateFrom_INCLUDES_REMAINDER ON #PRICE_VALID_DATES

    (

    DateFrom ASC

    )

    INCLUDE

    (

    DateTo, Price

    );

    CREATE NONCLUSTERED INDEX IX_PRICE_VALID_DATES_DateTo_INCLUDES_REMAINDER ON #PRICE_VALID_DATES

    (

    DateTo ASC

    )

    INCLUDE

    (

    DateFrom, Price

    );

    CREATE NONCLUSTERED INDEX IX_PRICE_VALID_DATES_Price_INCLUDES_REMAINDER ON #PRICE_VALID_DATES

    (

    Price ASC

    )

    INCLUDE

    (

    DateFrom, DateTo

    );

    SET NOCOUNT ON;

    INSERT INTO #PRICE_VALID_DATES

    VALUES ('20130620', '20130826', 199.99)

    ,('20130827', '20140121', 199.99)

    ,('20131003', '20131031', 149.99)

    ,('20131102', '20131107', 149.99)

    ,('20131125', '20131126', 149.99)

    ,('20131127', '20131129', 140.00)

    ,('20131130', '20131205', 149.99)

    ,('20131206', '20131206', 140.00)

    ,('20131207', '20131225', 149.99)

    ,('20140114', '20140114', 119.99);

    DECLARE @MIN_DATE AS date = (SELECT MIN(DateFrom) FROM #PRICE_VALID_DATES;

    DECLARE @MAX_DATE AS date = (SELECT MAX(DateTo) FROM #PRICE_VALID_DATES;

    DECLARE @NUM_DAYS AS bigint = DATEDIFF(day, @MIN_DATE, @MAX_DATE) + 1;

    WITH E1 AS (

    SELECT 1 AS N UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ),

    ALL_DATES AS (

    -- ONLY GET AS MANY DATES AS YOU NEED TO COVER THE ENTIRE POSSIBLE RANGE OF VALID DATES

    SELECT TOP (@NUM_DAYS) DATEADD(day, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) - 1, @MIN_DATE) AS DAY_DATE

    FROM E1 AS a, E1 AS b, E1 AS c, E1 AS d

    ),

    DAILY_PRICES AS (

    SELECT D.DAY_DATE, MIN(P.Price) AS Price,

    CASE

    --DETECT WHEN THE PRICE CHANGES

    WHEN LAG(MIN(P.Price), 1, -0.01) OVER(ORDER BY DAY_DATE) <> MIN(P.Price) THEN 1

    ELSE 0

    END AS IS_CHANGED

    FROM #PRICE_VALID_DATES AS P

    INNER JOIN ALL_DATES AS D-- THE JOIN CONDITION APPLIES THE PRICE TO ALL THE DATES IN THE GIVEN RANGE

    ON D.DAY_DATE BETWEEN P.DateFrom AND P.DateTo

    GROUP BY D.DAY_DATE

    )

    SELECT PCD.DAY_DATE AS DATE_FROM,

    --AS RANGES ARE GUARANTEED TO BE CONTINUOUS, YOU JUST NEED THE DAY BEFORE THE NEXT PRICE CHANGE, OR THE MAX DATE VALUE, AS THE END OF A GIVEN RANGE

    ISNULL(DATEADD(day, -1, LEAD(PCD.DAY_DATE, 1) OVER(ORDER BY PCD.DAY_DATE)), @MAX_DATE) AS DATE_TO,

    PCD.Price

    FROM DAILY_PRICES AS PCD

    WHERE IS_CHANGED = 1

    OR PCD.DAY_DATE = @MIN_DATE

    ORDER BY DAY_DATE;

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Here is the updated code:

    ;

    WITH unique_dates AS (

    SELECT DISTINCT p.Price, d.dt, d.is_start

    FROM #prices p

    CROSS APPLY (VALUES(p.date_from, 1), (DATEADD(DAY, 1, p.date_to), 0)) d(dt, is_start)

    )

    , packed_dates AS (

    SELECT ud.price, ud.dt, ud.is_start,

    CASE

    WHEN LEAD(ud.dt, 1, '9999') OVER(PARTITION BY ud.price ORDER BY ud.dt, ud.is_start) = ud.dt THEN 1

    WHEN LAG(ud.dt, 1, '1900') OVER(PARTITION BY ud.price ORDER BY ud.dt, ud.is_start) = ud.dt THEN 1

    END AS exclusion_flag

    FROM unique_dates ud

    )

    , date_ranges AS (

    SELECT pd.dt AS date_from, LEAD(pd.dt) OVER(ORDER BY pd.dt) AS date_to

    FROM packed_dates pd

    WHERE pd.exclusion_flag IS NULL

    )

    SELECT dr.date_from, DATEADD(DAY, -1, dr.date_to) AS date_to, MIN(p.price) AS price

    FROM date_ranges dr

    INNER JOIN #prices p

    ON p.date_from < dr.date_to

    AND p.date_to >= dr.date_from

    WHERE dr.date_from < dr.date_to

    GROUP BY dr.date_from, dr.date_to

    ;

    I've split the first CTE into two parts. The first one finds the unique values that I mentioned, the second one sets the exclusion flags.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply