Driving me nuts: selecting valid timelines from several lines in a table

  • Hi all,
    I've been looking like crazy for answers, and although sometimes I think I might have found something, it just isn't what I am looking for.
    This one being the closest, but can't get my head around on how to implement it on my exact problem: https://technology.amis.nl/2012/12/23/advanced-sql-to-find-valid-periods-juggling-with-outer-joins-running-totals-and-analytical-functions/

    The problem at hand:
    I have a table with various records containing BEGIN and END Dates (DD-MM-YYYY), in my example for 1 single client:

    CLIENT      ID         BEGIN            END
    -----------------------------------------------------------------
    A               104      16-03-2007     15-03-2009
    A               105      24-11-2008     30-11-2010
    A               106      02-12-2008     31-03-2009
    A               107      01-04-2009     31-03-2010
    A               108      01-04-2009     31-03-2010
    A               109      02-12-2008     31-12-2010
    A               110      27-12-2010     26-12-2011
    A               111      27-12-2010     26-12-2011
    A               112      22-06-2011     21-06-2026

    Business Rule: if a higher ID has a timeline that overlaps a previous timeline, then the higher ID has priority, the lower ID should be removed from the resultset.
    The timelines (enddates) have to be altered, so the timeline overall is continued. Also, negative timelines are to be removed and a timeline should stretch for at least 2 days. So timelines of 1 day should also be removed.
    Desired resultset:

    CLIENT      ID         BEGIN            END         END NEW
    --------------------------------------------------------------------------------------
    A               104      16-03-2007     15-03-2009     23-11-2008  (BEGIN DATE OF ID = 105 MINUS 1 DAY)
    A               105      24-11-2008     30-11-2010     01-12-2008
    A               109      02-12-2008     31-12-2010     26-12-2010
    A               111      27-12-2010     26-12-2011      21-06-2010
    A               112      22-06-2011     21-06-2026      21-06-2016 (NO NEXT RECORD SO ORIGINAL END DATE)

    Anyone any idea on how to solve this???? All help is much appreciated!

  • /*
        The correct way to set up sample data.
        1) Use a script like this.
        2) Create a temp table or declare a table variable.
        3) Insert your data into your table.
        4) Use a table value constructor to display your expected results.
        5) Cleanup

    */
    /* 2) Create a temp table. This tells us what your datatypes are. */
    CREATE TABLE #clients
    (
        client CHAR(1),
        client_id TINYINT,
        begin_dt DATE,
        end_dt DATE
    )

    /* 3) Insert your data into your table. */
    INSERT #clients(client, client_id, begin_dt, end_dt)
    VALUES
        ('A', 104, '2007-03-16', '2009-03-15'),
        ('A', 105, '2008-11-24', '2010-11-30'),
        ('A', 106, '2008-12-02', '2009-03-31'),
        ('A', 107, '2009-04-01', '2010-03-31'),
        ('A', 108, '2009-04-01', '2010-03-31'),
        ('A', 109, '2008-12-02', '2010-12-31'),
        ('A', 110, '2010-12-27', '2011-12-26'),
        ('A', 111, '2010-12-27', '2011-12-26'),
        ('A', 112, '2011-06-22', '2026-06-21')

    /* People can now cut and paste your script without
        having to massage it to get it into a usable state. */
    SELECT *
    FROM #clients

    /* 4) Use a table value constructor to display your expected results. */
    SELECT *
    FROM
    (
        VALUES
            ('A', 104, CAST('2007-03-16' AS DATE), CAST('2009-03-15' AS DATE), CAST('2008-11-23' AS DATE)),
            ('A', 105, '2008-11-24', '2010-11-30', '2008-12-01'),
            ('A', 109, '2008-12-02', '2010-12-31', '2010-12-26'),
            ('A', 111, '2010-12-27', '2011-12-26', '2011-06-21'),
            ('A', 112, '2011-06-22', '2026-06-21', '2016-06-21')
    ) results(client, client_id, begin_dt, end_dt, new_end_dt)
        
    /* 5) Cleanup */
    DROP TABLE #clients

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • There were several issues that I wanted to point out here.

    • It's a BAD IDEA to have columns named just "ID".  When you get more complex queries with multiple tables that have an "ID" column, it becomes difficult to remember which "ID" is which.  I recommend always including an indication of which table the ID, so I changed the field to "client_id".
    • You should not use RESERVED WORDS as field names.  BEGIN and END are keywords, and you've used them as field names.  I've changed those to begin_dt and end_dt, which are much more descriptive as well as not being reserved words.
    • Dates should always be given in a location neutral format: either 'YYYYMMDD' or 'YYYY-MM-DD'.  Any other format will fail for a significant portion of users, because they use different localization settings from those posted.
    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Hi Drew,

    The case described has example column names. The actual column names are in Dutch and I didn't want to make it more complex then necesarry. The Date columns are actually ID's in the YYYYMMDD (as Integer) format and are related to a Date table. But again, I didn't want to make it more difficult. I wanted the focus to be on my problem: getting a correct timeline keeping the Business Rules in mind. My source is a DataVault-modelled DataWarehouse, so all columns really are correctly named, etc.

    Thank you for your scripts in the previous reply. I hope it will make it more easy for someone to find a solution. I am about to go completely bonkers. Not just for this, but for the complete set of business rules. Hopefully this will get me 1 step in the right direction.

    Cheers,
    Jeroen

  • SELECT client, client_id, c.begin_dt, end_dt,
        DATEADD(DAY, -1, LEAD(begin_dt, 1, end_dt) OVER(ORDER BY begin_dt)) AS new_end_dt
    FROM #clients c
    WHERE NOT EXISTS
    (
        SELECT *
        FROM #clients c2
        WHERE c.client = c2.client
            AND c.client_id < c2.client_id
            AND c.begin_dt >= c2.begin_dt
            AND c.end_dt <= c2.end_dt
    )
    ORDER BY c.begin_dt

    This gives the correct results.  You may need to tweak it to get it to work with dates encoded as integers.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Thank you Drew, this helped a lot!!

    I am working with your solution as a basis for completing all requirements. I might be back for questions, because the final result is getting overly complicated.
    There might be better solutions....

  • oops, wrong place

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply