T-SQL : Slicing time Islands and gaps

  • Hello,

    Consider the following table that gives the validity dates of an item :

     create table #table1(
    DD_S int,
    DF_S int,
    id_prod int,
    is_valid bit
    )
    insert into #table1
    values
    (20190101,20190601,10,0),
    (20190601,20190901,10,0),
    (20190901,20200701,10,0),
    (20200701,20211001,10,0),
    (20211001,20211101,10,0),
    (20211101,20220401,10,1),
    (20220401,20291231,10,0),
    (20190101,20200201,15,1),
    (20200201,20301101,15,0)

    Now consider the following table that gives the identifier of a item throughout time:

    create table #table2(
    DD int,
    DF int,
    identifier int,
    id_prod int
    )

    insert into #table2
    VALUES
    (20190101,20211001,5,10),
    (20211001,20211101,4,10),
    (20211101,20291231,5,10),
    (20190101,20190501,6,15),
    (20190501,20400101,7,15)

    The goal is to 'slice' the first table to associate to each Item , while taking into account the time periods. The result should look like this:

     DD_S          DF_S      id_Prod   identifier        DD           DF        IS_Valid
    20190101 20190601 10 5 20190101 20211001 0
    20190601 20190901 10 5 20190101 20211001 0
    20190901 20200701 10 5 20190101 20211001 0
    20200701 20211001 10 5 20190101 20211001 0
    20211001 20211101 10 4 20211001 20211101 0
    20211101 20220401 10 5 20211101 20291231 1
    20220401 20291231 10 5 20211101 20291231 0
    20190101 20190501 15 6 20190101 20190501 1
    20190501 20200201 15 6 20190501 20400101 1
    20200201 20301101 15 7 20190501 20400101 0

     

    The idea is Slicing time periods by related time

    Any idea or best solution to do that ?

    Thanks

    • This topic was modified 1 year, 10 months ago by  MrRobot.
    • This topic was modified 1 year, 10 months ago by  MrRobot.
  • Step1: convert the dates from INT to DATE

    In  #table1 the date ranges overlap.  How to get rid of the overlaps?  My code updates t1 (my guess is the ranges are inclusive of the DF_S date)

    Next: confirm the basic triage of the rows in #table1.  Some rows in t1 are completely covered by a row t2.  Some rows are undefined by t2.  Some rows need to be split

    drop table if exists #table1;
    go
    create table #table1(
    DD_S int,
    DF_S int,
    id_prod int,
    is_valid bit)

    insert into #table1
    values
    (20190101,20190601,10,0),
    (20190601,20190901,10,0),
    (20190901,20200701,10,0),
    (20200701,20211001,10,0),
    (20211001,20211101,10,0),
    (20211101,20220401,10,1),
    (20220401,20291231,10,0),
    (20190101,20200201,15,1),
    (20200201,20301101,15,0);

    drop table if exists #table2;
    go
    create table #table2(
    DD int,
    DF int,
    identifier int,
    id_prod int)

    insert into #table2
    VALUES
    (20190101,20211001,5,10),
    (20211001,20211101,4,10),
    (20190101,20190501,6,15),
    (20190501,20400101,7,15);

    /* use sql date type */
    drop table if exists #table1_dt;
    go
    create table #table1_dt(
    DD_S date,
    DF_S date,
    id_prod int,
    is_valid bit)

    drop table if exists #table2_dt;
    go
    create table #table2_dt(
    DD date,
    DF date,
    identifier int,
    id_prod int)

    insert #table1_dt
    select datefromparts(DD_S/10000, DD_S/100%100, DD_S%100),
    datefromparts(DF_S/10000, DF_S/100%100, DF_S%100),
    id_prod, is_valid
    from #table1;

    insert #table2_dt
    select datefromparts(DD/10000, DD/100%100, DD%100),
    datefromparts(DF/10000, DF/100%100, DF%100),
    identifier, id_prod
    from #table2;


    /* update t1 to eliminate overlaps per id_prod */
    update t
    set DD_S=dateadd(month, 1, t.DD_S)
    from #table1_dt t
    join #table1_dt tt on t.id_prod=tt.id_prod
    and t.DD_S=tt.DF_S

    /* triage rows in t1 */
    with
    covered_cte as (
    select o.*, t.identifier
    from #table1_dt o
    join #table2_dt t on o.id_prod=t.id_prod
    and o.DD_S between t.DD and t.DF
    and o.DF_S between t.DD and t.DF),
    undef_cte as (
    select o.*, 0 identifier
    from #table1_dt o
    where not exists (select 1
    from covered_cte sc
    where sc.id_prod=o.id_prod
    and sc.DD_S=o.DD_S
    and sc.DF_S=o.DF_S)
    and not exists(select 1
    from #table2_dt t
    where t.id_prod=o.id_prod
    and (t.DD between o.DD_S and o.DF_S
    or t.DF between o.DD_S and o.DF_S))),
    to_be_split_cte as (
    select DD_S, DF_S, id_prod, is_valid
    from #table1_dt
    except
    select DD_S, DF_S, id_prod, is_valid
    from undef_cte
    except
    select DD_S, DF_S, id_prod, is_valid
    from covered_cte)
    select 'covered', * from covered_cte
    union all
    select 'undef', * from undef_cte
    union all
    select 'to_be_split', *, 0 from to_be_split_cte
    order by 4, 2;

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Hi @steve-2 Collins

    Thanks for the help.

    Your query doesn't resolve my problem i need to cover the date of overlapping for example the id_pord=15 as bellow that i want

     20190101    20190501    15          6             20190101    20190501       1
    20190501 20200201 15 7 20190501 20400101 1
    20200201 20301101 15 7 20190501 20400101 0

    This a query that it give me the right result but i want something dynamically i have a lot of identifier ..

     with t2 as
    (
    select DD, DF, identifier, id_prod
    from #table2
    union all
    select max(DF), 99991231, 5, id_prod
    from #table2
    group by id_prod
    ),
    C as
    (
    select t1.*, t2.identifier,
    iif(t1.DD_S >= t2.DD, t1.DD_S, t2.DD) as DD,
    iif(t1.DF_S <= t2.DF, t1.DF_S, t2.DF) as DF
    from #table1 t1
    inner join t2 on t2.id_prod = t1.id_prod
    )
    select DD_S, DF_S, id_prod, identifier, DD, DF, is_valid
    from C
    where DD < DF
    order by id_prod, DD

    Thanks

    • This reply was modified 1 year, 10 months ago by  MrRobot.
  • You're saying "split my data" but you've provided 9 rows of input data of which 8 rows do not (seemingly) require splitting.  Does my query correctly separate rows which DO NOT need to be split from rows which DO need to be split?  Why in the CTE of the query you just posted would it apply identifier=5 to id_prod=15 when in the sample data provided initially it's associated with id_prod=10?

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Sorry i forgot a line in sample data (just updated)

    the query that i shared doesn't work perfectly but just i want to giving an idea i am trying to solve that problem .

    if you see the  output that i want to problem is caused with id_prod , i want to cover the gap between table 1 and table 2

    DD_S DF_S id_Prod identifier DD DF IS_Valid

    20190101 20190501 15 6 20190101 20190501 1

    20190501 20200201 15 7 20190501 20400101 1

    20200201 20301101 15 7 20190501 20400101 0

    Thanks

    • This reply was modified 1 year, 10 months ago by  MrRobot.
    • This reply was modified 1 year, 10 months ago by  MrRobot.
  • One of the fastest methods ever created for doing this type of thing is documented in the following article.

    https://blogs.solidq.com/en/sqlserver/packing-intervals/

    It's an older article so solution #3 can be made work in SS 2017 but have never tested that solution.  I HAVE tested solution #2 in the past and it's nasty fast.  It's also comparatively simple.  I just don't have the time today to write code to prove it. 😀

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi,

    I did the query as bellow i don't know if will perfectly work for another data but with my sample data is work fine just one problem when i do the union the first query doesn't work i don't know why but when i execute the query one by one i have the result.

    Can you please tell me why the union doesn't work and if you have any elegant solution it will be great

    ;with cte as(
    SELECT a.[DD_S]
    ,a.[DF_S]
    ,a.[id_Prod]
    ,b.[identifier]
    ,b.[DD]
    , b.[DF]
    ,LAG([DD_S]) over (partition by a.id_prod order by DD_S) as LagDD_S
    ,LAG([DF_S]) over (partition by a.id_prod order by DD_S) as LagDF_S
    ,LEAD([DD_S]) over (partition by a.id_prod order by DD_S) as LeadDD_S
    ,LEAD([DD]) over (partition by a.id_prod order by DD_S) as LeadDD

    FROM table1 a
    join table2 b
    on a.[id_Prod]=b.[id_Prod]
    )

    select DF as DD_S,LagDD_S as DF_S,id_prod,identifier from cte
    where
    1=1
    and DF<LagDF_S
    and LeadDD_S is null /*Doesn't work but when i execute the query alone i have a result*/
    union
    select DD_S,DF_S,id_prod,identifier from cte
    where ([DD_S]>=[DD] and [DF_S]<=[DF])
    union
    select DD_S,LeadDD as DF_S,id_prod,identifier from cte
    where LagDD_S is null
    and ([DD_S]<=[DD] and [DF_S]>=[DF])
    order by id_prod,dd_s

    Thanks

    • This reply was modified 1 year, 10 months ago by  MrRobot.
  • MrRobot wrote:

    ... and if you have any elegant solution it will be great

    I posted a link in my previous response as to what I think "the" elegant solution would be.  You just have to adapt it to your needs.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • MrRobot wrote:

    Hi,

    I did the query as bellow i don't know if will perfectly work for another data but with my sample data is work fine just one problem when i do the union the first query doesn't work i don't know why but when i execute the query one by one i have the result.

    Can you please tell me why the union doesn't work and if you have any elegant solution it will be great

    Thanks

    The updated code still contains the INTEGER dates.  Imo it's mandatory to use SQL dates.  Consider the UPDATE statements to make the date ranges unique (DD to DF - wise).  Just to add 1 month to an INTEGER date would be a gigantic waste of needless complexity.  Imo you should switch ALL dates stored as INTEGER to SQL dates everywhere in the database and never go backwards again

    The approach of getting it to work with 1 single case and then expanding the number of inputs seems likely to be frustrating and unlikely to work out well.  It's better imo to use fully representative input from the outset.  It's a challenging query or so it seems.  The article Jeff listed is definitely applicable.  On top of the packing issue it appears the 'identifier' value gets "carried through" from the adjacent #table2 range(s)

    When I re-run my code with the additional t2 input row it now seems to correctly classify 8/9ths of the #table1 rows.  The remaining row needs splitting

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • To make more realistic (I guess) I added an additional row to t2 so the split requires 3 rows instead of 2.   Also, I added a UNIQUE CLUSTERED INDEX on the DD/DF columns in both t1 and t2 tables.

    drop table if exists #table1;
    go
    create table #table1(
    DD_S int,
    DF_S int,
    id_prod int,
    is_valid bit)

    insert into #table1
    values
    (20190101,20190601,10,0),
    (20190601,20190901,10,0),
    (20190901,20200701,10,0),
    (20200701,20211001,10,0),
    (20211001,20211101,10,0),
    (20211101,20220401,10,1),
    (20220401,20291231,10,0),
    (20190101,20200201,15,1),
    (20200201,20301101,15,0);

    drop table if exists #table2;
    go
    create table #table2(
    DD int,
    DF int,
    identifier int,
    id_prod int)

    insert into #table2
    VALUES
    (20190101,20211001,5,10),
    (20211001,20211101,4,10),
    (20211101,20291231,5,10),
    (20190101,20190501,6,15),
    (20190501,20190601,7,15), /* added row */ (20190601,20400101,8,15)
    --(20190101,20190501,6,15),
    --(20190501,20400101,7,15);

    /* use sql date type */drop table if exists #table1_dt;
    go
    create table #table1_dt(
    DD_S date,
    DF_S date,
    id_prod int,
    is_valid bit)

    drop table if exists #table2_dt;
    go
    create table #table2_dt(
    DD date,
    DF date,
    identifier int,
    id_prod int)

    insert #table1_dt
    select datefromparts(DD_S/10000, DD_S/100%100, DD_S%100),
    datefromparts(DF_S/10000, DF_S/100%100, DF_S%100),
    id_prod, is_valid
    from #table1;

    insert #table2_dt
    select datefromparts(DD/10000, DD/100%100, DD%100),
    datefromparts(DF/10000, DF/100%100, DF%100),
    identifier, id_prod
    from #table2;


    /* update t1 to eliminate overlaps per id_prod */update t
    set DD_S=dateadd(month, 1, t.DD_S)
    from #table1_dt t
    join #table1_dt tt on t.id_prod=tt.id_prod
    and t.DD_S=tt.DF_S;
    create unique clustered index ndx_dd_df on #table1_dt(DD_S, DF_S);

    /* update t2 to eliminate overlaps per id_prod */update t
    set DD=dateadd(month, 1, t.DD)
    from #table2_dt t
    join #table2_dt tt on t.id_prod=tt.id_prod
    and t.DD=tt.DF;
    create unique clustered index ndx_dd_df on #table2_dt(DD, DF);

    This query expands the non-covered DD_S to DF_S range from t1 and OUTER APPLY's a correlated subquery which expands the DD to DS range from t2.

    with 
    covered_cte as (
    select o.*, t.identifier
    from #table1_dt o
    join #table2_dt t on o.id_prod=t.id_prod
    and o.DD_S between t.DD and t.DF
    and o.DF_S between t.DD and t.DF),
    undef_cte as (
    select o.*, 0 identifier
    from #table1_dt o
    where not exists (select 1
    from covered_cte sc
    where sc.id_prod=o.id_prod
    and sc.DD_S=o.DD_S
    and sc.DF_S=o.DF_S)
    and not exists(select 1
    from #table2_dt t
    where t.id_prod=o.id_prod
    and (t.DD between o.DD_S and o.DF_S
    or t.DF between o.DD_S and o.DF_S))),
    to_be_split_cte as (
    select DD_S, DF_S, id_prod, is_valid
    from #table1_dt
    except
    select DD_S, DF_S, id_prod, is_valid
    from undef_cte
    except
    select DD_S, DF_S, id_prod, is_valid
    from covered_cte)
    select 'to_be_split', s.*, 0, calc.dt,
    min_range.start_dd min_start_dd, min_range.end_df min_end_df,
    max_range.start_dd max_start_dd, max_range.end_df max_end_df,
    ott.identifier ott_id
    from to_be_split_cte s
    /* locate t2 range starting point */ outer apply (select top(1) tt.DD, tt.DF
    from #table2_dt tt
    where tt.id_prod=s.id_prod
    and tt.DF>=s.DD_S
    order by tt.DF) min_range(start_dd, end_df)
    /* locate t2 range ending point */ outer apply (select top(1) tt.DD, tt.DF
    from #table2_dt tt
    where tt.id_prod=s.id_prod
    and tt.DD<=s.DF_S
    order by tt.DF desc) max_range(start_dd, end_df)
    /* expand t1 date range */ cross apply dbo.fnTally(0, datediff(month, s.DD_S, s.DF_S)) fn
    cross apply (values (dateadd(month, fn.n, s.DD_S))) calc(dt)
    /* outer join/apply expanded t2 range(s) */
    outer apply (select tt.*, tt_calc.dt
    from #table2_dt tt
    /* expand t2 date range(s) */
    cross apply dbo.fnTally(0, datediff(month, tt.DD, tt.DF)) fn
    cross apply (values (dateadd(month, fn.n, tt.DD))) tt_calc(dt)
    where tt.id_prod=s.id_prod
    and tt_calc.dt=calc.dt) ott;

    This query uses GROUP BY to summarize the results from above.  UNION ALL with the "covered" rows

    with 
    covered_cte as (
    select o.*, t.identifier
    from #table1_dt o
    join #table2_dt t on o.id_prod=t.id_prod
    and o.DD_S between t.DD and t.DF
    and o.DF_S between t.DD and t.DF),
    undef_cte as (
    select o.*, 0 identifier
    from #table1_dt o
    where not exists (select 1
    from covered_cte sc
    where sc.id_prod=o.id_prod
    and sc.DD_S=o.DD_S
    and sc.DF_S=o.DF_S)
    and not exists(select 1
    from #table2_dt t
    where t.id_prod=o.id_prod
    and (t.DD between o.DD_S and o.DF_S
    or t.DF between o.DD_S and o.DF_S))),
    to_be_split_cte as (
    select DD_S, DF_S, id_prod, is_valid
    from #table1_dt
    except
    select DD_S, DF_S, id_prod, is_valid
    from undef_cte
    except
    select DD_S, DF_S, id_prod, is_valid
    from covered_cte)
    select min(calc.dt) DD_S,
    max(calc.dt) DF_F,
    s.id_prod, s.is_valid, ott.identifier
    from to_be_split_cte s
    /* locate t2 range starting point */ outer apply (select top(1) tt.DD, tt.DF
    from #table2_dt tt
    where tt.id_prod=s.id_prod
    and tt.DF>=s.DD_S
    order by tt.DF) min_range(start_dd, end_df)
    /* locate t2 range ending point */ outer apply (select top(1) tt.DD, tt.DF
    from #table2_dt tt
    where tt.id_prod=s.id_prod
    and tt.DD<=s.DF_S
    order by tt.DF desc) max_range(start_dd, end_df)
    /* expand t1 date range */ cross apply dbo.fnTally(0, datediff(month, s.DD_S, s.DF_S)) fn
    cross apply (values (dateadd(month, fn.n, s.DD_S))) calc(dt)
    /* outer join/apply expanded t2 range(s) */
    outer apply (select tt.*, tt_calc.dt
    from #table2_dt tt
    /* expand t2 date range(s) */
    cross apply dbo.fnTally(0, datediff(month, tt.DD, tt.DF)) fn
    cross apply (values (dateadd(month, fn.n, tt.DD))) tt_calc(dt)
    where tt.id_prod=s.id_prod
    and tt_calc.dt=calc.dt) ott
    group by s.id_prod, s.is_valid, ott.identifier
    union all
    select * from covered_cte
    order by id_prod, DD_S;

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • The function dbo.fnTally can be found and explained here

    CREATE FUNCTION [dbo].[fnTally]
    /**********************************************************************************************************************
    Jeff Moden Script on SSC: https://www.sqlservercentral.com/scripts/create-a-tally-function-fntally
    **********************************************************************************************************************/
    (@ZeroOrOne BIT, @MaxN BIGINT)
    RETURNS TABLE WITH SCHEMABINDING AS
    RETURN WITH
    H2(N) AS ( SELECT 1
    FROM (VALUES
    (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    )V(N)) --16^2 or 256 rows
    , H4(N) AS (SELECT 1 FROM H2 a, H2 b) --16^4 or 65,536 rows
    , H8(N) AS (SELECT 1 FROM H4 a, H4 b) --16^8 or 4,294,967,296 rows
    SELECT N = 0 WHERE @ZeroOrOne = 0 UNION ALL
    SELECT TOP(@MaxN)
    N = ROW_NUMBER() OVER (ORDER BY N)
    FROM H8
    ;

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Hi,

    Just to answer to your question about why i have int date because it's a fact table ,

    Thanks for new code. I will test that in my sample data then in my fact table, hope that query doesn't take me a lot of time

    Well,i let you know .

    Thanks

     

    • This reply was modified 1 year, 10 months ago by  MrRobot.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply