Looking for best way to summarize some data

  • Hi all,

    Hope i'm posting this in the right place, if not, please let me know.

    I have a table like this:

    DECLARE @tbl TABLE (recdate DATE, myflag BIT)

    That table has rows for all dates in a range, the myflag bit will change off and on, something like this:

    2017-01-01 | 1
    2017-01-02 | 1
    2017-01-03 | 1
    ...
    2017-04-03 | 1
    2017-04-04 | 0
    2017-04-05 | 0
    ..
    2017-05-15 | 0
    2017-05-16 | 1
    etc.

    but what I really need to get to is something like

    period_from | period_to | myflag
    2017-01-01 | 2017-04-03 | 1
    2017-04-04 | 2017-05-15 | 0
    2017-05-16 | 2017-05-21 | 1

    so every time myflag changes, it creates a new row and the previous row has the end date set (if that makes sense)

    I'm sure there is an incredibly obvious way of doing this, but I'm about ready to bash my head against the wall.. i've gone back and forth with selects and subselects and inserts and updates into temporary tables, even trying a cursor (I know!  but its a one-off query)

    I'd really appreciate it if someone could point me in the right direction before I tear my hair out.

    Thanks

    --------------------------

    I long for a job where my databases dont have any pesky users accessing them 🙂

  • I received a reply from another source for this problem, so thought i'd add the solution here just in case it helps anyone else in the future

    select
       min(recdate) as period_from, max(recdate) as period_to, flag
    from (
        select
        t.*,
        row_number() over (order by recdate) as seqnum,    
        row_number() over (partition by flag order by recdate) as seqnum_f  
        from @tbl t 
    ) t
    group by (seqnum - seqnum_f), flag;

    --------------------------

    I long for a job where my databases dont have any pesky users accessing them 🙂

  • torpkev - Tuesday, June 5, 2018 3:22 PM

    I received a reply from another source for this problem, so thought i'd add the solution here just in case it helps anyone else in the future

    select
       min(recdate) as period_from, max(recdate) as period_to, flag
    from (
        select
        t.*,
        row_number() over (order by recdate) as seqnum,    
        row_number() over (partition by flag order by recdate) as seqnum_f  
        from @tbl t 
    ) t
    group by (seqnum - seqnum_f), flag;

    Great job finding the solution and then posting back for the benefit of others similarly situated.
    One question:  Do you understand how it works?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • The honest answer to that would be "I think so.." - but I'd appreciate a confirmation if I'm on the right track.. if not, then if you could point out where I'm off, that'd be awesome.

    The sub-select assigns some row numbers, the first is just ordered by date, so a simple row number by date..
    the second is partitioned by my flag and then ordered by date - so:


    myflag | seqnum | seqnum_f
    0      | 1      | 1
    0      | 2      | 2
    0      | 3      | 3

    etc. - until the flag flips and then seqnum_f starts its count over


    myflag | seqnum | seqnum_f
    0      | 1      | 1
    0      | 2      | 2
    0      | 3      | 3
    1      | 4      | 1
    1      | 5      | 2
    0      | 6      | 1
    0      | 7      | 2

    etc.

    Then the grouping is done on (seqnum - seqnum_f)


    date       | myflag | seqnum | seqnum_f | [(seqnum - seqnum_f)]
    2018-01-01 | 0      | 1      | 1        | 0
    2018-01-02 | 0      | 2      | 2        | 0
    2018-01-03 | 0      | 3      | 3        | 0
    2018-01-04 | 1      | 4      | 1        | 3
    2018-01-05 | 1      | 5      | 2        | 3
    2018-01-06 | 0      | 6      | 1        | 5
    2018-01-07 | 0      | 7      | 2        | 5

    So then we grab the min() and max() dates to get start and end within the group, so would end up with something like:


    period_from | period_to  | myflag
    2018-01-01  | 2018-01-03 | 0
    2018-01-04  | 2018-01-05 | 1
    2018-01-06  | 2018-01-07 | 0

    --------------------------

    I long for a job where my databases dont have any pesky users accessing them 🙂

  • Yep.   You've got it.   The problem you had to begin with falls into the "Gaps and Islands" category, and in your case, it was the "islands" of both "flag on" and "flag off" that needed to be handed a unique value so that GROUP BY can work.   A difference between two ROW_NUMBER() functions with slightly different parameters is usually the fastest way to derive an "island label", so to speak.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply