Find run of repeating value

  • Hi

    In the below input test data, ID is unique and in ascending order (not incremental values).

    I want to find runs of at least three 1s, and return the result below. The 0s would be any values that are not 1.

    I'm thinking there must be some 'standard' way to do this, such as treating them as islands?

    Thanks for any help.

    --INPUT TEST DATA
    declare @T table(ID int not null primary key, Val int not null)

    insert @T(ID, Val)
    select 1, 0 union all
    select 10, 1 union all
    select 17, 0 union all
    select 18, 0 union all
    select 27, 1 union all
    select 34, 1 union all
    select 42, 1 union all
    select 52, 0 union all
    select 62, 1 union all
    select 63, 1 union all
    select 65, 0 union all
    select 73, 0 union all
    select 79, 1 union all
    select 85, 1 union all
    select 89, 1 union all
    select 90, 1

    select * from @T

    --EXPECTED RESULT
    declare @R table(PosDesc varchar(10) not null, ID int null, Val int null)

    insert @R(PosDesc, ID, Val)
    select 'Before', 18, 0 union all
    select 'First', 27, 1 union all
    select 'Last',42, 1 union all
    select 'After', 52, 0 union all
    select 'Before', 73, 0 union all
    select 'First', 79, 1 union all
    select 'Last',90, 1 union all
    select 'After', NULL, NULL

    select * from @R

     

     

     

  • select 'After', NULL, NULL

    is not required in the result.

  • It's not clear what you mean. I'm not clear what the @r means here.

    First, without ordering, there's no way to tell what the runs are. You always, always need to have some ordering column.

    Next, you could like do some running count, and when this reaches 3+, you could then use that to determine you've had a group. This is an islands type problem, but I don't think it's well defined.

  • The @r is just to get into a table what the result should be so that it can be easily read. It will play no part in any solution.

    I did say it was ordered on ID, and I don't know a better way to define it other than an example like this.

  • This is kind of messy but it should work, there's one more case case that you didn't have in your sample data as well.

     

    ;WITH TEMP_CTE AS(
    select *
    , LEAD(VAL, 1, NULL) OVER(PARTITION BY (SELECT 1) ORDER BY ID ASC) NEXT_VAL_ONE
    , LEAD(VAL, 2, NULL) OVER(PARTITION BY (SELECT 1) ORDER BY ID ASC) NEXT_VAL_TWO
    , LAG(VAL, 1, NULL) OVER(PARTITION BY (SELECT 1) ORDER BY ID ASC) PREV_VAL_ONE
    , LAG(VAL, 2, NULL) OVER(PARTITION BY (SELECT 1) ORDER BY ID ASC) PREV_VAL_TWO
    from @T
    ), TEMP_CTE_TWO AS(
    SELECT *, CASE WHEN VAL = 1 AND ((NEXT_VAL_ONE = 1 AND NEXT_VAL_TWO = 1) OR (PREV_VAL_ONE = 1 AND PREV_VAL_TWO = 1) OR (PREV_VAL_ONE = 1 AND NEXT_VAL_ONE = 1)) THEN 1 ELSE 0 END AS IS_RUN FROM TEMP_CTE
    ), TEMP_CTE_THREE AS(
    SELECT *
    , LEAD(IS_RUN, 1, 0) OVER(PARTITION BY (SELECT 1) ORDER BY ID ASC) NEXT_RUN
    , LAG(IS_RUN, 1, 0) OVER(PARTITION BY (SELECT 1) ORDER BY ID ASC) PREV_RUN FROM TEMP_CTE_TWO
    ), TEMP_CTE_FOUR AS(
    SELECT *
    , CASE WHEN IS_RUN = 0 AND NEXT_RUN = 1 AND PREV_RUN = 1 THEN 'Before/After'
    WHEN IS_RUN = 0 AND NEXT_RUN = 1 AND PREV_RUN = 0 THEN 'Before'
    WHEN IS_RUN = 0 AND NEXT_RUN = 0 AND PREV_RUN = 1 THEN 'After'
    WHEN IS_RUN = 1 AND NEXT_RUN = 1 AND PREV_RUN = 0 THEN 'First'
    WHEN IS_RUN = 1 AND NEXT_RUN = 1 AND PREV_RUN = 1 THEN 'Middle'
    WHEN IS_RUN = 1 AND NEXT_RUN = 0 AND PREV_RUN = 1 THEN 'Last'
    ELSE 'None'
    END AS RUN_TYPE
    FROM TEMP_CTE_THREE
    )
    SELECT * FROM TEMP_CTE_FOUR
    WHERE RUN_TYPE NOT IN('None', 'Middle')
    ORDER BY ID ASC
  • ZZartin thanks, that works. What if it was 'at least 4 1s' (would need new test data). Ideally the number would be variable, could that be achieved.

  • Yes, you could try something like this,


    declare @T table(ID int not null primary key, Val int not null)
    declare @row_level int = 3

    insert @T(ID, Val)
    select 1, 0 union all
    select 10, 1 union all
    select 17, 0 union all
    select 18, 0 union all
    select 27, 1 union all
    select 34, 1 union all
    select 42, 1 union all
    select 52, 0 union all
    select 62, 1 union all
    select 63, 1 union all
    select 65, 0 union all
    select 73, 0 union all
    select 79, 1 union all
    select 85, 1 union all
    select 89, 1 union all
    select 90, 1 union all
    select 91, 0 union all
    select 92, 1 union all
    select 93, 1 union all
    select 94, 1

    ;WITH TEMP_CTE AS(
    SELECT *
    , LAG(VAL, 1, 0) OVER(ORDER BY ID ASC) AS PREV_VAL
    , LEAD(VAL, 1, 0) OVER(ORDER BY ID ASC) AS NEXT_VAL
    , LAG(ID, 1, NULL) OVER(ORDER BY ID ASC) AS PREV_ID_VAL
    , LEAD(ID, 1, NULL) OVER(ORDER BY ID ASC) AS NEXT_ID_VAL
    FROM @T
    ), TEMP_CTE_TWO AS(
    SELECT *, 1 AS ROW_LEVEL, ID AS START_ID, PREV_ID_VAL AS ORIG_PREV_ID FROM TEMP_CTE
    WHERE VAL = 1 AND PREV_VAL = 0
    UNION ALL
    SELECT TEMP_CTE.*, TEMP_CTE_TWO.ROW_LEVEL + 1 AS ROW_LEVEL, TEMP_CTE_TWO.START_ID, TEMP_CTE_TWO.ORIG_PREV_ID FROM TEMP_CTE_TWO
    INNER JOIN TEMP_CTE ON TEMP_CTE.ID = TEMP_CTE_TWO.NEXT_ID_VAL AND TEMP_CTE.VAL = 1
    ), TEMP_CTE_THREE AS(
    SELECT TEMP_CTE.ID, TEMP_CTE.VAL, 'Before' AS RUN_TYPE FROM TEMP_CTE INNER JOIN TEMP_CTE_TWO ON TEMP_CTE.ID = TEMP_CTE_TWO.ORIG_PREV_ID AND TEMP_CTE_TWO.NEXT_VAL = 0 AND TEMP_CTE_TWO.ROW_LEVEL >= @row_level
    UNION ALL
    SELECT TEMP_CTE.ID, TEMP_CTE.VAL, 'First' AS RUN_TYPE FROM TEMP_CTE INNER JOIN TEMP_CTE_TWO ON TEMP_CTE.ID = TEMP_CTE_TWO.START_ID AND TEMP_CTE_TWO.NEXT_VAL = 0 AND TEMP_CTE_TWO.ROW_LEVEL >= @row_level
    UNION ALL
    SELECT TEMP_CTE_TWO.ID, TEMP_CTE_TWO.VAL, 'Last' AS RUN_TYPE FROM TEMP_CTE_TWO WHERE TEMP_CTE_TWO.NEXT_VAL = 0 AND TEMP_CTE_TWO.ROW_LEVEL >= @row_level
    UNION ALL
    SELECT TEMP_CTE.ID, TEMP_CTE.VAL, 'After' AS RUN_TYPE FROM TEMP_CTE INNER JOIN TEMP_CTE_TWO ON TEMP_CTE.ID = TEMP_CTE_TWO.NEXT_ID_VAL AND TEMP_CTE_TWO.NEXT_VAL = 0 AND TEMP_CTE_TWO.ROW_LEVEL >= @row_level

    )
    SELECT * FROM TEMP_CTE_THREE
    ORDER BY ID
  • One way:

    WITH 
    numbered AS (SELECT *, rn=ROW_NUMBER() OVER (ORDER BY ID ASC) FROM @T),
    grouped AS (SELECT *,first_zero=MAX(CASE WHEN Val<>1 THEN rn END) OVER (ORDER BY ID ASC ROWS UNBOUNDED PRECEDING) FROM numbered),
    first_and_such AS (SELECT before_rn=MIN(CASE WHEN Val<>1 THEN rn END),first_rn=MIN(CASE WHEN Val=1 THEN rn END),last_rn=MAX(CASE WHEN Val=1 THEN rn END) FROM grouped GROUP BY first_zero),
    split AS (SELECT x.* FROM first_and_such CROSS APPLY (VALUES('Before',before_rn),('First',first_rn),('Last',last_rn),('After',last_rn+1))x(PosDesc,rn) WHERE last_rn-first_rn>1)

    SELECT s.PosDesc,n.ID,n.Val
    FROM split s LEFT JOIN numbered n ON s.rn=n.rn
    ORDER BY s.rn ASC;

     

    Cheers!

  • This will give you ID's for records with zeros wrapping the set of sequential records with 1's in between:

    SELECT Last0.ID Last0_ID, next0.ID Next0.ID
    FROM @T Last0
    OUTER APPLY (select top 1 Id from @T F0 WHERE F0.ID > Last0.ID and F0.Val = 0) next0
    WHERE last0.Val = 0
    AND exists (select * from @T T1
    where T1.ID > Last0.ID
    and (T1.ID < next0.ID or next0.ID is null)
    group by Val
    having COUNT(id) >=3
    )

    You should be able to get required outcome simply by using it as a CTE.

    Just cater for NULL in next0.ID.

    _____________
    Code for TallyGenerator

  • Yet another version.

    Non-serious note to Jeff Moden: See, I actually am willing to use an identity as the clustering key.

    DECLARE @min_vals_in_sequence int
    SET @min_vals_in_sequence = 3

    DROP TABLE IF EXISTS #T_sequential
    CREATE TABLE #T_sequential ( seq_id int NOT NULL IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
    ID int NULL, Val int NULL );

    INSERT INTO #T_sequential ( ID, Val )
    SELECT ID, Val
    FROM @T
    ORDER BY ID
    INSERT INTO #T_sequential
    VALUES(2000000000, 0)

    SELECT
    CASE WHEN T3.seq_id = T1.seq_id THEN 'Before'
    WHEN T3.seq_id = T1.seq_id + 1 THEN 'First'
    WHEN T3.seq_id = T2.seq_id - 1 THEN 'Last'
    WHEN T3.seq_id = T2.seq_id THEN 'After'
    END AS PosDesc,
    CASE WHEN T3.ID = 2000000000 THEN NULL ELSE T3.seq_ID END AS ID,
    CASE WHEN T3.ID = 2000000000 THEN NULL ELSE T3.Val END AS Val
    FROM #T_sequential T1
    CROSS APPLY (
    SELECT TOP (1) *
    FROM #T_sequential T2
    WHERE T2.ID > T1.ID AND T2.Val = 0
    ORDER BY T2.ID
    ) AS T2
    INNER JOIN #T_sequential T3 ON T3.seq_id IN ( T1.seq_id, T1.seq_id + 1, T2.seq_id - 1, T2.seq_id )
    WHERE T1.Val = 0 AND T2.seq_id - T1.seq_id - 1 >= @min_vals_in_sequence

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher wrote:

    Non-serious note to Jeff Moden: See, I actually am willing to use an identity as the clustering key.

    😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I played around with this for a while last night, trying to find the most concise complete solution (found one maybe 20% shorter than my previous solution, but its performance gets rather dreadful on larger datasets)

    I also tried to improve on the performance of my previous solution, and here's another way.

    It's not nearly as concise as my previous version, but performs better under all the circumstances I tested on my system (SQL Server 2019 on a high-ish end laptop), even if we help out my previous version by specifying a ROWS frame for the windowed aggregate (which allows it to make use of an in-memory worktable instead of the disk-based worktable used by the default RANGE).

    WITH numbered     AS (SELECT *, rn  =ROW_NUMBER() OVER (ORDER BY ID ASC) FROM @T),
    lag_lead AS (SELECT *, ID1 =LAG(ID,1) OVER (ORDER BY ID ASC), ID2 =LEAD(ID,1) OVER (ORDER BY ID ASC),
    Val1=LAG(Val,1) OVER (ORDER BY ID ASC), Val2=LEAD(Val,1) OVER (ORDER BY ID ASC),
    rn1 =LAG(rn,1) OVER (ORDER BY ID ASC), rn2 =LEAD(rn,1) OVER (ORDER BY ID ASC)
    FROM numbered),
    paired AS (SELECT x.* FROM lag_lead l CROSS APPLY (VALUES(0, ID1, Val1, rn1, ID, Val, rn),
    (1, ID2, Val2, rn2, ID, Val,rn)
    )x(pos, border_ID, border_Val, border_rn, ID, Val, rn)
    WHERE l.Val=1 AND ISNULL(border_Val,0)<>1),

    combined AS (SELECT Pos,before_ID=border_ID, before_Val=border_Val, first_ID=ID, first_Val=Val,
    last_ID =LEAD(ID,1) OVER (ORDER BY ID ASC, pos ASC), last_val =LEAD(VAL,1) OVER (ORDER BY ID ASC, pos ASC),
    after_ID=LEAD(border_ID,1) OVER (ORDER BY ID ASC, pos ASC), after_Val=LEAD(border_Val,1) OVER (ORDER BY ID ASC, pos ASC),
    cnt =LEAD(rn,1) OVER (ORDER BY ID ASC, pos ASC)-rn
    FROM paired)

    SELECT x.*
    FROM combined c
    CROSS APPLY
    (VALUES('Before',before_ID,before_Val),('First',first_ID,first_Val),('Last',last_ID,last_Val),('After',after_ID,after_Val))x(PosDesc,ID,Val)
    WHERE c.pos=0 AND c.cnt>1
    ORDER BY first_ID ASC;

     

    Cheers!

  • Thanks to all for the solutions!

    On a quick look, sparked enough ideas for me to try and (hopefully) write a solution - don't want to be beaten by it. I will then come back and compare to the solutions provided here.

  • >> I want to find runs of at least three 1s, and return the result below. The 0s would be any test_values that are not 1. <<

    This is been a classic freshman computer science course problem for decades. But it's usually done with a raise in a procedural language. The reasons done in a procedural language is that the concept of an ordering that you're trying to sneak in here doesn't apply in SQL or RDBMS.

    CREATE TABLE Tests

    (test_id INTEGER NOT NULL PRIMARY KEY,

    test_val INTEGER NOT NULL

    CHECK (test_val I (0,1));

    INSERT INTO Tests

    VALUES

    (10, 1),

    (17, 0),

    (18, 0),

    (27, 1), (34, 1), (42, 1),

    (52, 0),

    (62, 1),(63, 1),

    (65, 0),

    (73, 0),

    (79, 1),(85, 1),(89, 1),(90, 1);

    I'm curious as to why you used a 50-year-old Sybase syntax to do a table constructor. That "UNION ALL" simply guaranteed your code won't port. Why did you want to create a result Table and materialize it? Wouldn't a view or query result be better? Are you still thinking about filesystems and having to physically write results to magnetic tapes?

    I also don't understand what your "pos_description" means. I would have thought that you would return an identifier of some sort for each run, and perhaps its length. Since SQL does not use ordering, words like before and after are frowned upon. Another thing we don't like in SQL is creating unnecessary NULLs. And yes, it does look like something to do with islands.

    CREATE VIEW TestsRuns

    AS

    SELECT test_id, test_val,

    (ROW_NUMBER() OVER (ORDER BY test_id)

    - ROW_NUMBER() OVER (PARTITION BY test_val

    ORDER BY test_id)) AS test_grp,

    FROM Tests;

    (10, 1, 1 - 1 = 0),

    (17, 0, 2 - 1 = 1),

    (18, 0, 3 - 2 = 1),

    (27, 1, 4 - 1 = 3),

    (34, 1, 5 - 2 = 3),

    (42, 1, 6 - 3 = 3),

    (52, 0, 7 - 1 = 6),

    (62, 1, 8 - 1 = 7),

    (63, 1, 9 - 2 = 7),

    (65, 0, 10 - 1 = 9),

    (73, 0, 11 - 2 = 9),

    (79, 1, 12 - 1 =11),

    (85,1, 13 - 2 = 11),

    (89, 1, 14 - 3 = 11),

    (90, 1 ,15 - 4 = 11);

    Given this view, you can now inspect the runs with either windowed functions or just a simple group by.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • jcelko212 32090 wrote:

    >> I want to find runs of at least three 1s, and return the result below. The 0s would be any test_values that are not 1. <<

    Since SQL does not use ordering, words like before and after are frowned upon. ...

    Given this view, you can now inspect the runs with either windowed functions or just a simple group by.

    That comment on ordering has to be unintended parody, since the code you posted uses two ORDER BYs to specify, well, the order of rows.  And the real world orders (sequences) things all the time, including noting before and after.  Such as, did you make a payment on/before it was due rather than after.

    As to the final view, please show us how to get from that to the desired result.  After already sorting twice and segmenting once the entire input data, you still haven't determined the specific rows that need included in the output.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply