Find run of repeating value

Question

Post reply

Find run of repeating value

kuopaz

Default port

Points: 1449

October 1, 2020 at 1:37 pm

#3792246

Hi

In the below input test data, ID is unique and in ascending order (not incremental values).

I want to find runs of at least three 1s, and return the result below. The 0s would be any values that are not 1.

I'm thinking there must be some 'standard' way to do this, such as treating them as islands?

Thanks for any help.

--INPUT TEST DATA
declare @T table(ID int not null primary key, Val int not null)

insert @T(ID, Val)
select 1, 0 union all
select 10, 1 union all
select 17, 0 union all
select 18, 0 union all
select 27, 1 union all
select 34, 1 union all
select 42, 1 union all
select 52, 0 union all
select 62, 1 union all
select 63, 1 union all
select 65, 0 union all
select 73, 0 union all
select 79, 1 union all
select 85, 1 union all
select 89, 1 union all
select 90, 1

select * from @T

--EXPECTED RESULT
declare @R table(PosDesc varchar(10) not null, ID int null, Val int null)

insert @R(PosDesc, ID, Val)
select 'Before', 18, 0 union all
select 'First', 27, 1  union all
select 'Last',42, 1  union all
select 'After', 52, 0  union all
select 'Before', 73, 0  union all
select 'First', 79, 1  union all
select 'Last',90, 1  union all
select 'After', NULL, NULL

select * from @R

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply

kuopaz Default port Points: 1449 More actions · Answer 1

select 'After', NULL, NULL

is not required in the result.

Steve Jones - SSC Editor SSC Guru Points: 734449 More actions · Answer 2

It's not clear what you mean. I'm not clear what the @r means here.

First, without ordering, there's no way to tell what the runs are. You always, always need to have some ordering column.

Next, you could like do some running count, and when this reaches 3+, you could then use that to determine you've had a group. This is an islands type problem, but I don't think it's well defined.

kuopaz Default port Points: 1449 More actions · Answer 3

The @r is just to get into a table what the result should be so that it can be easily read. It will play no part in any solution.

I did say it was ordered on ID, and I don't know a better way to define it other than an example like this.

ZZartin SSC-Dedicated Points: 30894 More actions · Answer 4

This is kind of messy but it should work, there's one more case case that you didn't have in your sample data as well.

;WITH TEMP_CTE AS(
select *
, LEAD(VAL, 1, NULL) OVER(PARTITION BY (SELECT 1) ORDER BY ID ASC) NEXT_VAL_ONE
, LEAD(VAL, 2, NULL) OVER(PARTITION BY (SELECT 1) ORDER BY ID ASC) NEXT_VAL_TWO
, LAG(VAL, 1, NULL) OVER(PARTITION BY (SELECT 1) ORDER BY ID ASC) PREV_VAL_ONE
, LAG(VAL, 2, NULL) OVER(PARTITION BY (SELECT 1) ORDER BY ID ASC) PREV_VAL_TWO
from @T
), TEMP_CTE_TWO AS(
SELECT *, CASE WHEN VAL = 1 AND ((NEXT_VAL_ONE = 1 AND NEXT_VAL_TWO = 1) OR (PREV_VAL_ONE = 1 AND PREV_VAL_TWO = 1) OR (PREV_VAL_ONE = 1 AND NEXT_VAL_ONE = 1)) THEN 1 ELSE 0 END AS IS_RUN FROM TEMP_CTE
), TEMP_CTE_THREE AS(
SELECT *
, LEAD(IS_RUN, 1, 0) OVER(PARTITION BY (SELECT 1) ORDER BY ID ASC) NEXT_RUN
, LAG(IS_RUN, 1, 0) OVER(PARTITION BY (SELECT 1) ORDER BY ID ASC) PREV_RUN FROM TEMP_CTE_TWO 
), TEMP_CTE_FOUR AS(
SELECT *
, CASE WHEN IS_RUN = 0 AND NEXT_RUN = 1 AND PREV_RUN = 1 THEN 'Before/After'
WHEN IS_RUN = 0 AND NEXT_RUN = 1 AND PREV_RUN = 0 THEN 'Before'
WHEN IS_RUN = 0 AND NEXT_RUN = 0 AND PREV_RUN = 1 THEN 'After'
WHEN IS_RUN = 1 AND NEXT_RUN = 1 AND PREV_RUN = 0 THEN 'First'
WHEN IS_RUN = 1 AND NEXT_RUN = 1 AND PREV_RUN = 1 THEN 'Middle'
WHEN IS_RUN = 1 AND NEXT_RUN = 0 AND PREV_RUN = 1 THEN 'Last'
ELSE 'None'
END AS RUN_TYPE
FROM TEMP_CTE_THREE
)
SELECT * FROM TEMP_CTE_FOUR
WHERE RUN_TYPE NOT IN('None', 'Middle')
ORDER BY ID ASC

kuopaz Default port Points: 1449 More actions · Answer 5

ZZartin thanks, that works. What if it was 'at least 4 1s' (would need new test data). Ideally the number would be variable, could that be achieved.

ZZartin SSC-Dedicated Points: 30894 More actions · Answer 6

Yes, you could try something like this,


declare @T table(ID int not null primary key, Val int not null)
declare @row_level int = 3

insert @T(ID, Val)
select 1, 0 union all
select 10, 1 union all
select 17, 0 union all
select 18, 0 union all
select 27, 1 union all
select 34, 1 union all
select 42, 1 union all
select 52, 0 union all
select 62, 1 union all
select 63, 1 union all
select 65, 0 union all
select 73, 0 union all
select 79, 1 union all
select 85, 1 union all
select 89, 1 union all
select 90, 1 union all
select 91, 0 union all
select 92, 1 union all
select 93, 1 union all
select 94, 1

;WITH TEMP_CTE AS(
SELECT *
, LAG(VAL, 1, 0) OVER(ORDER BY ID ASC) AS PREV_VAL
, LEAD(VAL, 1, 0) OVER(ORDER BY ID ASC) AS NEXT_VAL 
, LAG(ID, 1, NULL) OVER(ORDER BY ID ASC) AS PREV_ID_VAL
, LEAD(ID, 1, NULL) OVER(ORDER BY ID ASC) AS NEXT_ID_VAL
FROM @T
), TEMP_CTE_TWO AS(
SELECT *, 1 AS ROW_LEVEL, ID AS START_ID, PREV_ID_VAL AS ORIG_PREV_ID FROM TEMP_CTE
WHERE VAL = 1 AND PREV_VAL = 0 
UNION ALL
SELECT TEMP_CTE.*, TEMP_CTE_TWO.ROW_LEVEL + 1 AS ROW_LEVEL, TEMP_CTE_TWO.START_ID, TEMP_CTE_TWO.ORIG_PREV_ID FROM TEMP_CTE_TWO
INNER JOIN TEMP_CTE ON TEMP_CTE.ID = TEMP_CTE_TWO.NEXT_ID_VAL AND TEMP_CTE.VAL = 1
), TEMP_CTE_THREE AS(
SELECT TEMP_CTE.ID, TEMP_CTE.VAL, 'Before' AS RUN_TYPE FROM TEMP_CTE INNER JOIN TEMP_CTE_TWO ON TEMP_CTE.ID = TEMP_CTE_TWO.ORIG_PREV_ID AND TEMP_CTE_TWO.NEXT_VAL = 0 AND TEMP_CTE_TWO.ROW_LEVEL >= @row_level
UNION ALL
SELECT TEMP_CTE.ID, TEMP_CTE.VAL, 'First' AS RUN_TYPE FROM TEMP_CTE INNER JOIN TEMP_CTE_TWO ON TEMP_CTE.ID = TEMP_CTE_TWO.START_ID AND TEMP_CTE_TWO.NEXT_VAL = 0 AND TEMP_CTE_TWO.ROW_LEVEL >= @row_level
UNION ALL
SELECT TEMP_CTE_TWO.ID, TEMP_CTE_TWO.VAL, 'Last' AS RUN_TYPE FROM TEMP_CTE_TWO WHERE TEMP_CTE_TWO.NEXT_VAL = 0 AND TEMP_CTE_TWO.ROW_LEVEL >= @row_level
UNION ALL
SELECT TEMP_CTE.ID, TEMP_CTE.VAL, 'After' AS RUN_TYPE FROM TEMP_CTE INNER JOIN TEMP_CTE_TWO ON TEMP_CTE.ID = TEMP_CTE_TWO.NEXT_ID_VAL AND TEMP_CTE_TWO.NEXT_VAL = 0 AND TEMP_CTE_TWO.ROW_LEVEL >= @row_level

)
SELECT * FROM TEMP_CTE_THREE 
ORDER BY ID

Jacob Wilkins One Orange Chip Points: 27976 More actions · Answer 7

One way:

WITH 
numbered       AS (SELECT *, rn=ROW_NUMBER() OVER (ORDER BY ID ASC) FROM @T),
grouped        AS (SELECT *,first_zero=MAX(CASE WHEN Val<>1 THEN rn END) OVER (ORDER BY ID ASC ROWS UNBOUNDED PRECEDING) FROM numbered),
first_and_such AS (SELECT before_rn=MIN(CASE WHEN Val<>1 THEN rn END),first_rn=MIN(CASE WHEN Val=1 THEN rn END),last_rn=MAX(CASE WHEN Val=1 THEN rn END) FROM grouped GROUP BY first_zero),
split          AS (SELECT x.* FROM first_and_such CROSS APPLY (VALUES('Before',before_rn),('First',first_rn),('Last',last_rn),('After',last_rn+1))x(PosDesc,rn) WHERE last_rn-first_rn>1)

SELECT s.PosDesc,n.ID,n.Val
FROM   split s LEFT JOIN numbered n ON s.rn=n.rn
ORDER BY s.rn ASC;

Cheers!

Sergiy SSC Guru Points: 110206 More actions · Answer 8

This will give you ID's for records with zeros wrapping the set of sequential records with 1's in between:

SELECT Last0.ID Last0_ID, next0.ID Next0.ID
FROM @T Last0
OUTER APPLY (select top 1 Id from @T F0 WHERE F0.ID > Last0.ID and F0.Val = 0) next0
WHERE last0.Val = 0 
AND exists (select * from @T T1
where T1.ID > Last0.ID 
and (T1.ID < next0.ID or next0.ID is null)
group by Val
having COUNT(id) >=3
)

You should be able to get required outcome simply by using it as a CTE.

Just cater for NULL in next0.ID.

_____________
Code for TallyGenerator

ScottPletcher SSC Guru Points: 100948 More actions · Answer 9

Yet another version.

Non-serious note to Jeff Moden: See, I actually am willing to use an identity as the clustering key.

DECLARE @min_vals_in_sequence int
SET @min_vals_in_sequence = 3

DROP TABLE IF EXISTS #T_sequential
CREATE TABLE #T_sequential ( seq_id int NOT NULL IDENTITY(1, 1) PRIMARY KEY CLUSTERED, 
    ID int NULL, Val int NULL );

INSERT INTO #T_sequential ( ID, Val )
SELECT ID, Val
FROM @T
ORDER BY ID
INSERT INTO #T_sequential
VALUES(2000000000, 0)

SELECT     
    CASE WHEN T3.seq_id = T1.seq_id THEN 'Before'
         WHEN T3.seq_id = T1.seq_id + 1 THEN 'First'
         WHEN T3.seq_id = T2.seq_id - 1 THEN 'Last'
         WHEN T3.seq_id = T2.seq_id THEN 'After'
    END AS PosDesc,
    CASE WHEN T3.ID = 2000000000 THEN NULL ELSE T3.seq_ID END AS ID,
    CASE WHEN T3.ID = 2000000000 THEN NULL ELSE T3.Val END AS Val
FROM #T_sequential T1
CROSS APPLY (
    SELECT TOP (1) *
    FROM #T_sequential T2
    WHERE T2.ID > T1.ID AND T2.Val = 0
    ORDER BY T2.ID
) AS T2
INNER JOIN #T_sequential T3 ON T3.seq_id IN ( T1.seq_id, T1.seq_id + 1, T2.seq_id - 1, T2.seq_id )
WHERE T1.Val = 0  AND T2.seq_id - T1.seq_id - 1 >= @min_vals_in_sequence

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 10

ScottPletcher wrote:

Non-serious note to Jeff Moden: See, I actually am willing to use an identity as the clustering key.

😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jacob Wilkins One Orange Chip Points: 27976 More actions · Answer 11

I played around with this for a while last night, trying to find the most concise complete solution (found one maybe 20% shorter than my previous solution, but its performance gets rather dreadful on larger datasets)

I also tried to improve on the performance of my previous solution, and here's another way.

It's not nearly as concise as my previous version, but performs better under all the circumstances I tested on my system (SQL Server 2019 on a high-ish end laptop), even if we help out my previous version by specifying a ROWS frame for the windowed aggregate (which allows it to make use of an in-memory worktable instead of the disk-based worktable used by the default RANGE).

WITH numbered     AS (SELECT *, rn  =ROW_NUMBER() OVER (ORDER BY ID ASC) FROM @T),
     lag_lead     AS (SELECT *, ID1 =LAG(ID,1)    OVER (ORDER BY ID ASC), ID2 =LEAD(ID,1)  OVER (ORDER BY ID ASC),
                                Val1=LAG(Val,1)   OVER (ORDER BY ID ASC), Val2=LEAD(Val,1) OVER (ORDER BY ID ASC),
                                rn1 =LAG(rn,1)    OVER (ORDER BY ID ASC), rn2 =LEAD(rn,1)  OVER (ORDER BY ID ASC)
                      FROM numbered),
     paired       AS (SELECT x.* FROM lag_lead l CROSS APPLY (VALUES(0, ID1, Val1, rn1, ID, Val, rn),
                                                                    (1, ID2, Val2, rn2, ID, Val,rn)
                                                              )x(pos, border_ID, border_Val, border_rn, ID, Val, rn)
                      WHERE l.Val=1 AND ISNULL(border_Val,0)<>1),

     combined AS     (SELECT Pos,before_ID=border_ID, before_Val=border_Val, first_ID=ID, first_Val=Val, 
                             last_ID =LEAD(ID,1)        OVER (ORDER BY ID ASC, pos ASC),        last_val =LEAD(VAL,1)        OVER (ORDER BY ID ASC, pos ASC),
                             after_ID=LEAD(border_ID,1) OVER (ORDER BY ID ASC, pos ASC),        after_Val=LEAD(border_Val,1) OVER (ORDER BY ID ASC, pos ASC), 
                             cnt     =LEAD(rn,1)        OVER (ORDER BY ID ASC, pos ASC)-rn      
                      FROM paired)

SELECT   x.*
FROM     combined c
         CROSS APPLY 
         (VALUES('Before',before_ID,before_Val),('First',first_ID,first_Val),('Last',last_ID,last_Val),('After',after_ID,after_Val))x(PosDesc,ID,Val)
WHERE    c.pos=0 AND c.cnt>1
ORDER BY first_ID ASC;

Cheers!

kuopaz Default port Points: 1449 More actions · Answer 12

Thanks to all for the solutions!

On a quick look, sparked enough ideas for me to try and (hopefully) write a solution - don't want to be beaten by it. I will then come back and compare to the solutions provided here.

jcelko212 32090 SSCrazy Eights Points: 9294 More actions · Answer 13

>> I want to find runs of at least three 1s, and return the result below. The 0s would be any test_values that are not 1. <<

This is been a classic freshman computer science course problem for decades. But it's usually done with a raise in a procedural language. The reasons done in a procedural language is that the concept of an ordering that you're trying to sneak in here doesn't apply in SQL or RDBMS.

CREATE TABLE Tests

(test_id INTEGER NOT NULL PRIMARY KEY,

test_val INTEGER NOT NULL

CHECK (test_val I (0,1));

INSERT INTO Tests

VALUES

(10, 1),

(17, 0),

(18, 0),

(27, 1), (34, 1), (42, 1),

(52, 0),

(62, 1),(63, 1),

(65, 0),

(73, 0),

(79, 1),(85, 1),(89, 1),(90, 1);

I'm curious as to why you used a 50-year-old Sybase syntax to do a table constructor. That "UNION ALL" simply guaranteed your code won't port. Why did you want to create a result Table and materialize it? Wouldn't a view or query result be better? Are you still thinking about filesystems and having to physically write results to magnetic tapes?

I also don't understand what your "pos_description" means. I would have thought that you would return an identifier of some sort for each run, and perhaps its length. Since SQL does not use ordering, words like before and after are frowned upon. Another thing we don't like in SQL is creating unnecessary NULLs. And yes, it does look like something to do with islands.

CREATE VIEW TestsRuns

AS

SELECT test_id, test_val,

(ROW_NUMBER() OVER (ORDER BY test_id)

- ROW_NUMBER() OVER (PARTITION BY test_val

ORDER BY test_id)) AS test_grp,

FROM Tests;

(10, 1, 1 - 1 = 0),

(17, 0, 2 - 1 = 1),

(18, 0, 3 - 2 = 1),

(27, 1, 4 - 1 = 3),

(34, 1, 5 - 2 = 3),

(42, 1, 6 - 3 = 3),

(52, 0, 7 - 1 = 6),

(62, 1, 8 - 1 = 7),

(63, 1, 9 - 2 = 7),

(65, 0, 10 - 1 = 9),

(73, 0, 11 - 2 = 9),

(79, 1, 12 - 1 =11),

(85,1, 13 - 2 = 11),

(89, 1, 14 - 3 = 11),

(90, 1 ,15 - 4 = 11);

Given this view, you can now inspect the runs with either windowed functions or just a simple group by.

Please post DDL and follow ANSI/ISO standards when asking for help.

ScottPletcher SSC Guru Points: 100948 More actions · Answer 14

jcelko212 32090 wrote:

>> I want to find runs of at least three 1s, and return the result below. The 0s would be any test_values that are not 1. <<
Since SQL does not use ordering, words like before and after are frowned upon. ...
Given this view, you can now inspect the runs with either windowed functions or just a simple group by.

That comment on ordering has to be unintended parody, since the code you posted uses two ORDER BYs to specify, well, the order of rows. And the real world orders (sequences) things all the time, including noting before and after. Such as, did you make a payment on/before it was due rather than after.

As to the final view, please show us how to get from that to the desired result. After already sorting twice and segmenting once the entire input data, you still haven't determined the specific rows that need included in the output.

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.