Remove successive rows with same column value

Question

Post reply

Remove successive rows with same column value

Med Anwar

SSC Enthusiast

Points: 172
More actions
May 4, 2012 at 3:41 am

#257512

Hello everyone, I'm a beginner in SQL and i have difficulty to find solution for the folowing problem:
I'd like to do a SELECT which eliminate successive rows having the same "stat" and keep only the last occurrence.
To make myself better understood, I give you this example:
My table Results
idLigne | id | statut idLigne | id | statut
L1 1 A L2 1 A
L2 1 A L6 1 B
L3 1 B L7 1 A
L4 1 B ====> L8 1 B
L5 1 B
L6 1 B
L7 1 A
L8 1 B
Thank you

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply

imex Ten Centuries Points: 1234 More actions · Answer 1

Hi,

Try:

with CTE as

(

select

idLigne,

id,

statut,

ROW_NUMBER() OVER(PARTITION BY id, statut ORDER BY idLigne) AS 'RowNo'

from MyTable

)

select * from CTE where RowNo = 1

Hope this helps.

https://www.imoveisemexposicao.com.br/imobiliarias-em-guarulhos

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 2

I'm guessing something like this:

/*

My table Results

idLigne | id | statut idLigne | id | statut

L1 1 A L2 1 A

L2 1 A L6 1 B

L3 1 B L7 1 A

L4 1 B ====> L8 1 B

L5 1 B

L6 1 B

L7 1 A

L8 1 B

*/

create table dbo.MyTest (

idLigne char(2),

id int,

statut char(1)

);

go

insert into dbo.MyTest

values ('L1',1,'A'),('L2',1,'A'),('L3',1,'B'),('L4',1,'B'),('L5',1,'B'),('L6',1,'B'),('L7',1,'A'),('L8',1,'B');

go

select * from dbo.MyTest;

go

-- Assumption based on data from OP: ifLigne is an ever increasing value for ordering the data

with BaseData as (

select

ROW_NUMBER() over (order by idLigne) rownum,

idLigne,

id,

statut

from

dbo.MyTest

)

select

bd1.idLigne,

bd1.id,

bd1.statut

from

BaseData bd1

left outer join BaseData bd2

on (bd1.rownum = bd2.rownum - 1

and bd1.id = bd2.id

and bd1.statut = bd2.statut)

where

bd2.rownum is null

;

go

drop table dbo.MyTest;

go

imex Ten Centuries Points: 1234 More actions · Answer 3

Sorry, my suggestion is completely wrong.

https://www.imoveisemexposicao.com.br/imobiliarias-em-guarulhos

CapnHector SSCoach Points: 16143 More actions · Answer 4

i was attacking it the same way lynn was and my code is very similar. i may suggest making idLigne an INT instead of a VARCHAR as that would eliminate the need for the CTE to row_number() to get nice INT's for the join math.

For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]

Jeff Moden SSC Guru Points: 1004683 More actions · Answer 5

Ummmm... I've not run any of the code but, just by looking, shouldn't any ordering be done in descending order so that you get the "latest" row like what the OP posted?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 6

capn.hector (5/4/2012)
i was attacking it the same way lynn was and my code is very similar. i may suggest making idLigne an INT instead of a VARCHAR as that would eliminate the need for the CTE to row_number() to get nice INT's for the join math.

My guess would be keep the cte to sequentially number the rows in case idLigne values have gaps due to deletions or such. Sort of like identity values that could get gaps.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 7

Jeff Moden (5/4/2012)
Ummmm... I've not run any of the code but, just by looking, shouldn't any ordering be done in descending order so that you get the "latest" row like what the OP posted?

Actually Jeff, mine does return the latest values. Probably could number the rows in descending order and with a slight change in logic get the same results.

Paul White SSC Guru Points: 150468 More actions · Answer 8

This is one type of problem where it is perhaps easier to see the logic using SQL Server 2012 syntax:

DECLARE @MyTable AS TABLE

(

idLigne character(2) PRIMARY KEY,

id integer NOT NULL,

statut character(1) NOT NULL

);

INSERT @MyTable

(idLigne, id, statut)

VALUES

('L1',1,'A'),

('L2',1,'A'),

('L3',1,'B'),

('L4',1,'B'),

('L5',1,'B'),

('L6',1,'B'),

('L7',1,'A'),

('L8',1,'B');

SELECT

WithNextRow.idLigne,

WithNextRow.id,

WithNextRow.statut

FROM

(

-- Add the next statut in sequence to the set

SELECT

mt.idLigne,

mt.id,

mt.statut,

next_statut = LEAD(mt.statut, 1, CHAR(0)) OVER (ORDER BY mt.idLigne)

FROM @MyTable AS mt

) AS WithNextRow

WHERE

-- Show only rows where the next statut is different

-- from the current one

WithNextRow.statut <> WithNextRow.next_statut;

Without LEAD (which is new for 2012), we have to number the rows in the desired sequence, and use that to perform a self-join (linking the 'current' row to the next or previous one) as shown in Lynn's example. The new feature may be more or less efficient than the self-join in different circumstances.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Jeff Moden SSC Guru Points: 1004683 More actions · Answer 9

Lynn Pettis (5/4/2012)
Jeff Moden (5/4/2012)
Ummmm... I've not run any of the code but, just by looking, shouldn't any ordering be done in descending order so that you get the "latest" row like what the OP posted?
Actually Jeff, mine does return the latest values. Probably could number the rows in descending order and with a slight change in logic get the same results.

I stand corrected. You might want to take a look at the actual execution plan, though. You have an accidental cross-join (returns 8*8 or 64 internal rows). I don't know if the optimizer will chose to make that go away when a larger number of rows is present.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004683 More actions · Answer 10

anwar.melh (5/4/2012)
Hello everyone, I'm a beginner in SQL and i have difficulty to find solution for the folowing problem:
I'd like to do a SELECT which eliminate successive rows having the same "stat" and keep only the last occurrence.
To make myself better understood, I give you this example:
My table Results
idLigne | id | statut idLigne | id | statut
L1 1 A L2 1 A
L2 1 A L6 1 B
L3 1 B L7 1 A
L4 1 B ====> L8 1 B
L5 1 B
L6 1 B
L7 1 A
L8 1 B
Thank you

There is nothing in the data you've shown to preserve the order of the rows. idLigne will make a mess of things just as soon as it gets to "L10" for sorting purposes.

Do you have a different column that will produce the correct sort order?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004683 More actions · Answer 11

Jeff Moden (5/4/2012)
Lynn Pettis (5/4/2012)
Jeff Moden (5/4/2012)
Ummmm... I've not run any of the code but, just by looking, shouldn't any ordering be done in descending order so that you get the "latest" row like what the OP posted?
Actually Jeff, mine does return the latest values. Probably could number the rows in descending order and with a slight change in logic get the same results.
I stand corrected. You might want to take a look at the actual execution plan, though. You have an accidental cross-join (returns 8*8 or 64 internal rows). I don't know if the optimizer will chose to make that go away when a larger number of rows is present.

Ah... got it. If you change your LEFT OUTER JOIN to a LEFT MERGE JOIN, the accidental cross join goes away, reads drop from 40 to 6, and performance becomes comparatively lightning quick. In this case, I think it's ok to use the join hint because the ROW_NUMBER() function will produce the sorted output that a MERGE JOIN actually needs.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Paul White SSC Guru Points: 150468 More actions · Answer 12

Jeff Moden (5/4/2012)
Ah... got it. If you change your LEFT OUTER JOIN to a LEFT MERGE JOIN, the accidental cross join goes away, reads drop from 40 to 6, and performance becomes comparatively lightning quick. In this case, I think it's ok to use the join hint because the ROW_NUMBER() function will produce the sorted output that a MERGE JOIN actually needs.

I prefer OPTION (MERGE JOIN) over a join hint. Using a join hint adds an implied FORCE ORDER too. That may not matter here, but it will in other queries. That said, using MERGE here does add extra sorts to the plan, which may not be desirable. The optimizer fails to recognize that ROW_NUMBER produces a unique, ordered stream, so sorts are introduced and the merge join is implemented as many-to-many, which requires a work table. There is an open Connect item for this: http://connect.microsoft.com/SQLServer/feedback/details/347442/optimizer-does-not-remove-unnessecary-distinct-in-counjunction-with-row-count

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Jeff Moden SSC Guru Points: 1004683 More actions · Answer 13

SQL Kiwi (5/4/2012)
Jeff Moden (5/4/2012)
Ah... got it. If you change your LEFT OUTER JOIN to a LEFT MERGE JOIN, the accidental cross join goes away, reads drop from 40 to 6, and performance becomes comparatively lightning quick. In this case, I think it's ok to use the join hint because the ROW_NUMBER() function will produce the sorted output that a MERGE JOIN actually needs.
I prefer OPTION (MERGE JOIN) over a join hint. Using a join hint adds an implied FORCE ORDER too. That may not matter here, but it will in other queries. That said, using MERGE here does add extra sorts to the plan, which may not be desirable. The optimizer fails to recognize that ROW_NUMBER produces a unique, ordered stream, so sorts are introduced and the merge join is implemented as many-to-many, which requires a work table. There is an open Connect item for this: http://connect.microsoft.com/SQLServer/feedback/details/347442/optimizer-does-not-remove-unnessecary-distinct-in-counjunction-with-row-count

Damn it. I was in a hurry and didn't scroll far enough to the right on the execution plan. :blush: You're absolutely correct about the extra sorts.

I think the extra sorts are absolutely worth it, though. They get rid of the accidental cross join, nearly an order of magnitude of reads, and runs several orders of magnitude faster according to priofiler.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004683 More actions · Answer 14

To continue, both queries use a "Work Table". The one with the MERGE hint uses it a whole lot less, though.

[font="Courier New"]Warning: The join order has been enforced because a local join hint is used.

(4 row(s) affected)

Table 'Worktable'. Scan count 1, logical reads 31, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'MyTest'. Scan count 2, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(4 row(s) affected)

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'MyTest'. Scan count 2, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.[/font]

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)