# Remove successive rows with same column value

• Med Anwar

SSC Enthusiast

Points: 142

Hello everyone, I'm a beginner in SQL and i have difficulty to find solution for the folowing problem:

I'd like to do a SELECT which eliminate successive rows having the same "stat" and keep only the last occurrence.

To make myself better understood, I give you this example:

`My table Results`

``` idLigne | id | statut idLigne | id | statut L1 1 A L2 1 A L2 1 A L6 1 B L3 1 B L7 1 A L4 1 B ====> L8 1 B L5 1 B L6 1 B L7 1 A ```

` L8 1 B`

Thank you

• imex

Ten Centuries

Points: 1173

Hi,

Try:

`with CTE as`

``` ( select idLigne, id, statut, ROW_NUMBER() OVER(PARTITION BY id, statut ORDER BY idLigne) AS 'RowNo' from MyTable ) ```

`select * from CTE where RowNo = 1`

Hope this helps.

• Lynn Pettis

SSC Guru

Points: 442359

I'm guessing something like this:

``` /* My table Results idLigne | id | statut idLigne | id | statut L1 1 A L2 1 A L2 1 A L6 1 B L3 1 B L7 1 A L4 1 B ====> L8 1 B L5 1 B L6 1 B L7 1 A L8 1 B */ create table dbo.MyTest ( idLigne char(2), id int, statut char(1) ); go insert into dbo.MyTest values ('L1',1,'A'),('L2',1,'A'),('L3',1,'B'),('L4',1,'B'),('L5',1,'B'),('L6',1,'B'),('L7',1,'A'),('L8',1,'B'); go select * from dbo.MyTest; go -- Assumption based on data from OP: ifLigne is an ever increasing value for ordering the data with BaseData as ( select ROW_NUMBER() over (order by idLigne) rownum, idLigne, id, statut from dbo.MyTest ) select bd1.idLigne, bd1.id, bd1.statut from BaseData bd1 left outer join BaseData bd2 on (bd1.rownum = bd2.rownum - 1 and bd1.id = bd2.id and bd1.statut = bd2.statut) where bd2.rownum is null ; go drop table dbo.MyTest; go ```

• imex

Ten Centuries

Points: 1173

Sorry, my suggestion is completely wrong.

• CapnHector

SSCoach

Points: 16143

i was attacking it the same way lynn was and my code is very similar. i may suggest making idLigne an INT instead of a VARCHAR as that would eliminate the need for the CTE to row_number() to get nice INT's for the join math.

For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]

• Jeff Moden

SSC Guru

Points: 997120

Ummmm... I've not run any of the code but, just by looking, shouldn't any ordering be done in descending order so that you get the "latest" row like what the OP posted?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".
"Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

• Lynn Pettis

SSC Guru

Points: 442359

capn.hector (5/4/2012)

i was attacking it the same way lynn was and my code is very similar. i may suggest making idLigne an INT instead of a VARCHAR as that would eliminate the need for the CTE to row_number() to get nice INT's for the join math.

My guess would be keep the cte to sequentially number the rows in case idLigne values have gaps due to deletions or such. Sort of like identity values that could get gaps.

• Lynn Pettis

SSC Guru

Points: 442359

Jeff Moden (5/4/2012)

Ummmm... I've not run any of the code but, just by looking, shouldn't any ordering be done in descending order so that you get the "latest" row like what the OP posted?

Actually Jeff, mine does return the latest values. Probably could number the rows in descending order and with a slight change in logic get the same results.

• Paul White

SSC Guru

Points: 150442

This is one type of problem where it is perhaps easier to see the logic using SQL Server 2012 syntax:

``` DECLARE @MyTable AS TABLE ( idLigne character(2) PRIMARY KEY, id integer NOT NULL, statut character(1) NOT NULL ); INSERT @MyTable (idLigne, id, statut) VALUES ('L1',1,'A'), ('L2',1,'A'), ('L3',1,'B'), ('L4',1,'B'), ('L5',1,'B'), ('L6',1,'B'), ('L7',1,'A'), ('L8',1,'B'); ```

``` SELECT WithNextRow.idLigne, WithNextRow.id, WithNextRow.statut FROM ( -- Add the next statut in sequence to the set SELECT mt.idLigne, mt.id, mt.statut, next_statut = LEAD(mt.statut, 1, CHAR(0)) OVER (ORDER BY mt.idLigne) FROM @MyTable AS mt ) AS WithNextRow WHERE -- Show only rows where the next statut is different -- from the current one ```

` WithNextRow.statut <> WithNextRow.next_statut;`

Without LEAD (which is new for 2012), we have to number the rows in the desired sequence, and use that to perform a self-join (linking the 'current' row to the next or previous one) as shown in Lynn's example. The new feature may be more or less efficient than the self-join in different circumstances.

• Jeff Moden

SSC Guru

Points: 997120

Lynn Pettis (5/4/2012)

Jeff Moden (5/4/2012)

Ummmm... I've not run any of the code but, just by looking, shouldn't any ordering be done in descending order so that you get the "latest" row like what the OP posted?

Actually Jeff, mine does return the latest values. Probably could number the rows in descending order and with a slight change in logic get the same results.

I stand corrected. You might want to take a look at the actual execution plan, though. You have an accidental cross-join (returns 8*8 or 64 internal rows). I don't know if the optimizer will chose to make that go away when a larger number of rows is present.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".
"Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

• Jeff Moden

SSC Guru

Points: 997120

anwar.melh (5/4/2012)

Hello everyone, I'm a beginner in SQL and i have difficulty to find solution for the folowing problem:

I'd like to do a SELECT which eliminate successive rows having the same "stat" and keep only the last occurrence.

To make myself better understood, I give you this example:

`My table Results`

``` idLigne | id | statut idLigne | id | statut L1 1 A L2 1 A L2 1 A L6 1 B L3 1 B L7 1 A L4 1 B ====> L8 1 B L5 1 B L6 1 B L7 1 A ```

` L8 1 B`

Thank you

There is nothing in the data you've shown to preserve the order of the rows. idLigne will make a mess of things just as soon as it gets to "L10" for sorting purposes.

Do you have a different column that will produce the correct sort order?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".
"Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

• Jeff Moden

SSC Guru

Points: 997120

Jeff Moden (5/4/2012)

Lynn Pettis (5/4/2012)

Jeff Moden (5/4/2012)

Ummmm... I've not run any of the code but, just by looking, shouldn't any ordering be done in descending order so that you get the "latest" row like what the OP posted?

Actually Jeff, mine does return the latest values. Probably could number the rows in descending order and with a slight change in logic get the same results.

I stand corrected. You might want to take a look at the actual execution plan, though. You have an accidental cross-join (returns 8*8 or 64 internal rows). I don't know if the optimizer will chose to make that go away when a larger number of rows is present.

Ah... got it. If you change your LEFT OUTER JOIN to a LEFT MERGE JOIN, the accidental cross join goes away, reads drop from 40 to 6, and performance becomes comparatively lightning quick. In this case, I think it's ok to use the join hint because the ROW_NUMBER() function will produce the sorted output that a MERGE JOIN actually needs.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".
"Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

• Paul White

SSC Guru

Points: 150442

Jeff Moden (5/4/2012)

Ah... got it. If you change your LEFT OUTER JOIN to a LEFT MERGE JOIN, the accidental cross join goes away, reads drop from 40 to 6, and performance becomes comparatively lightning quick. In this case, I think it's ok to use the join hint because the ROW_NUMBER() function will produce the sorted output that a MERGE JOIN actually needs.

I prefer OPTION (MERGE JOIN) over a join hint. Using a join hint adds an implied FORCE ORDER too. That may not matter here, but it will in other queries. That said, using MERGE here does add extra sorts to the plan, which may not be desirable. The optimizer fails to recognize that ROW_NUMBER produces a unique, ordered stream, so sorts are introduced and the merge join is implemented as many-to-many, which requires a work table. There is an open Connect item for this: http://connect.microsoft.com/SQLServer/feedback/details/347442/optimizer-does-not-remove-unnessecary-distinct-in-counjunction-with-row-count

• Jeff Moden

SSC Guru

Points: 997120

SQL Kiwi (5/4/2012)

Jeff Moden (5/4/2012)

Ah... got it. If you change your LEFT OUTER JOIN to a LEFT MERGE JOIN, the accidental cross join goes away, reads drop from 40 to 6, and performance becomes comparatively lightning quick. In this case, I think it's ok to use the join hint because the ROW_NUMBER() function will produce the sorted output that a MERGE JOIN actually needs.

I prefer OPTION (MERGE JOIN) over a join hint. Using a join hint adds an implied FORCE ORDER too. That may not matter here, but it will in other queries. That said, using MERGE here does add extra sorts to the plan, which may not be desirable. The optimizer fails to recognize that ROW_NUMBER produces a unique, ordered stream, so sorts are introduced and the merge join is implemented as many-to-many, which requires a work table. There is an open Connect item for this: http://connect.microsoft.com/SQLServer/feedback/details/347442/optimizer-does-not-remove-unnessecary-distinct-in-counjunction-with-row-count

Damn it. I was in a hurry and didn't scroll far enough to the right on the execution plan. :blush: You're absolutely correct about the extra sorts.

I think the extra sorts are absolutely worth it, though. They get rid of the accidental cross join, nearly an order of magnitude of reads, and runs several orders of magnitude faster according to priofiler.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".
"Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

• Jeff Moden

SSC Guru

Points: 997120

To continue, both queries use a "Work Table". The one with the MERGE hint uses it a whole lot less, though.

[font="Courier New"]Warning: The join order has been enforced because a local join hint is used.

(4 row(s) affected)

(4 row(s) affected)

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".
"Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"