Remove successive rows with same column value

  • Hello everyone, I'm a beginner in SQL and i have difficulty to find solution for the folowing problem:

    I'd like to do a SELECT which eliminate successive rows having the same "stat" and keep only the last occurrence.

    To make myself better understood, I give you this example:

    My table Results

    idLigne | id | statut idLigne | id | statut

    L1 1 A L2 1 A

    L2 1 A L6 1 B

    L3 1 B L7 1 A

    L4 1 B ====> L8 1 B

    L5 1 B

    L6 1 B

    L7 1 A

    L8 1 B

    Thank you

  • Hi,

    Try:

    with CTE as

    (

    select

    idLigne,

    id,

    statut,

    ROW_NUMBER() OVER(PARTITION BY id, statut ORDER BY idLigne) AS 'RowNo'

    from MyTable

    )

    select * from CTE where RowNo = 1

    Hope this helps.

  • I'm guessing something like this:

    /*

    My table Results

    idLigne | id | statut idLigne | id | statut

    L1 1 A L2 1 A

    L2 1 A L6 1 B

    L3 1 B L7 1 A

    L4 1 B ====> L8 1 B

    L5 1 B

    L6 1 B

    L7 1 A

    L8 1 B

    */

    create table dbo.MyTest (

    idLigne char(2),

    id int,

    statut char(1)

    );

    go

    insert into dbo.MyTest

    values ('L1',1,'A'),('L2',1,'A'),('L3',1,'B'),('L4',1,'B'),('L5',1,'B'),('L6',1,'B'),('L7',1,'A'),('L8',1,'B');

    go

    select * from dbo.MyTest;

    go

    -- Assumption based on data from OP: ifLigne is an ever increasing value for ordering the data

    with BaseData as (

    select

    ROW_NUMBER() over (order by idLigne) rownum,

    idLigne,

    id,

    statut

    from

    dbo.MyTest

    )

    select

    bd1.idLigne,

    bd1.id,

    bd1.statut

    from

    BaseData bd1

    left outer join BaseData bd2

    on (bd1.rownum = bd2.rownum - 1

    and bd1.id = bd2.id

    and bd1.statut = bd2.statut)

    where

    bd2.rownum is null

    ;

    go

    drop table dbo.MyTest;

    go

  • Sorry, my suggestion is completely wrong.

  • i was attacking it the same way lynn was and my code is very similar. i may suggest making idLigne an INT instead of a VARCHAR as that would eliminate the need for the CTE to row_number() to get nice INT's for the join math.


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

  • Ummmm... I've not run any of the code but, just by looking, shouldn't any ordering be done in descending order so that you get the "latest" row like what the OP posted?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • capn.hector (5/4/2012)


    i was attacking it the same way lynn was and my code is very similar. i may suggest making idLigne an INT instead of a VARCHAR as that would eliminate the need for the CTE to row_number() to get nice INT's for the join math.

    My guess would be keep the cte to sequentially number the rows in case idLigne values have gaps due to deletions or such. Sort of like identity values that could get gaps.

  • Jeff Moden (5/4/2012)


    Ummmm... I've not run any of the code but, just by looking, shouldn't any ordering be done in descending order so that you get the "latest" row like what the OP posted?

    Actually Jeff, mine does return the latest values. Probably could number the rows in descending order and with a slight change in logic get the same results.

  • This is one type of problem where it is perhaps easier to see the logic using SQL Server 2012 syntax:

    DECLARE @MyTable AS TABLE

    (

    idLigne character(2) PRIMARY KEY,

    id integer NOT NULL,

    statut character(1) NOT NULL

    );

    INSERT @MyTable

    (idLigne, id, statut)

    VALUES

    ('L1',1,'A'),

    ('L2',1,'A'),

    ('L3',1,'B'),

    ('L4',1,'B'),

    ('L5',1,'B'),

    ('L6',1,'B'),

    ('L7',1,'A'),

    ('L8',1,'B');

    SELECT

    WithNextRow.idLigne,

    WithNextRow.id,

    WithNextRow.statut

    FROM

    (

    -- Add the next statut in sequence to the set

    SELECT

    mt.idLigne,

    mt.id,

    mt.statut,

    next_statut = LEAD(mt.statut, 1, CHAR(0)) OVER (ORDER BY mt.idLigne)

    FROM @MyTable AS mt

    ) AS WithNextRow

    WHERE

    -- Show only rows where the next statut is different

    -- from the current one

    WithNextRow.statut <> WithNextRow.next_statut;

    Without LEAD (which is new for 2012), we have to number the rows in the desired sequence, and use that to perform a self-join (linking the 'current' row to the next or previous one) as shown in Lynn's example. The new feature may be more or less efficient than the self-join in different circumstances.

  • Lynn Pettis (5/4/2012)


    Jeff Moden (5/4/2012)


    Ummmm... I've not run any of the code but, just by looking, shouldn't any ordering be done in descending order so that you get the "latest" row like what the OP posted?

    Actually Jeff, mine does return the latest values. Probably could number the rows in descending order and with a slight change in logic get the same results.

    I stand corrected. You might want to take a look at the actual execution plan, though. You have an accidental cross-join (returns 8*8 or 64 internal rows). I don't know if the optimizer will chose to make that go away when a larger number of rows is present.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • anwar.melh (5/4/2012)


    Hello everyone, I'm a beginner in SQL and i have difficulty to find solution for the folowing problem:

    I'd like to do a SELECT which eliminate successive rows having the same "stat" and keep only the last occurrence.

    To make myself better understood, I give you this example:

    My table Results

    idLigne | id | statut idLigne | id | statut

    L1 1 A L2 1 A

    L2 1 A L6 1 B

    L3 1 B L7 1 A

    L4 1 B ====> L8 1 B

    L5 1 B

    L6 1 B

    L7 1 A

    L8 1 B

    Thank you

    There is nothing in the data you've shown to preserve the order of the rows. idLigne will make a mess of things just as soon as it gets to "L10" for sorting purposes.

    Do you have a different column that will produce the correct sort order?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/4/2012)


    Lynn Pettis (5/4/2012)


    Jeff Moden (5/4/2012)


    Ummmm... I've not run any of the code but, just by looking, shouldn't any ordering be done in descending order so that you get the "latest" row like what the OP posted?

    Actually Jeff, mine does return the latest values. Probably could number the rows in descending order and with a slight change in logic get the same results.

    I stand corrected. You might want to take a look at the actual execution plan, though. You have an accidental cross-join (returns 8*8 or 64 internal rows). I don't know if the optimizer will chose to make that go away when a larger number of rows is present.

    Ah... got it. If you change your LEFT OUTER JOIN to a LEFT MERGE JOIN, the accidental cross join goes away, reads drop from 40 to 6, and performance becomes comparatively lightning quick. In this case, I think it's ok to use the join hint because the ROW_NUMBER() function will produce the sorted output that a MERGE JOIN actually needs.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/4/2012)


    Ah... got it. If you change your LEFT OUTER JOIN to a LEFT MERGE JOIN, the accidental cross join goes away, reads drop from 40 to 6, and performance becomes comparatively lightning quick. In this case, I think it's ok to use the join hint because the ROW_NUMBER() function will produce the sorted output that a MERGE JOIN actually needs.

    I prefer OPTION (MERGE JOIN) over a join hint. Using a join hint adds an implied FORCE ORDER too. That may not matter here, but it will in other queries. That said, using MERGE here does add extra sorts to the plan, which may not be desirable. The optimizer fails to recognize that ROW_NUMBER produces a unique, ordered stream, so sorts are introduced and the merge join is implemented as many-to-many, which requires a work table. There is an open Connect item for this: http://connect.microsoft.com/SQLServer/feedback/details/347442/optimizer-does-not-remove-unnessecary-distinct-in-counjunction-with-row-count

  • SQL Kiwi (5/4/2012)


    Jeff Moden (5/4/2012)


    Ah... got it. If you change your LEFT OUTER JOIN to a LEFT MERGE JOIN, the accidental cross join goes away, reads drop from 40 to 6, and performance becomes comparatively lightning quick. In this case, I think it's ok to use the join hint because the ROW_NUMBER() function will produce the sorted output that a MERGE JOIN actually needs.

    I prefer OPTION (MERGE JOIN) over a join hint. Using a join hint adds an implied FORCE ORDER too. That may not matter here, but it will in other queries. That said, using MERGE here does add extra sorts to the plan, which may not be desirable. The optimizer fails to recognize that ROW_NUMBER produces a unique, ordered stream, so sorts are introduced and the merge join is implemented as many-to-many, which requires a work table. There is an open Connect item for this: http://connect.microsoft.com/SQLServer/feedback/details/347442/optimizer-does-not-remove-unnessecary-distinct-in-counjunction-with-row-count

    Damn it. I was in a hurry and didn't scroll far enough to the right on the execution plan. :blush: You're absolutely correct about the extra sorts.

    I think the extra sorts are absolutely worth it, though. They get rid of the accidental cross join, nearly an order of magnitude of reads, and runs several orders of magnitude faster according to priofiler.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • To continue, both queries use a "Work Table". The one with the MERGE hint uses it a whole lot less, though.

    [font="Courier New"]Warning: The join order has been enforced because a local join hint is used.

    (4 row(s) affected)

    Table 'Worktable'. Scan count 1, logical reads 31, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'MyTest'. Scan count 2, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (4 row(s) affected)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'MyTest'. Scan count 2, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.[/font]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply