Remove successive rows with same column value

Question

Remove successive rows with same column value

Viewing 8 posts - 16 through 23 (of 23 total)

You must be logged in to reply to this topic. Login to reply

Paul White SSC Guru Points: 150468 More actions · Answer 1

Jeff Moden (5/4/2012)
I think the extra sorts are absolutely worth it, though. They get rid of the accidental cross join, nearly an order of magnitude of reads, and runs several orders of magnitude faster according to priofiler.

It depends, as always (e.g. the sorts might spill to disk). Anyway, the point I'm trying to make here is that the optimizer has limitations which mean it does not choose as good an execution plan as it might here. Aside from anything else, things like ROW_NUMBER are not relational, and the optimizer reasons about things primarily in relational terms. ROW_NUMBER does not produce a set (strictly, a multi-set) it produces a sequence. Adding the row number column (projecting a new column in the jargon) means projecting a sequence; hence the Sequence Project operator in the plan. More limitations mean the common table expression is evaluated, and rows numbered, twice. Yet more mean the merge join has to account for the possibility of duplicates, and it runs in many-to-many mode with a needless work table.

Often, it is possible to give the optimizer better information, so it will produce good plans now and in the future when the number of rows and distribution changes. One way to do this is to materialize the CTE:

CREATE TABLE #Materialized

(

sortOrder bigint PRIMARY KEY CLUSTERED,

idLigne character (2) NOT NULL,

id integer NOT NULL,

statut character(1) NOT NULL,

);

INSERT #Materialized

(

sortOrder,

idLigne,

id,

statut

)

SELECT

sortOrder = ROW_NUMBER() OVER (ORDER BY mt.idLigne),

mt.idLigne,

mt.id,

mt.statut

FROM

dbo.MyTest AS mt;

-- Pretend there are a few more rows

UPDATE STATISTICS #Materialized WITH ROWCOUNT = 50, PAGECOUNT = 1;

-- One to many merge

SELECT

this.idLigne,

this.id,

this.statut

FROM #Materialized AS this

WHERE

NOT EXISTS

(

SELECT 1

FROM #Materialized AS nxt

WHERE

nxt.sortOrder = this.sortOrder + 1

AND nxt.statut = this.statut

);

DROP TABLE #Materialized;

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Paul White SSC Guru Points: 150468 More actions · Answer 2

Jeff Moden (5/4/2012)
To continue, both queries use a "Work Table". The one with the MERGE hint uses it a whole lot less, though.

Sure. The work table in the loops join case is for the spool, the one in the merge join case is for the logical possibility of duplicates (the many-to-many part) but is obviously never actually required (though it is still created).

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 3

SQL Kiwi (5/4/2012)
One way to do this is to materialize the CTE

I'm definitely a fan of that especially when a CTE is referenced more than once.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Lynn Pettis SSC Guru Points: 442470 More actions · Answer 4

Jeff Moden (5/5/2012)
SQL Kiwi (5/4/2012)
One way to do this is to materialize the CTE
I'm definitely a fan of that especially when a CTE is referenced more than once.

I think that is one thing Oracle does right with their subquery refactoring clause (aka, SQL Servers cte). It can either execute it inline or use it as a temporary table.

Paul White SSC Guru Points: 150468 More actions · Answer 5

Lynn Pettis (5/5/2012)
Jeff Moden (5/5/2012)
SQL Kiwi (5/4/2012)
One way to do this is to materialize the CTE
I'm definitely a fan of that especially when a CTE is referenced more than once.
I think that is one thing Oracle does right with their subquery refactoring clause (aka, SQL Servers cte). It can either execute it inline or use it as a temporary table.

Have you voted for the Connect item?

https://connect.microsoft.com/SQLServer/feedback/details/218968/provide-a-hint-to-force-intermediate-materialization-of-ctes-or-derived-tables

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 6

I have, indeed.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Lynn Pettis SSC Guru Points: 442470 More actions · Answer 7

Lynn Pettis

SSC Guru

Points: 442470

May 6, 2012 at 10:42 am

#1483904

So have I.

Med Anwar SSC Enthusiast Points: 172 More actions · Answer 8

Hello everyone. I'm sorry if I took time to answer you I was not home this weekend.

Thank you for your response ! your proposals are very interesting especially the solution without creating and dropping a table.

I gave you an example too simple to understand the real problem seeing this example:

I have the table "TableSorted"

idLigne | id | stat | datetime

1 1 A 1/01/2011 14:00:00

2 1 A 1/01/2011 14:00:05

3 1 A 1/01/2011 15:00:00

4 1 B 1/01/2011 16:00:00

5 1 A 1/01/2011 17:00:00

6 1 B 1/01/2011 18:00:00

7 1 B 1/01/2011 19:00:00

8 2 A 1/01/2011 12:00:00

9 2 A 1/01/2011 12:30:00

10 2 A 2/01/2011 11:00:00

11 2 A 2/01/2011 11:10:00

12 2 B 2/01/2011 15:00:00

and here is the desired result:

idLigne | id | stat | datetime

3 1 A 1/01/2011 15:00:00

4 1 B 1/01/2011 16:00:00

5 1 A 1/01/2011 17:00:00

7 1 B 1/01/2011 19:00:00

9 2 A 1/01/2011 12:30:00

11 2 A 2/01/2011 11:10:00

12 2 B 2/01/2011 15:00:00

I opt finally to use of the following query that i found simple :

with transformed as (

select idLigne, id,

stat,datetime,

row_number () over (order by idLigne) rn

from TableSorted)

select TableSorted.*

from TableSorted

inner join transformed t1

on TableSorted.idLigne = t1.idLigne

left join transformed t2

on t1.rn = t2.rn - 1

and t1.stat = t2.stat and t1.id = t2.id and year(t1.datetime) = year(t2.datetime) and month(t1.datetime) = month(t2.datetime) and day(t1.datetime) = day(t2.datetime)

where t2.rn is null