RE: Removing only contiguous/adjacent duplicate records from a rowset

SSC-Forever

Points: 41690

October 24, 2012 at 6:54 am

EamonSQL (10/24/2012)
select from the table using the ROW_NUMBER() function ordering by id and partitioning by value.
Wrap in brackets giving you a derived table.
Run a delete command against the table and joining to the derived table on the id and where rownum > 1
That should sort you out.
Eamon

The OP said: -

praveen_vejandla (10/24/2012)
id value name
-- ------ -------
1 10 test
2 5 prod
3 10 test
4 4 test
5 4 test
6 10 test
Only records with id 4,5 should be deleted.

Here's some sample data to play with: -

DECLARE @testtable TABLE (id INT, value INT, NAME CHAR(4));

INSERT INTO @testtable

VALUES (1, 10, 'test'), (2, 5, 'prod'), (3, 10, 'test'), (4, 4, 'test'), (5, 4, 'test'), (6, 10, 'test');

So the expected result after you have deleted the bad rows is: -

id value NAME

----------- ----------- ----

1 10 test

2 5 prod

3 10 test

6 10 test

So, let's code up what you just described: -

DELETE a

FROM @testtable a

INNER JOIN (SELECT id, value, NAME,

ROW_NUMBER() OVER(PARTITION BY value ORDER BY id) AS rownum

FROM @testtable) b ON a.id = b.id

WHERE b.rownum > 1;

OK, now we'll run it against the sample data and the result is: -

id value NAME

----------- ----------- ----

1 10 test

2 5 prod

4 4 test

Ah, it seems you've deleted a few extra rows and kept a row that we wanted to delete.

OK, let's take a look at ThomasRushton's answer: -

DELETE

FROM @testtable

WHERE id IN (SELECT t2.id

FROM @testtable t1

INNER JOIN @testtable t2 ON t1.id = t2.id - 1

INNER JOIN @testtable t3 ON t2.id = t3.id - 1

WHERE t1.NAME = t2.NAME AND t2.NAME = t3.NAME

);

For me, there are two issues here. The first is that we're touching the table 4 times, which is unnecessary. The second is that we're only looking at "name" to see if a record is a duplicate. (If the OP agrees with this as the requirement, then we can scratch my second issue).

What about the result? Does it give the correct answer from the sample data?

id value NAME

----------- ----------- ----

1 10 test

2 5 prod

3 10 test

6 10 test

Yes is does! Excellent.

Now, let's look at my answer.

DELETE a

FROM @testtable a

WHERE EXISTS (SELECT 1

FROM @testtable b

WHERE (a.id = b.id+1 OR a.id = b.id-1)

AND a.NAME = b.NAME AND a.value = b.value);

This time we're only touching the table twice and we're looking at both name and value to determine a duplicate. But does it produce the correct result based on the sample data?

id value NAME

----------- ----------- ----

1 10 test

2 5 prod

3 10 test

6 10 test

Again, yes it does.

A question for the OP though - after the query has been run you'll notice that the result-set from the has a new "duplicate" (3 and 6). If you were to run either of the working solutions again, they wouldn't detect this as a duplicate. Would you expect them to? Or is this desired behaviour?

Forever trying to learn
My blog - http://www.cadavre.co.uk/
For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/