Removing only contiguous/adjacent duplicate records from a rowset


praveen_vejandla
In SQL 2008, is there any way to remove only adjacent/contiguous duplicate records? Only if a record repeats immediately, then it should be deleted.

Ex:

id  value  name
--  -----  ----
1   10     test
2   5      prod
3   10     test
4   4      test
5   4      test
6   10     test

Only records with id 4,5 should be deleted.

Records with id 1,3,6 should be preserved because value, name combination ( 10, test ) is duplicate but not contiguous.

Is there any way to achieve this result using TSQL?
ThomasRushton
This appears to do what you want:

use tempdb
go

declare @testtable table (id int, value int, name char(4))

insert into @testtable values
(1, 10, 'test'), (2, 5, 'prod'), (3, 10, 'test'), (4, 4, 'test'), (5, 4, 'test'), (6, 10, 'test')

select * from @testtable

delete from @testtable
where id in (
    select t2.id
    from @testtable t1
    inner join @testtable t2 on t1.id = t2.id - 1
    inner join @testtable t3 on t2.id = t3.id - 1
    where t1.name = t2.name and t2.name = t3.name)

select * from @testtable


Cadavre
ThomasRushton (10/24/2012)
This appears to do what you want:

use tempdb
go

declare @testtable table (id int, value int, name char(4))

insert into @testtable values
(1, 10, 'test'), (2, 5, 'prod'), (3, 10, 'test'), (4, 4, 'test'), (5, 4, 'test'), (6, 10, 'test')

select * from @testtable

delete from @testtable
WHERE id in (
select t2.id from @testtable t1 INNER JOIN @testtable t2 ON t1.id = t2.id - 1 INNER JOIN @testtable t3 ON t2.id = t3.id-1
WHERE t1.name = t2.name and t2.name = t3.name)

select * from @testtable



That's a lot of joins for a simple operation.

Maybe try something like this instead?
DELETE a 
FROM @testtable a
WHERE EXISTS (SELECT 1
FROM @testtable b
WHERE (a.id = b.id+1 OR a.id = b.id-1)
AND a.NAME = b.NAME AND a.value = b.value);




Forever trying to learn

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better, quicker answers on SQL Server performance related questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/



If you litter your database queries with nolock query hints, are you aware of the side effects?
Try reading a few of these links...

(*) Missing rows with nolock
(*) Allocation order scans with nolock
(*) Consistency issues with nolock
(*) Transient Corruption Errors in SQL Server error log caused by nolock
(*) Dirty reads, read errors, reading rows twice and missing rows with nolock


Craig Wilkinson - Software Engineer
EamonSQL
Select from the table using the ROW_NUMBER() function, ordering by id and partitioning by value.
Wrap that query in parentheses to give you a derived table.

Then run a DELETE against the table, joining to the derived table on id, where rownum > 1.

That should sort you out.

Eamon
ChrisM@Work
EamonSQL (10/24/2012)
select from the table using the ROW_NUMBER() function ordering by id and partitioning by value.
Wrap in brackets giving you a derived table.

Run a delete command against the table and joining to the derived table on the id and where rownum > 1

That should sort you out.

Eamon


Don't you mean a COUNT(*), and deleting where COUNT(*) > 1?
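
For comparison, here is a minimal sketch of what a COUNT(*)-based check would report on the thread's sample data. It is an illustration only, not code posted in the thread, and grouping on (value, name) is an assumption about what counts as a duplicate. Because GROUP BY ignores row order, the (10, test) group is flagged along with (4, test), so a count on its own cannot tell adjacent duplicates from scattered ones.

```sql
-- Sample data as posted earlier in the thread.
DECLARE @testtable TABLE (id INT, value INT, name CHAR(4));

INSERT INTO @testtable
VALUES (1, 10, 'test'), (2, 5, 'prod'), (3, 10, 'test'),
       (4, 4, 'test'), (5, 4, 'test'), (6, 10, 'test');

-- Groups that occur more than once anywhere in the table,
-- whether or not the rows sit next to each other.
SELECT value, name, COUNT(*) AS occurrences
FROM @testtable
GROUP BY value, name
HAVING COUNT(*) > 1;
```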

“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Cadavre
EamonSQL (10/24/2012)
select from the table using the ROW_NUMBER() function ordering by id and partitioning by value.
Wrap in brackets giving you a derived table.

Run a delete command against the table and joining to the derived table on the id and where rownum > 1

That should sort you out.

Eamon


The OP said: -
praveen_vejandla (10/24/2012)
id  value  name
--  -----  ----
1   10     test
2   5      prod
3   10     test
4   4      test
5   4      test
6   10     test

Only records with id 4,5 should be deleted.


Here's some sample data to play with: -
DECLARE @testtable TABLE (id INT, value INT, NAME CHAR(4));

INSERT INTO @testtable
VALUES (1, 10, 'test'), (2, 5, 'prod'), (3, 10, 'test'), (4, 4, 'test'), (5, 4, 'test'), (6, 10, 'test');



So the expected result after you have deleted the bad rows is: -
id          value       NAME
----------- ----------- ----
1           10          test
2           5           prod
3           10          test
6           10          test


So, let's code up what you just described: -
DELETE a
FROM @testtable a
INNER JOIN (SELECT id, value, NAME,
ROW_NUMBER() OVER(PARTITION BY value ORDER BY id) AS rownum
FROM @testtable) b ON a.id = b.id
WHERE b.rownum > 1;



OK, now we'll run it against the sample data and the result is: -
id          value       NAME
----------- ----------- ----
1           10          test
2           5           prod
4           4           test


Ah, it seems you've deleted a few extra rows and kept a row that we wanted to delete.

OK, let's take a look at ThomasRushton's answer: -
DELETE
FROM @testtable
WHERE id IN (SELECT t2.id
FROM @testtable t1
INNER JOIN @testtable t2 ON t1.id = t2.id - 1
INNER JOIN @testtable t3 ON t2.id = t3.id - 1
WHERE t1.NAME = t2.NAME AND t2.NAME = t3.NAME
);



For me, there are two issues here. The first is that we're touching the table 4 times, which is unnecessary. The second is that we're only looking at "name" to see if a record is a duplicate. (If the OP agrees with this as the requirement, then we can scratch my second issue).
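
To illustrate the second issue, here is a small sketch with made-up data (the @demo table and its three rows are hypothetical, not from the OP). It runs the same three-way join predicate as a SELECT instead of a DELETE. All three rows share a name but have different values, and the middle row is still picked.

```sql
-- Hypothetical data: same name throughout, but every value is different.
DECLARE @demo TABLE (id INT, value INT, name CHAR(4));

INSERT INTO @demo
VALUES (1, 10, 'test'), (2, 5, 'test'), (3, 7, 'test');

-- Same join and WHERE clause as the three-way join, shown as a SELECT
-- so that nothing is deleted. It returns id 2, although no two rows
-- have the same (value, name) combination.
SELECT t2.id, t2.value, t2.name
FROM @demo t1
INNER JOIN @demo t2 ON t1.id = t2.id - 1
INNER JOIN @demo t3 ON t2.id = t3.id - 1
WHERE t1.name = t2.name AND t2.name = t3.name;
```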

What about the result? Does it give the correct answer from the sample data?
id          value       NAME
----------- ----------- ----
1           10          test
2           5           prod
3           10          test
6           10          test


Yes it does! Excellent.

Now, let's look at my answer.
DELETE a 
FROM @testtable a
WHERE EXISTS (SELECT 1
FROM @testtable b
WHERE (a.id = b.id+1 OR a.id = b.id-1)
AND a.NAME = b.NAME AND a.value = b.value);



This time we're only touching the table twice and we're looking at both name and value to determine a duplicate. But does it produce the correct result based on the sample data?
id          value       NAME
----------- ----------- ----
1           10          test
2           5           prod
3           10          test
6           10          test


Again, yes it does.



A question for the OP though: after the query has been run, you'll notice that the result set has a new "duplicate" (ids 3 and 6). If you were to run either of the working solutions again, they wouldn't detect this as a duplicate, because both only compare rows whose ids differ by exactly 1. Would you expect them to? Or is this the desired behaviour?
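
A minimal sketch of the difference, assuming the four rows left after the delete (the @remaining table name and the `numbered` CTE are just for illustration). The first query uses id arithmetic and finds nothing. The second numbers the rows by position with ROW_NUMBER(), so ids 3 and 6 become neighbours and id 6 is reported.

```sql
-- The rows that remain after the first delete.
DECLARE @remaining TABLE (id INT, value INT, name CHAR(4));

INSERT INTO @remaining
VALUES (1, 10, 'test'), (2, 5, 'prod'), (3, 10, 'test'), (6, 10, 'test');

-- Neighbour defined as id +/- 1: returns no rows, because ids 3 and 6
-- are not one apart.
SELECT a.id
FROM @remaining a
WHERE EXISTS (SELECT 1
              FROM @remaining b
              WHERE (a.id = b.id + 1 OR a.id = b.id - 1)
                AND a.name = b.name AND a.value = b.value);

-- Neighbour defined by position when ordered by id: returns id 6,
-- because the row directly before it (id 3) has the same value and name.
WITH numbered AS (
    SELECT id, value, name,
           ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM @remaining
)
SELECT cur.id
FROM numbered cur
INNER JOIN numbered prev ON prev.rn = cur.rn - 1
WHERE cur.value = prev.value AND cur.name = prev.name;
```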


praveen_vejandla
Thanks, everyone, for the prompt replies and for sharing your thoughts.

The id field in the sample table is contiguous, but in the real data I am working with, the id field has gaps.

Also one issue with the answer that was provided by Thomas is:

sample data

id  value  name

1   10     test
2   5      prod
3   10     test
4   4      test
5   4      test
6   10     test


result

id  value  name

1   10     test
2   5      prod
3   10     test
6   10     test

It removes every record with value 4. What is actually required is that the first record of each run (the one with min(id)) is preserved, and only the repeats that immediately follow it are removed.

Required result
-------------

id  value  name

1   10     test
2   5      prod
3   10     test
4   4      test
6   10     test
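
One way this clarified requirement could be sketched for SQL Server 2008 (which has no LAG function) is to look up the row immediately before each row by id, rather than by id - 1, so that gaps in id do not matter. This is only a sketch under the assumption that a duplicate means the same (value, name) pair; the aliases `cur`, `p` and `prev` are made up for the example.

```sql
-- Sample data from the post above.
DECLARE @testtable TABLE (id INT, value INT, name CHAR(4));

INSERT INTO @testtable
VALUES (1, 10, 'test'), (2, 5, 'prod'), (3, 10, 'test'),
       (4, 4, 'test'), (5, 4, 'test'), (6, 10, 'test');

-- For each row, fetch the closest earlier row (highest id below the
-- current id). Delete the current row only when that previous row has
-- the same value and name, so the first row of each run is kept.
DELETE cur
FROM @testtable cur
CROSS APPLY (SELECT TOP (1) p.value, p.name
             FROM @testtable p
             WHERE p.id < cur.id
             ORDER BY p.id DESC) prev
WHERE prev.value = cur.value
  AND prev.name = cur.name;

-- Leaves ids 1, 2, 3, 4 and 6, matching the required result.
SELECT id, value, name
FROM @testtable
ORDER BY id;
```

The correlated TOP (1) lookup can be expensive on a large table without an index on id, so it is worth testing against real data volumes.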
BriPan
try this one

DECLARE @testtable TABLE (id INT, value INT, NAME CHAR(4));

INSERT INTO @testtable
VALUES
(1, 10, 'test'),
(2, 5, 'prod'),
(3, 10, 'test'),
(4, 4, 'test'),
(5, 4, 'test'),
(6, 4, 'test'),
(7, 10, 'test')

delete from @testtable where id in (
select t1.id
from
@testtable t
inner join @testtable t1 on t.id+1=t1.id and t.value=t1.value
)

select * from @testtable
Want a cool Sig
This works for me. I've made a few assumptions about the id, value and name columns, and I've modified the test data to cover assumptions 1 and 2.

1. The id value is not always incremented by 1; there can be gaps.
2. A value does not always have the same name (i.e. 10 is not always "test").
3. We're looking for duplicates on value and name together (i.e. a composite key).
4. We keep the first instance of a duplicate and delete the subsequent ones.


DECLARE @testtable TABLE (id INT, value INT, NAME CHAR(4));

INSERT INTO @testtable
VALUES
(1, 10, 'test'), --Keep
(2, 5, 'prod'), --Keep
(4, 10, 'test'), --Keep
(6, 4, 'test'), --Keep
(7, 4, 'test'), --Drop
(9, 4, 'Job'), --Keep (matching value but not name)
(11, 10, 'test'), --Keep
(13, 10, 'test') --Drop
;
with t1 as (
select ROW_NUMBER()over(order by id) RowID
,ID
,Name
,Value
from @testtable
)
delete a
from @testtable a
where id in (
select t1.id
from t1
inner join t1 t2
on t1.RowID = t2.RowID + 1
and t1.value = t2.value
and t1.NAME = t2.NAME
)

select *
from @testtable
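
As an alternative to the self-join on RowID, the same assumptions could also be handled with a "gaps and islands" grouping. The sketch below is not from the thread, and the names `marked`, `ranked`, `island` and `pos` are made up for illustration. The difference between the overall row number and the row number within each (value, name) partition stays constant across a run of adjacent duplicates, so it can serve as a run identifier.

```sql
-- Same test data as above.
DECLARE @testtable TABLE (id INT, value INT, name CHAR(4));

INSERT INTO @testtable
VALUES (1, 10, 'test'), (2, 5, 'prod'), (4, 10, 'test'), (6, 4, 'test'),
       (7, 4, 'test'), (9, 4, 'Job'), (11, 10, 'test'), (13, 10, 'test');

WITH marked AS (
    -- Rows in the same unbroken run of (value, name) share an island number.
    SELECT id, value, name,
           ROW_NUMBER() OVER (ORDER BY id)
         - ROW_NUMBER() OVER (PARTITION BY value, name ORDER BY id) AS island
    FROM @testtable
),
ranked AS (
    -- Position of each row inside its run; 1 is the row to keep.
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY value, name, island
                              ORDER BY id) AS pos
    FROM marked
)
DELETE t
FROM @testtable t
INNER JOIN ranked r ON r.id = t.id
WHERE r.pos > 1;

-- Rows with id 7 and 13 are removed; the rest are kept.
SELECT id, value, name
FROM @testtable
ORDER BY id;
```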



---------------------------------------------------------------
Mike Hahn - MCSomething someday:-)
Right way to ask for help!!
http://www.sqlservercentral.com/articles/Best+Practices/61537/
I post so I can see my avatar Hehe
I want a personal webpage Cool
I want to win the lotto :-D
I want a gf like Tiffa w00t Oh wait I'm married!:-D
praveen_vejandla
Thanks a lot for the replies. Finally, I could get the required result with your help.