Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Get Rid of Duplicates!


Get Rid of Duplicates!

Author
Message
seth delconte
seth delconte
SSCommitted
SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)

Group: General Forum Members
Points: 1681 Visits: 1360
Comments posted to this topic are about the item Get Rid of Duplicates!

_________________________________
seth delconte
http://sqlkeys.com
int.blue
int.blue
Forum Newbie
Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)

Group: General Forum Members
Points: 1 Visits: 53
I have the same issue on some tables that have an auto increment column which isn't an identity column, though lucky it sounds like it doesn't happen as often as it does for you! This is my technique which doesn't use a temp table, I use set rowcount set to 1 less than the number of duplicates and then call delete:

CREATE TABLE dbo.tblDupTest
(
id int not null,
)

INSERT INTO dbo.tblDupTest VALUES(1)
INSERT INTO dbo.tblDupTest VALUES(1)
INSERT INTO dbo.tblDupTest VALUES(2)
INSERT INTO dbo.tblDupTest VALUES(3)
INSERT INTO dbo.tblDupTest VALUES(4)
INSERT INTO dbo.tblDupTest VALUES(4)
INSERT INTO dbo.tblDupTest VALUES(4)

-- At this point we should have two 1s and three 4s
SELECT *
FROM dbo.tblDupTest

-- This will give us the counts
SELECT id,
COUNT(id) AS 'Count'
FROM dbo.tblDupTest
GROUP BY id
HAVING COUNT(id) > 1

-- Then set the rowcount to one less than the duplicate and call delete
set rowcount 1

DELETE FROM dbo.tblDupTest WHERE id = 1

set rowcount 2

DELETE FROM dbo.tblDupTest WHERE id = 4

set rowcount 0

SELECT *
FROM dbo.tblDupTest
JP de Jong-202059
JP de Jong-202059
SSC-Enthusiastic
SSC-Enthusiastic (128 reputation)SSC-Enthusiastic (128 reputation)SSC-Enthusiastic (128 reputation)SSC-Enthusiastic (128 reputation)SSC-Enthusiastic (128 reputation)SSC-Enthusiastic (128 reputation)SSC-Enthusiastic (128 reputation)SSC-Enthusiastic (128 reputation)

Group: General Forum Members
Points: 128 Visits: 172
Hi I prefer this syntax:

WITH ItemsToBeDeleted
AS
(
SELECT *
, row_number() over (partition by item_no ORDER BY id) as RowNumber
FROM item_store
)
DELETE FROM ItemsToBeDeleted Where RowNumber = 2

Much more efficient.
Regards,
JP
Trevor.weehuizen
Trevor.weehuizen
Forum Newbie
Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)

Group: General Forum Members
Points: 3 Visits: 93
dealing with duplicates on a daily basis, i find this approach works well

alter table withdupes add delid int identity(1,1)

delete x
from withdupes x
inner join (
select itemno, min(delid) as keepid
from withdupes
group by itemno
) y on x.itemo = y.itemno
where x.delid <> y.keepid

alter table withdupes drop column delid
Peter Pirker
Peter Pirker
Valued Member
Valued Member (67 reputation)Valued Member (67 reputation)Valued Member (67 reputation)Valued Member (67 reputation)Valued Member (67 reputation)Valued Member (67 reputation)Valued Member (67 reputation)Valued Member (67 reputation)

Group: General Forum Members
Points: 67 Visits: 221
It's very neat and all, and maybe I misunderstand, but if the tables are identical, couldn't you avoid the whole duplicates issue by inserting into one table only, and let replication take care of the rest?

Regards

Peter
Jim C-203340
Jim C-203340
SSC-Addicted
SSC-Addicted (471 reputation)SSC-Addicted (471 reputation)SSC-Addicted (471 reputation)SSC-Addicted (471 reputation)SSC-Addicted (471 reputation)SSC-Addicted (471 reputation)SSC-Addicted (471 reputation)SSC-Addicted (471 reputation)

Group: General Forum Members
Points: 471 Visits: 200
The row_number() method is by far the quickest and cleanest method. If you've never used row_number() before, do yourself a favor and learn it.

one modification to JP's code.. "... where RowNumber > 1" will delete all duplicates not just in cases where you only have 1 dup.

JP de Jong-202059 (11/30/2009)
Hi I prefer this syntax:

WITH ItemsToBeDeleted
AS
(
SELECT *
, row_number() over (partition by item_no ORDER BY id) as RowNumber
FROM item_store
)
DELETE FROM ItemsToBeDeleted Where RowNumber = 2

Much more efficient.
Regards,
JP



Regards,

Jim C

RyanRandall
RyanRandall
SSCommitted
SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)

Group: General Forum Members
Points: 1807 Visits: 4652
There's an old article on it on this site...

http://www.sqlservercentral.com/articles/SQL+Server+2005+-+TSQL/dedupingdatainsqlserver2005/2260/

This option became available with SQL Server 2005.

Jim C-203340 (11/30/2009)
The row_number() method is by far the quickest and cleanest method. If you've never used row_number() before, do yourself a favor and learn it.

one modification to JP's code.. "... where RowNumber > 1" will delete all duplicates not just in cases where you only have 1 dup.

JP de Jong-202059 (11/30/2009)
Hi I prefer this syntax:

WITH ItemsToBeDeleted
AS
(
SELECT *
, row_number() over (partition by item_no ORDER BY id) as RowNumber
FROM item_store
)
DELETE FROM ItemsToBeDeleted Where RowNumber = 2

Much more efficient.
Regards,
JP


Ryan Randall

Solutions are easy. Understanding the problem, now, that's the hard part.
Logicalman1998
Logicalman1998
Forum Newbie
Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)

Group: General Forum Members
Points: 5 Visits: 17
I have had a similar problem in the past, but rather than looking to clean up after the fact, I test for duplicates beforehand and eliminate the insert at that time.

I feel I have missed something in the original article as to why this might not have been identified as a design issue.
Dean Cochrane
Dean Cochrane
SSC-Enthusiastic
SSC-Enthusiastic (102 reputation)SSC-Enthusiastic (102 reputation)SSC-Enthusiastic (102 reputation)SSC-Enthusiastic (102 reputation)SSC-Enthusiastic (102 reputation)SSC-Enthusiastic (102 reputation)SSC-Enthusiastic (102 reputation)SSC-Enthusiastic (102 reputation)

Group: General Forum Members
Points: 102 Visits: 641
I think I'm with Tony Scott on this one: why not prevent the issue at insert time, rather than go through all the pain of removing duplicates after the fact?
tpoulsen
tpoulsen
Forum Newbie
Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)

Group: General Forum Members
Points: 3 Visits: 15
yes agree with this - if you are replicating two databases then just update one - other than that use unique constraints (and if need be triggers) to make sure that you always have a natural uniqueness on each row in your table.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search