
Get Rid of Duplicates!
Posted Saturday, November 28, 2009 11:19 AM


Comments posted to this topic are about the item Get Rid of Duplicates!

_________________________________
seth delconte
http://sqlkeys.com
Post #825835
Posted Monday, November 30, 2009 2:05 AM
I have the same issue on some tables that have an auto-increment column which isn't an identity column, though luckily it doesn't seem to happen as often for me as it does for you! This is my technique, which doesn't use a temp table: I set ROWCOUNT to one less than the number of duplicates and then call DELETE:

CREATE TABLE dbo.tblDupTest
(
id int not null
)

INSERT INTO dbo.tblDupTest VALUES(1)
INSERT INTO dbo.tblDupTest VALUES(1)
INSERT INTO dbo.tblDupTest VALUES(2)
INSERT INTO dbo.tblDupTest VALUES(3)
INSERT INTO dbo.tblDupTest VALUES(4)
INSERT INTO dbo.tblDupTest VALUES(4)
INSERT INTO dbo.tblDupTest VALUES(4)

-- At this point we should have two 1s and three 4s
SELECT *
FROM dbo.tblDupTest

-- This will give us the counts
SELECT id,
COUNT(id) AS 'Count'
FROM dbo.tblDupTest
GROUP BY id
HAVING COUNT(id) > 1

-- Then set the rowcount to one less than the duplicate count and call delete
set rowcount 1

DELETE FROM dbo.tblDupTest WHERE id = 1

set rowcount 2

DELETE FROM dbo.tblDupTest WHERE id = 4

set rowcount 0

SELECT *
FROM dbo.tblDupTest
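
If there are more than a couple of duplicated values, the same idea can be scripted instead of issuing a SET ROWCOUNT/DELETE pair by hand for each value. This is only a sketch of that generalization (it assumes the dbo.tblDupTest table created above): a cursor reads each duplicated id together with its surplus-row count, and ROWCOUNT limits each DELETE accordingly.

DECLARE @id int, @extra int

DECLARE dup_cursor CURSOR FOR
SELECT id, COUNT(*) - 1
FROM dbo.tblDupTest
GROUP BY id
HAVING COUNT(*) > 1

OPEN dup_cursor
FETCH NEXT FROM dup_cursor INTO @id, @extra

WHILE @@FETCH_STATUS = 0
BEGIN
    -- limit the next DELETE to the number of surplus rows for this id
    SET ROWCOUNT @extra
    DELETE FROM dbo.tblDupTest WHERE id = @id
    -- reset so later statements are unaffected
    SET ROWCOUNT 0
    FETCH NEXT FROM dup_cursor INTO @id, @extra
END

CLOSE dup_cursor
DEALLOCATE dup_cursor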
Post #826048
Posted Monday, November 30, 2009 2:25 AM
Hi, I prefer this syntax:

WITH ItemsToBeDeleted
AS
(
SELECT *
, row_number() over (partition by item_no ORDER BY id) as RowNumber
FROM item_store
)
DELETE FROM ItemsToBeDeleted Where RowNumber = 2

Much more efficient.
Regards,
JP
Post #826061
Posted Monday, November 30, 2009 4:38 AM
Dealing with duplicates on a daily basis, I find this approach works well:

-- add a temporary identity column so every row gets a unique number
alter table withdupes add delid int identity(1,1)

-- keep the lowest delid for each itemno and delete the rest
delete x
from withdupes x
inner join (
select itemno, min(delid) as keepid
from withdupes
group by itemno
) y on x.itemno = y.itemno
where x.delid <> y.keepid

-- drop the helper column again
alter table withdupes drop column delid
Post #826116
Posted Monday, November 30, 2009 5:24 AM
It's very neat and all, and maybe I misunderstand, but if the tables are identical, couldn't you avoid the whole duplicates issue by inserting into one table only and letting replication take care of the rest?

Regards

Peter
Post #826131
Posted Monday, November 30, 2009 6:22 AM
The row_number() method is by far the quickest and cleanest method. If you've never used row_number() before, do yourself a favor and learn it.

One modification to JP's code: "... WHERE RowNumber > 1" will delete all the duplicates, not just in cases where you have only one dup.
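
For reference, the corrected delete would look something like this (same hypothetical item_store table as in JP's post; the ORDER BY decides which copy survives):

WITH ItemsToBeDeleted
AS
(
SELECT *
, row_number() over (partition by item_no ORDER BY id) as RowNumber
FROM item_store
)
-- > 1 removes every extra copy, however many there are
DELETE FROM ItemsToBeDeleted WHERE RowNumber > 1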

JP de Jong-202059 (11/30/2009)
Hi I prefer this syntax:

WITH ItemsToBeDeleted
AS
(
SELECT *
, row_number() over (partition by item_no ORDER BY id) as RowNumber
FROM item_store
)
DELETE FROM ItemsToBeDeleted Where RowNumber = 2

Much more efficient.
Regards,
JP



Regards,

Jim C

Post #826147
Posted Monday, November 30, 2009 6:42 AM


There's an old article on it on this site...

http://www.sqlservercentral.com/articles/SQL+Server+2005+-+TSQL/dedupingdatainsqlserver2005/2260/

This option became available with SQL Server 2005.

Jim C-203340 (11/30/2009)
The row_number() method is by far the quickest and cleanest method. If you've never used row_number() before, do yourself a favor and learn it.

one modification to JP's code.. "... where RowNumber > 1" will delete all duplicates not just in cases where you only have 1 dup.

JP de Jong-202059 (11/30/2009)
Hi I prefer this syntax:

WITH ItemsToBeDeleted
AS
(
SELECT *
, row_number() over (partition by item_no ORDER BY id) as RowNumber
FROM item_store
)
DELETE FROM ItemsToBeDeleted Where RowNumber = 2

Much more efficient.
Regards,
JP



Ryan Randall

Solutions are easy. Understanding the problem, now, that's the hard part.
Post #826157
Posted Monday, November 30, 2009 6:51 AM

I have had a similar problem in the past, but rather than looking to clean up after the fact, I test for duplicates beforehand and eliminate the insert at that time.

I feel I have missed something in the original article as to why this might not have been identified as a design issue.
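
A minimal sketch of that kind of guard, reusing the dbo.tblDupTest table from earlier in the thread and a hypothetical incoming @new_id value, only inserts when the value is not already present:

DECLARE @new_id int
SET @new_id = 4

-- insert only if no row with this id exists yet
INSERT INTO dbo.tblDupTest (id)
SELECT @new_id
WHERE NOT EXISTS (SELECT 1 FROM dbo.tblDupTest WHERE id = @new_id)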
Post #826164
Posted Monday, November 30, 2009 7:03 AM
I think I'm with Tony Scott on this one: why not prevent the issue at insert time, rather than go through all the pain of removing duplicates after the fact?
Post #826174
Posted Monday, November 30, 2009 7:10 AM
Yes, I agree with this. If you are replicating two databases, then just update one; other than that, use unique constraints (and, if need be, triggers) to make sure that you always have natural uniqueness on each row in your table.
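
For example, a unique constraint on the natural key (using the hypothetical item_store table from the earlier posts) makes the engine reject duplicates at insert time instead of leaving them to be cleaned up later:

-- existing duplicates must already be removed before this will succeed
ALTER TABLE item_store
ADD CONSTRAINT UQ_item_store_item_no UNIQUE (item_no)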
Post #826180