Find and Remove Duplicate Records SQL Server

Question

Find and Remove Duplicate Records SQL Server

Ginger Daniel

Mr or Mrs. 500

Points: 525
More actions
January 31, 2016 at 9:41 pm

#307511

Comments posted to this topic are about the item Find and Remove Duplicate Records SQL Server

Viewing 15 posts - 1 through 15 (of 77 total)

You must be logged in to reply to this topic. Login to reply

namakakiweyho SSC Journeyman Points: 81 More actions · Answer 1

You could try this neat bit of code. Again you would have to define in your select statement what constitutes a duplicate;-

WITH tblTemp as

(

SELECT ROW_NUMBER() Over(PARTITION BY C.contactname ORDER BY C.contactname)

As RowNumber,* FROM Customers C

)

DELETE FROM tblTemp where RowNumber >1

andy_111 SSC Enthusiast Points: 102 More actions · Answer 2

Yes, it's beter variant. 'In' operation is very slow operation. Again, if in you have composite key in your table (for example contactname and contactoldname)? Also, database engine optimised for relational operations and not optimized for cyclic operations like WHILE.

namakakiweyho (1/31/2016)
You could try this neat bit of code. Again you would have to define in your select statement what constitutes a duplicate;-
WITH tblTemp as
(
SELECT ROW_NUMBER() Over(PARTITION BY C.contactname ORDER BY C.contactname)
As RowNumber,* FROM Customers C
)
DELETE FROM tblTemp where RowNumber >1

p.s. Enough select only RowNumber in CTE, there is not need select *.

andy_111 SSC Enthusiast Points: 102 More actions · Answer 3

andy_111

SSC Enthusiast

Points: 102

January 31, 2016 at 10:50 pm

#1855609

Not really useful article.

Daniel Matthee Mr or Mrs. 500 Points: 579 More actions · Answer 4

I have not tested but just by looking at the code will it not delete both the records???

There is nothing to indicate that you just want to get rid of all dups and only have a unique records left, or am I missing something here?

Regards

--------------------------------------------------------------------------------------------------------------------------------------------------------
To some it may look best but to others it might be the worst, but then again to some it might be the worst but to others the best.
http://www.sql-sa.co.za

andy_111 SSC Enthusiast Points: 102 More actions · Answer 5

Daniel Matthee (1/31/2016)
I have not tested but just by looking at the code will it not delete both the records???
There is nothing to indicate that you just want to get rid of all dups and only have a unique records left, or am I missing something here?
Regards

For example, you have table Customers (contactname VARCHAR(100) NOT NULL, Age INT)

You fill table

Insert into Customers (contactname, Age) VALUES ('Andrey',35),('Alexey', '40'),('Andrey',50);

Query

SELECT ROW_NUMBER() Over(PARTITION BY C.contactname ORDER BY C.contactname)

As RowNumber,C.* FROM Customers C

returns

1 Alexey 40

1 Andrey 35

2 Andrey 50

RowNumber partitioned inside contactname and it unique inside every contactname. If you delete RowNumber > 1 you delete duplicates.

andy_111 SSC Enthusiast Points: 102 More actions · Answer 6

If you want change order of deleting, you must set ORDER BY option in OVER clause. This option have influence on numbering order.

Daniel Matthee Mr or Mrs. 500 Points: 579 More actions · Answer 7

andy_111 (2/1/2016)
Daniel Matthee (1/31/2016)
I have not tested but just by looking at the code will it not delete both the records???
There is nothing to indicate that you just want to get rid of all dups and only have a unique records left, or am I missing something here?
Regards
For example, you have table Customers (contactname VARCHAR(100) NOT NULL, Age INT)
You fill table
Insert into Customers (contactname, Age) VALUES ('Andrey',35),('Alexey', '40'),('Andrey',50);
Query
SELECT ROW_NUMBER() Over(PARTITION BY C.contactname ORDER BY C.contactname)
As RowNumber,C.* FROM Customers C
returns
1 Alexey 40
1 Andrey 35
2 Andrey 50
RowNumber partitioned inside contactname and it unique inside every contactname. If you delete RowNumber > 1 you delete duplicates.

I Fully agree with you, I meant the original post. In the original for it will get rid of all records that has a dup.

--------------------------------------------------------------------------------------------------------------------------------------------------------
To some it may look best but to others it might be the worst, but then again to some it might be the worst but to others the best.
http://www.sql-sa.co.za

andy_111 SSC Enthusiast Points: 102 More actions · Answer 8

Yes, you are right. I think, for my sample it delete Andrey customer at all

Ness SSCarpal Tunnel Points: 4282 More actions · Answer 9

As long as there are not any triggers to complicate the issue, the merge statement would be another option. Not necessarily better, but at least you could update and insert records at the same time

SQL DBA
Every day is a school day, and don't trust anyone who tells you any different.
http://sqlblogness.blogspot.co.uk

Mauricio_ SSCrazy Points: 2935 More actions · Answer 10

It looks like an old fashion article, doesn't it? For SQL 2000 or something.

Adrian_1 SSC Veteran Points: 280 More actions · Answer 11

according to MSDN, "Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in a future release of SQL Server. Avoid using SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. For a similar behavior, use the TOP syntax. For more information, see TOP (Transact-SQL)."

https://msdn.microsoft.com/en-us/library/ms188774.aspx

this method may not be such a good idea!

There are lots of ways to achieve this, one is to use rank () over and delete where > 1

RobCarter SSC Enthusiast Points: 176 More actions · Answer 12

I have not tested but just by looking at the code will it not delete both the records???
There is nothing to indicate that you just want to get rid of all dups and only have a unique records left, or am I missing something here?

That's what I was thinking: Why would you want to delete both of the duplicates? No-one would, surely. It doesn't, as the OP ascertains, reinforce referential integrity. It potentially corrupts the database, especially if there are rows in other tables that refer to the perfectly sound one that was just deleted because we followed a post on here blindly.

In all seriousness, there will be people who are not DB professionals reading this by searching for duplicate removal on Google or something who come across this and, because of the reputation of the site in general, trust it implicitly.

The best solution I have seen is on here is to use the ROW_NUMBER() function and delete all records >1 grouped by the unique(!) identifier.

Adrian_1 SSC Veteran Points: 280 More actions · Answer 13

The OP is intending that only 1 row will be deleted (because SET ROWCOUNT = 1) and therefore on the 2nd pass will not delete because there is now only 1 left...

andy_111 SSC Enthusiast Points: 102 More actions · Answer 14

Adrian_1 (2/1/2016)
The OP is intending that only 1 row will be deleted (because SET ROWCOUNT = 1) and therefore on the 2nd pass will not delete because there is now only 1 left...

see previous post

according to MSDN, "Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in a future release of SQL Server. Avoid using SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. For a similar behavior, use the TOP syntax. For more information, see TOP (Transact-SQL)."

https://msdn.microsoft.com/en-us/library/ms188774.aspx

Find and Remove Duplicate Records SQL Server

Cookies on SQLServerCentral