Find and Remove Duplicate Records SQL Server

  • Ginger Keys Daniel

    SSC-Addicted

    Points: 413

    Comments posted to this topic are about the item Find and Remove Duplicate Records SQL Server

  • namakakiweyho

    SSC Journeyman

    Points: 81

    You could try this neat bit of code. Again you would have to define in your select statement what constitutes a duplicate;-

    WITH tblTemp as

    (

    SELECT ROW_NUMBER() Over(PARTITION BY C.contactname ORDER BY C.contactname)

    As RowNumber,* FROM Customers C

    )

    DELETE FROM tblTemp where RowNumber >1

  • andy_111

    SSC Enthusiast

    Points: 102

    Yes, it's beter variant. 'In' operation is very slow operation. Again, if in you have composite key in your table (for example contactname and contactoldname)? Also, database engine optimised for relational operations and not optimized for cyclic operations like WHILE.

    namakakiweyho (1/31/2016)


    You could try this neat bit of code. Again you would have to define in your select statement what constitutes a duplicate;-

    WITH tblTemp as

    (

    SELECT ROW_NUMBER() Over(PARTITION BY C.contactname ORDER BY C.contactname)

    As RowNumber,* FROM Customers C

    )

    DELETE FROM tblTemp where RowNumber >1

    p.s. Enough select only RowNumber in CTE, there is not need select *.

  • andy_111

    SSC Enthusiast

    Points: 102

    Not really useful article.

  • Daniel Matthee

    Mr or Mrs. 500

    Points: 569

    I have not tested but just by looking at the code will it not delete both the records???

    There is nothing to indicate that you just want to get rid of all dups and only have a unique records left, or am I missing something here?

    Regards

    --------------------------------------------------------------------------------------------------------------------------------------------------------
    To some it may look best but to others it might be the worst, but then again to some it might be the worst but to others the best.
    http://www.sql-sa.co.za

  • andy_111

    SSC Enthusiast

    Points: 102

    Daniel Matthee (1/31/2016)


    I have not tested but just by looking at the code will it not delete both the records???

    There is nothing to indicate that you just want to get rid of all dups and only have a unique records left, or am I missing something here?

    Regards

    For example, you have table Customers (contactname VARCHAR(100) NOT NULL, Age INT)

    You fill table

    Insert into Customers (contactname, Age) VALUES ('Andrey',35),('Alexey', '40'),('Andrey',50);

    Query

    SELECT ROW_NUMBER() Over(PARTITION BY C.contactname ORDER BY C.contactname)

    As RowNumber,C.* FROM Customers C

    returns

    1 Alexey 40

    1 Andrey 35

    2 Andrey 50

    RowNumber partitioned inside contactname and it unique inside every contactname. If you delete RowNumber > 1 you delete duplicates.

  • andy_111

    SSC Enthusiast

    Points: 102

    If you want change order of deleting, you must set ORDER BY option in OVER clause. This option have influence on numbering order.

  • Daniel Matthee

    Mr or Mrs. 500

    Points: 569

    andy_111 (2/1/2016)


    Daniel Matthee (1/31/2016)


    I have not tested but just by looking at the code will it not delete both the records???

    There is nothing to indicate that you just want to get rid of all dups and only have a unique records left, or am I missing something here?

    Regards

    For example, you have table Customers (contactname VARCHAR(100) NOT NULL, Age INT)

    You fill table

    Insert into Customers (contactname, Age) VALUES ('Andrey',35),('Alexey', '40'),('Andrey',50);

    Query

    SELECT ROW_NUMBER() Over(PARTITION BY C.contactname ORDER BY C.contactname)

    As RowNumber,C.* FROM Customers C

    returns

    1 Alexey 40

    1 Andrey 35

    2 Andrey 50

    RowNumber partitioned inside contactname and it unique inside every contactname. If you delete RowNumber > 1 you delete duplicates.

    I Fully agree with you, I meant the original post. In the original for it will get rid of all records that has a dup.

    --------------------------------------------------------------------------------------------------------------------------------------------------------
    To some it may look best but to others it might be the worst, but then again to some it might be the worst but to others the best.
    http://www.sql-sa.co.za

  • andy_111

    SSC Enthusiast

    Points: 102

    Yes, you are right. I think, for my sample it delete Andrey customer at all 🙂

  • Ness

    SSCarpal Tunnel

    Points: 4282

    As long as there are not any triggers to complicate the issue, the merge statement would be another option. Not necessarily better, but at least you could update and insert records at the same time

    SQL DBA
    Every day is a school day, and don't trust anyone who tells you any different.
    http://sqlblogness.blogspot.co.uk

  • Mauricio_

    SSCrazy

    Points: 2811

    It looks like an old fashion article, doesn't it? For SQL 2000 or something.

  • Adrian_1

    SSC Veteran

    Points: 280

    according to MSDN, "Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in a future release of SQL Server. Avoid using SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. For a similar behavior, use the TOP syntax. For more information, see TOP (Transact-SQL)."

    https://msdn.microsoft.com/en-us/library/ms188774.aspx

    this method may not be such a good idea!

    There are lots of ways to achieve this, one is to use rank () over and delete where > 1

  • RobCarter

    SSC Enthusiast

    Points: 176

    I have not tested but just by looking at the code will it not delete both the records???

    There is nothing to indicate that you just want to get rid of all dups and only have a unique records left, or am I missing something here?

    That's what I was thinking: Why would you want to delete both of the duplicates? No-one would, surely. It doesn't, as the OP ascertains, reinforce referential integrity. It potentially corrupts the database, especially if there are rows in other tables that refer to the perfectly sound one that was just deleted because we followed a post on here blindly.

    In all seriousness, there will be people who are not DB professionals reading this by searching for duplicate removal on Google or something who come across this and, because of the reputation of the site in general, trust it implicitly.

    The best solution I have seen is on here is to use the ROW_NUMBER() function and delete all records >1 grouped by the unique(!) identifier.

  • Adrian_1

    SSC Veteran

    Points: 280

    The OP is intending that only 1 row will be deleted (because SET ROWCOUNT = 1) and therefore on the 2nd pass will not delete because there is now only 1 left...

  • andy_111

    SSC Enthusiast

    Points: 102

    Adrian_1 (2/1/2016)


    The OP is intending that only 1 row will be deleted (because SET ROWCOUNT = 1) and therefore on the 2nd pass will not delete because there is now only 1 left...

    see previous post

    according to MSDN, "Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in a future release of SQL Server. Avoid using SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. For a similar behavior, use the TOP syntax. For more information, see TOP (Transact-SQL)."

    https://msdn.microsoft.com/en-us/library/ms188774.aspx

Viewing 15 posts - 1 through 15 (of 77 total)

You must be logged in to reply to this topic. Login to reply