Find and Remove Duplicate Records SQL Server

  • Comments posted to this topic are about the item Find and Remove Duplicate Records SQL Server

  • You could try this neat bit of code. Again you would have to define in your select statement what constitutes a duplicate;-

    WITH tblTemp as

    (

    SELECT ROW_NUMBER() Over(PARTITION BY C.contactname ORDER BY C.contactname)

    As RowNumber,* FROM Customers C

    )

    DELETE FROM tblTemp where RowNumber >1

  • Yes, it's beter variant. 'In' operation is very slow operation. Again, if in you have composite key in your table (for example contactname and contactoldname)? Also, database engine optimised for relational operations and not optimized for cyclic operations like WHILE.

    namakakiweyho (1/31/2016)


    You could try this neat bit of code. Again you would have to define in your select statement what constitutes a duplicate;-

    WITH tblTemp as

    (

    SELECT ROW_NUMBER() Over(PARTITION BY C.contactname ORDER BY C.contactname)

    As RowNumber,* FROM Customers C

    )

    DELETE FROM tblTemp where RowNumber >1

    p.s. Enough select only RowNumber in CTE, there is not need select *.

  • Not really useful article.

  • I have not tested but just by looking at the code will it not delete both the records???

    There is nothing to indicate that you just want to get rid of all dups and only have a unique records left, or am I missing something here?

    Regards

    --------------------------------------------------------------------------------------------------------------------------------------------------------
    To some it may look best but to others it might be the worst, but then again to some it might be the worst but to others the best.
    http://www.sql-sa.co.za

  • Daniel Matthee (1/31/2016)


    I have not tested but just by looking at the code will it not delete both the records???

    There is nothing to indicate that you just want to get rid of all dups and only have a unique records left, or am I missing something here?

    Regards

    For example, you have table Customers (contactname VARCHAR(100) NOT NULL, Age INT)

    You fill table

    Insert into Customers (contactname, Age) VALUES ('Andrey',35),('Alexey', '40'),('Andrey',50);

    Query

    SELECT ROW_NUMBER() Over(PARTITION BY C.contactname ORDER BY C.contactname)

    As RowNumber,C.* FROM Customers C

    returns

    1 Alexey 40

    1 Andrey 35

    2 Andrey 50

    RowNumber partitioned inside contactname and it unique inside every contactname. If you delete RowNumber > 1 you delete duplicates.

  • If you want change order of deleting, you must set ORDER BY option in OVER clause. This option have influence on numbering order.

  • andy_111 (2/1/2016)


    Daniel Matthee (1/31/2016)


    I have not tested but just by looking at the code will it not delete both the records???

    There is nothing to indicate that you just want to get rid of all dups and only have a unique records left, or am I missing something here?

    Regards

    For example, you have table Customers (contactname VARCHAR(100) NOT NULL, Age INT)

    You fill table

    Insert into Customers (contactname, Age) VALUES ('Andrey',35),('Alexey', '40'),('Andrey',50);

    Query

    SELECT ROW_NUMBER() Over(PARTITION BY C.contactname ORDER BY C.contactname)

    As RowNumber,C.* FROM Customers C

    returns

    1 Alexey 40

    1 Andrey 35

    2 Andrey 50

    RowNumber partitioned inside contactname and it unique inside every contactname. If you delete RowNumber > 1 you delete duplicates.

    I Fully agree with you, I meant the original post. In the original for it will get rid of all records that has a dup.

    --------------------------------------------------------------------------------------------------------------------------------------------------------
    To some it may look best but to others it might be the worst, but then again to some it might be the worst but to others the best.
    http://www.sql-sa.co.za

  • Yes, you are right. I think, for my sample it delete Andrey customer at all 🙂

  • As long as there are not any triggers to complicate the issue, the merge statement would be another option. Not necessarily better, but at least you could update and insert records at the same time

    SQL DBA
    Every day is a school day, and don't trust anyone who tells you any different.
    http://sqlblogness.blogspot.co.uk

  • It looks like an old fashion article, doesn't it? For SQL 2000 or something.

  • according to MSDN, "Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in a future release of SQL Server. Avoid using SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. For a similar behavior, use the TOP syntax. For more information, see TOP (Transact-SQL)."

    https://msdn.microsoft.com/en-us/library/ms188774.aspx

    this method may not be such a good idea!

    There are lots of ways to achieve this, one is to use rank () over and delete where > 1

  • I have not tested but just by looking at the code will it not delete both the records???

    There is nothing to indicate that you just want to get rid of all dups and only have a unique records left, or am I missing something here?

    That's what I was thinking: Why would you want to delete both of the duplicates? No-one would, surely. It doesn't, as the OP ascertains, reinforce referential integrity. It potentially corrupts the database, especially if there are rows in other tables that refer to the perfectly sound one that was just deleted because we followed a post on here blindly.

    In all seriousness, there will be people who are not DB professionals reading this by searching for duplicate removal on Google or something who come across this and, because of the reputation of the site in general, trust it implicitly.

    The best solution I have seen is on here is to use the ROW_NUMBER() function and delete all records >1 grouped by the unique(!) identifier.

  • The OP is intending that only 1 row will be deleted (because SET ROWCOUNT = 1) and therefore on the 2nd pass will not delete because there is now only 1 left...

  • Adrian_1 (2/1/2016)


    The OP is intending that only 1 row will be deleted (because SET ROWCOUNT = 1) and therefore on the 2nd pass will not delete because there is now only 1 left...

    see previous post

    according to MSDN, "Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in a future release of SQL Server. Avoid using SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. For a similar behavior, use the TOP syntax. For more information, see TOP (Transact-SQL)."

    https://msdn.microsoft.com/en-us/library/ms188774.aspx

Viewing 15 posts - 1 through 15 (of 76 total)

You must be logged in to reply to this topic. Login to reply