Find and Remove Duplicate Records SQL Server

  • andy_111 (2/1/2016)


    Adrian_1 (2/1/2016)


    The OP is intending that only 1 row will be deleted (because SET ROWCOUNT = 1) and therefore on the 2nd pass will not delete because there is now only 1 left...

    see previous post

    according to MSDN, "Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in a future release of SQL Server. Avoid using SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. For a similar behavior, use the TOP syntax. For more information, see TOP (Transact-SQL)."

    https://msdn.microsoft.com/en-us/library/ms188774.aspx%5B/quote%5D

    Based on the nested comment if SET ROWCOUNT = 1 and you have a 10k dups. All that I will say is good luck!:-P.

    --------------------------------------------------------------------------------------------------------------------------------------------------------
    To some it may look best but to others it might be the worst, but then again to some it might be the worst but to others the best.
    http://www.sql-sa.co.za

  • I agree with the general forum sentiment - rowcount being an older, deprecated method, should obviously be avoided. I agree the windowing methods are more appropriate. I am glad it was posted though, from a general trivia perspective. Without noticing its role, the delete query would lead one to assume an incorrect number of records would be removed. :w00t:

  • It is a weird solution but I am glad I have seen a new solution to remove duplication after 10 years(after introduction of row_number in SQL 2005) 😀

  • Even the method posted on Microsoft Support is better than this ugly RBAR option with a deprecated feature.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Nice post! I have a problem looking for duplicates in a similar table with "text" fields. What would that SQL look like?

    Thank you,

    Bob

  • Wouldn't this be a simpler solution and be a bit less confusing?

    delete Customers

    where CustID in (

    select max(CustID) -- or change to min() if wanting to keep latest

    from Customers

    group by CustName

    having count(*) > 1

    )

  • vopipari (2/1/2016)


    Wouldn't this be a simpler solution and be a bit less confusing?

    delete Customers

    where CustID in (

    select max(CustID) -- or change to min() if wanting to keep latest

    from Customers

    group by CustName

    having count(*) > 1

    )

    This will still remove ALL customer records that have a duplicate record

    --------------------------------------------------------------------------------------------------------------------------------------------------------
    To some it may look best but to others it might be the worst, but then again to some it might be the worst but to others the best.
    http://www.sql-sa.co.za

  • It only picks the min or max CustID for the pair of duplicates.

  • Microsoft said "Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in a future release of SQL Server. Avoid using SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. For a similar behavior, use the TOP syntax."

  • vopipari (2/1/2016)


    It only picks the min or max CustID for the pair of duplicates.

    😛

  • but it won't tidy up triplicates....

  • Adrian_1 (2/1/2016)


    but it won't tidy up triplicates....

    True, I was only thinking of duplicates. This fixes that though:

    delete Customers

    where CustName in (

    select CustName

    from Customers

    group by CustName

    having count(*) > 1

    )

    and CustID not in (

    select max(CustID) -- keeps latest, change to min() if wanting to keep the first

    from Customers

    group by CustName

    having count(*) > 1

    );

  • bob_chang (2/1/2016)


    Nice post! I have a problem looking for duplicates in a similar table with "text" fields. What would that SQL look like?

    Thank you,

    Bob

    The same... The answers given here are all generic. It doesn't matter what data types you are dealing with.

  • Personally I prefer

    WITH cte AS (

    SELECT a.*,

    ROW_NUMBER() OVER(PARTITION BY field1,field2 ORDER BY field1,field2) rrn

    FROM dbo.file_with_duplicates a)

    DELETE FROM cte WHERE rrn > 1

  • JustASQLGuy (2/1/2016)


    Wouldn't this be a simpler solution and be a bit less confusing?

    delete Customers

    where CustID in (

    select max(CustID) -- or change to min() if wanting to keep latest

    from Customers

    group by CustName

    having count(*) > 1

    )

    Once I read the post the issue of deleting ALL records hit me because I have been in this kind of situation before. Take care to backup the table as an exported copy before trying out on production. The above query by JustASQLGuy comes closer to a solution and if run multiple time in a WHILE loop could clear all entries occuring multiple times. I haven't tested this theory though.

    Br. Kenneth Igiri
    https://kennethigiri.com
    All nations come to my light, all kings to the brightness of my rising

Viewing 15 posts - 16 through 30 (of 76 total)

You must be logged in to reply to this topic. Login to reply