Finding duplicate rows with multiple fields for comparison

  • I've taken a look at the scripts and forum posts on the site regarding finding duplicate rows, but I haven't been able to adapt them to my situation. I have data being pulled from Oracle and placed in a table in my sql server db. The table consists of a primary key identity field, a field containing the oracle ID, and 4 fields containing data. I need to find all duplicates, with a duplicate being defined as a match across all 4 data fields.

    I've tried queries along the following lines, but they don't work.

    SELECTMAX(id), MAX(oracle_id), field1, field2, field3, field4

    FROMoracleRecords

    GROUP BYfield1, field2, field3, field4

    HAVING COUNT(*) > 1

    If I understand how this is supposed to work, only duplicate rows should be returned. Is this the case? I've gotten it to work when comparing only one field, but when I try to compare more than one I break it. Any suggestions?

  • You might want to try this sort of query instead

    SELECTid, oracle_id, field1, field2, field3, field4,

    COUNT(*) OVER(PARTITION BY field1, field2, field3, field4) AS NumberOfDuplicates

    FROMoracleRecords

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • That worked perfectly. Thanks!

  • One further question: Is it possible to identify the newest record of each group and delete the others automatically? One of the fields is a datestamp, but I don't understand how to isolate the newest records.

    For example.

    Field1 Field2 Field3 Field4

    AAA AAA AAA 18 Dec 2012 <- delete

    AAA AAA AAA 19 Dec 2012

    BBB BBB BBB 1 Jan 2013 <- delete

    BBB BBB BBB 3 Jan 2013 <- delete

    BBB BBB BBB 6 Jan 2013

    Is this possible through tsql?

  • WITH CTE AS (

    SELECTid, oracle_id, field1, field2, field3, field4,

    ROW_NUMBER() OVER(PARTITION BY field1, field2, field3 ORDER BY field4 DESC) AS rn

    FROMoracleRecords)

    DELETE FROM CTE

    WHERE rn > 1;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply