duplicate record problem

  • This is pretty easy for most of you, but i need a little help if i can get it.

    I have a table with duplicate records. no one column will tell me if my record is a duplicate.

    It takes comparing 4 different columns in the table to distinguish which records are duplicates

    I am trying to put a unique key on it for the 4 columns, but need to remove the duplicates first.

    how can i create a query to do this. I know its possible, i just am not sure how to do it.

    This is basically what my data looks like:

    column1   column2   column3   column4

    37             aa           bb           cc

    36             aa           bb            cc

    37             aa           bc            cc

    36              aa           bb            cd

    This is my theory on how it would work.

    delete from table where (rowid)

        not in

         (select min(rowid), column1, column2, column3, column4 from table group by column1, column2, column3, column4)

     

  • This article should be helpful

    http://www.sqlservercentral.com/columnists/sjones/removeduplicate.asp

     

    it is also possible to remove duplicates rows using CTEs in sql server 2005

    WITH

    Dups AS

    (

    SELECT *,

    ROW_NUMBER

    () OVER(PARTITION BY col1, col2, col3, col4 ORDER BY col1, col2, col3, col4) AS rn

    FROM dbo.Employees

    )

    delete

    from dups where rn > 1;

    hth

    David

  • The only problem with your DELETE query is the column list in the subquery.  It would work like this:

    delete from table where (rowid)

        not in

         (select min(rowid) from table group by column1, column2, column3, column4)

    Not that there's anything wrong with the CTE version, mind you.

Viewing 3 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply