Duplicate Entries

  • Hi Guys, long time reader, first time poster πŸ™‚

    I have a question regarding duplicate entries.

    I have a basic table with 7 columns that I am having to more into a new table with a unique index on 4 of the 7 columns.

    I have around 50,000 rows that due to the new index are considered 'duplicate'.

    My question is what would be the easiest way to either move the unique rows to a new table?

    Thanks

    Dean

  • This should get you started, Dean:

    DROP TABLE #MySampleTable

    CREATE TABLE #MySampleTable (col1 INT, col2 INT, col3 INT, col4 INT, col5 INT, col6 INT, col7 INT)

    INSERT INTO #MySampleTable (col1, col2, col3, col4, col5, col6, col7)

    SELECT 1, 2, 3, 4, 1, 1, 1 UNION ALL

    SELECT 1, 2, 3, 4, 2, 2, 2 UNION ALL

    SELECT 1, 2, 3, 4, 3, 3, 3 UNION ALL

    SELECT 1, 2, 3, 5, 1, 1, 1 UNION ALL

    SELECT 1, 2, 3, 5, 2, 2, 2 UNION ALL

    SELECT 1, 2, 3, 6, 1, 1, 1 UNION ALL

    SELECT 1, 2, 3, 6, 2, 2, 2 UNION ALL

    SELECT 1, 2, 3, 7, 1, 1, 1

    SELECT *

    INTO #MyNewTable

    FROM (

    SELECT *, MyRow = ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4 ORDER BY col1, col2, col3, col4, col5, col6, col7)

    FROM #MySampleTable

    ) d

    WHERE MyRow = 1

    SELECT * FROM #MyNewTable

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • It would be most helpful for you to post :

    1. Table(s) schema.

    2. Sample data.

    Thanks.

  • Thanks for the replies guys.

    I will get the schema and sample data tomorrow and add to this forum.

    Cheers.

    Dean

  • Something for you to think about.

    If two rows have duplicate values for those 4 columns, but different values for the other three, which row would you want to keep and which would you want to discard?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thats one of the issues πŸ™‚

    One of the columns is a time stamp so I would want to keep the latest entry.

    Basically this is a WorkflowStatus table that has status code, then four column relating to the data itself and finally a 'Time' and a 'UpdateBy' column which records the users name.

    Dean.

  • deanrjohnson (12/15/2009)


    Thats one of the issues πŸ™‚

    Actually doing it is not hard, if there's a simple rule as to which row to keep. I'm sure once the schema's posted there'll be a solution within the hour.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply