Eliminating Duplicate Rows using The PARTITION BY clause

  • I was able to use this approach to identify and delete duplicates very easily. I just had my FROM clause in a form of another query that just gets all duplicate rows - very limited output.

    I think with careful consideration, the approach is very helpful.

    Stan

  • Wow, what a great article - it was really clear to understand & helped introduce some new ideas to a TSQL newbie like myself. The comments on the article are also really useful, explaining a few different ways of doing the same thing but also highlighting how creative you can be with your code and also how different people approach the same problem in different ways.

    I've learned something new & useful today - yay!

    Doodles 😀

  • Nicely written. Thank you for your contribution!

  • I agree with those suggesting using a CTE, as I did in my old blog post: How do I remove duplicate tuples from a relation without any candidate keys?

    Now, it is seldom a good idea not to have a table without any candidate keys 🙂



    Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

    Concatenating Row Values in Transact-SQL[/url]

  • David Lean (9/21/2010)


    Nice article & Nice solution. But I would caution anyone about to blindly using this in production. It is only the 1st part of the process.

    Problem 1: need to Fix Declarative Referential Integrity (DRI) for the rows deleted.

    Invariably the duplicate rows you are about to remove will be referenced by the foreign keys in another table. When you delete these rows you must keep a copy of the Keys of the rows you deleted & map them to the key of the “duplicate” row you kept. That way you have some lookup table you can use to correct all the foreign key references.

    If a foreign key referenced only one of the rows you are deleting, isn't that by definition not a duplicate? So then in that case can't you just delete with the unique key?

  • If a foreign key referenced only one of the rows you are deleting, isn't that by definition not a duplicate? So then in that case can't you just delete with the unique key?

    He is saying if the table has a primary key and many columns but the data is already screwed up. Now they identify four other columns combination as unique but data shows duplicates by these four. (The four columns should have had a design of unique key to begin with)

  • Hi,

    The output of my query contains duplicate records (i.e. column1 value is similar but other column's value different) as well as single record having column2 value blanks(-). now I want to select only one record out of duplicate records for which column2 values are present (eliminating records for which column2 is blanks(-) from duplicate records) and all other single records having column2 values blanks.

    My query's Present Output

    column1 column2 column3

    abc xyz pqr

    abc - klm

    ijk - uvw

    plo - ujn

    yhk ttg wea

    yhk - erf

    Expected Output

    column1 column2 column3

    abc xyz pqr

    ijk - uvw

    plo - ujn

    yhk ttg wea

    Can someone help me. Thanks in advance.

  • prakashp 84206 (6/1/2012)


    Hi,

    The output of my query contains duplicate records (i.e. column1 value is similar but other column's value different) as well as single record having column2 value blanks(-). now I want to select only one record out of duplicate records for which column2 values are present (eliminating records for which column2 is blanks(-) from duplicate records) and all other single records having column2 values blanks.

    My query's Present Output

    column1 column2 column3

    abc xyz pqr

    abc - klm

    ijk - uvw

    plo - ujn

    yhk ttg wea

    yhk - erf

    Expected Output

    column1 column2 column3

    abc xyz pqr

    ijk - uvw

    plo - ujn

    yhk ttg wea

    Can someone help me. Thanks in advance.

    Welcome aboard.

    Read the link in my sig, this will assist you in describing your problem and encouraging others to help. Then start a new thread. It's likely that your problem is sufficiently different to the original on this thread that it's better separated from it.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 8 posts - 46 through 52 (of 52 total)

You must be logged in to reply to this topic. Login to reply