Using OVER() to Remove Duplicate Rows

  • Comments posted to this topic are about the item Using OVER() to Remove Duplicate Rows

  • Hello Dear Mike,

    I think we can also remove duplicates using simplest, easiest and better way without using temp table/s logic:

    WITH CTE_InsurancePolicy AS (

    SELECT PolicyId, FirstName, Surname, PolicyStartDate, PolicyEndDate

    ,ROW_NUMBER() OVER (PARTITION BY PolicyId, FirstName, Surname, PolicyStartDate, PolicyEndDate ORDER BY PolicyId) AS RowNumber

    FROM dbo.InsurancePolicy

    )

    DELETE CTE_InsurancePolicy

    WHERE RowNumber > 1;

    Thanks

    Irfan Baig.

  • Hi Irfan

    Yes, your method is completely valid. The article is just trying to demonstrate how to use the OVER() clause, as there's a thought not enough people are aware of it and how useful it can be.

    I agree that there are several ways this problem can be handled, your code shows that. Thanks for taking the time to comment.

    Regards,

    Mike.

  • This is a very useful method to learn

    I also use this to check for PK violations when transforming staging records into live data

    Could you create a cte for this and delete from the cte?

    That way you don't require a temp table

    ~Sorry, see that point has already been made!

    - Damian

  • I have been spooked by OVER() and PARTITION for too long now. You made it EZ for me - especially by using a VERY real world example. I have encountered your example so many darn times in the past. I really could have used this solution to make my life easier. Thanks again!

  • Hi Mike,

    Glad it was useful!

    Regards,

    Mike.

  • Sorry. i don't see how you can delete from the cte and still have the data that you don't want to delete in the true table.

  • Hi Shaulbel

    Deleting from the CTE will delete the data from the underlying table. This link should explain more:

    http://stackoverflow.com/questions/6010233/delete-rows-from-cte-in-sql-server

    I prefer using temp tables so you can double-check what you're deleting before you actually delete it, but there are plenty of ways of achieving the same goal.

    Thanks for reading,

    Mike.

  • Thanks.

    Learned new thing.

    Wasn't aware of delete CTE.

  • I've used OVER() a few times when I copied something from a forum to solve a problem, and hacked it until it gave me the results I needed. I used it, but I never really was clear on just how it worked. Thanks for a good example.

    BTW, for this particular problem, another solution would be to GROUP BY all fields into the temp table, giving exactly one copy of each row, clear the main and copy the temp back.

  • Irfan.Baig (11/7/2016)


    I think we can also remove duplicates using simplest, easiest and better way without using temp table/s logic:

    I think the simplest in this case would be to truncate the table and re-run the import process.

    😉

  • I've had to use similar methods to do this exact same thing before. Nice real world example Mike. There are multiple ways to do this without using the OVER() clause but this is a nice simple example of how windowing functions work. Thanks for the write-up.

  • Thanks Ken, glad you liked it!

    Cheers,

    Mike.

  • what about select distinct into ?

    steps:

    1) Select distinct into a temporal table

    2) truncate original table

    3) Insert into original table select from temporal table

    It seems better

  • There may be any number of 'better' ways to accomplish this particular task, but the post was intended as a tutorial about how OVER() works, and an example of where it can be applied. I think it served admirably for that. I certainly learned something.

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply