Dynamic way to find duplicates

  • I usually come to this forum in a mad panic, but today I come bearing gifts. Thanks to an extremely tense data release and Suresh Maganti's article about duplicate rows (http://www.sqlservercentral.com/articles/T-SQL/70807/) I decided to give a go to creating a procedure to looks for duplicates.

    Questions are: can anybody see a way to improve/enhance this, any problems, etc. So far in my testing it's been good. Also, and most importantly, would anyone recommend ever creating a delete_dups procedure? The idea makes me a little nervous, but it would be so handy.

    Thanks for the help, Amy

    Here is the code:

    IF OBJECT_ID('dynamic_dup_finder') is not null

    DROP PROCEDURE dynamic_dup_finder;

    GO

    CREATE PROCEDURE dynamic_dup_finder

    (@Table varchar(50))

    AS

    DECLARE @sql varchar(max), @columns varchar(max)

    SELECT @columns = ISNULL(@columns + ', ', '') + '[' + name + ']'

    FROM

    (

    SELECT c.name

    FROM sys.tables t

    INNER JOIN sys.columns c

    ON t.object_id = c.object_id

    WHERE t.object_id = OBJECT_ID(N'' + @Table + '')

    and C.user_type_id not in (34, 35, 99)

    ) x

    SELECT @sql = ' SELECT ' + @columns + ', RowNumber FROM (

    SELECT ' + @columns + ', ROW_NUMBER() OVER

    (Partition by ' + @columns + ' order by ' + @columns + ') RowNumber

    FROM ' + @Table + ' ) a

    WHERE RowNumber > 1 '

    exec (@sql)

    --call proc with:

    EXEC dynamic_dup_finder table_name

  • would anyone recommend ever creating a delete_dups procedure?

    If it were me, probably not. Let's say you have a table with col1, col2, and col3. You create a procedure to dedupe based on those three columns, and birds are singing under the rainbow for some extended period of time. Along comes someone else who, unbeknownst to you, identifies the reason for the dupes, miraculously makes sense of them, and adds a column to your table, col4, to differentiate them, and they are no longer dupes. Now along comes some other new guy, who sees the dedupe procedure, and decides he's going to do some clean up, without even looking at the table, or realizing there are no longer dupes. He may be deleting rows that are no longer dupes, thanks to that last guy, and they are/were needed data. This may be an overly simple example but it could happen. If you have a big enough problem with dupes, you ought to investigate how they are getting there, and work on preventing them from happening in the first place with some constraints. Just my two cents.

    Greg
    _________________________________________________________________________________________________
    The glass is at one half capacity: nothing more, nothing less.

  • First of all, I think I need to count my blessings that there are not a bunch of "guys" here who can write or alter anything on the server. That sound horrible. And I know exactly what causes the duplicates, and it won't change (humans).

    Anyway, I heed your advice, yet it would be so convenient, so went halfway. Changed a few lines to create the "dynamic_dup_printer" procedure that will print the code in the messages panel below. If, and only if, I am confident that those records need to be deleted, I can copy and paste to the query pain and execute.

  • Amy, as long as you are comfortable with it. At my last job I was the only one person who did anything at all to our SQL Server. I had to learn the hard way, but over the couple of years I ran it, I developed some complex procedures and such. I gave them a couple of weeks notice, and tried to make them understand how important it was they fill my position with someone capable of handling it, and willing to learn. They did not really take my advice. Now that I am not there anymore, my former employer has unleashed a few people on it who have already done serious damage by executing code when they should not have, and they no longer have anyone to fix it. Of course, it does not matter to me now, since I am not there any longer. All I'm saying is that you may decide to move on, and there may be people in the future executing your code without understanding it. In hind site, I probably should have dropped a lot of procedures before the dogs attacked the server. Anyhow, I'm glad you worked out a solution.

    Greg
    _________________________________________________________________________________________________
    The glass is at one half capacity: nothing more, nothing less.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply