Handle Duplicate Records

,

The "Featured Script" a couple days ago was a simple way to delete duplicates. A common task if only because end-users will rarely check if some value already exists before entering it again. Wanted to offer a more complicated solution, comes in handy when you need to do more than just delete -- combine child records, keep a record of deletes, make a change to the record you're keeping, etc.

This solution uses embedded cursors, so is definitely a performance monster. On very large tables, with a large number of duplicates, you should run it during off-times.

[ListOfFields] below would be all the fields that have to match to be considered a duplicate.

ย 

Aside - my first posting on SQLServerCentral. I'm a long-time reader and big fan, dipping my foot in on posting. Maybe I'll move up to articles next time... ๐Ÿ™‚ This is kinda fun.

-- first declare variables: @Count INT, @i INT, @KeepID GUID/INT/?, and ListOfFields

DECLARE curDups CURSOR FOR 
 SELECT COUNT(1), [ListOfFields]
 FROM [Table and any joins] 
 WHERE [condtion]
 GROUP BY [ListOfFields]
 HAVING COUNT(1) > 1

OPEN curDups
FETCH NEXT FROM curDups INTO @Count, ... ListOfFields variables
WHILE @@FETCH_STATUS = 0 BEGIN
   DECLARE curSub CURSOR FOR
    SELECT ...
    FROM ...
    WHERE [matches ListOfFields]
    ORDER BY -- whatever so that the record you want to KEEP is first, 
	--I usually order by ModifiedDate and keep the first one entered

   SET @i = 0
   OPEN curSub
   FETCH NEXT FROM curSub INTO @ID ...
   WHILE @i < @Count BEGIN
      IF @i = 0 BEGIN
         SET @KeepID = @ID
      END
      ELSE BEGIN
          --... execute commands, delete 
      END
      SET @i = @i + 1
      FETCH NEXT FROM curSub INTO @ID ...
   END
   CLOSE curSub
   DEALLOCATE curSub

   FETCH NEXT FROM curDups INTO @Count, ... ListOfFields
END
CLOSE curDups 
DEALLOCATE curDups
GO

Rate

โ˜… โ˜… โ˜… โ˜… โ˜… โ˜… โ˜… โ˜… โ˜… โ˜…

1 (1)

Share

Share

Rate

โ˜… โ˜… โ˜… โ˜… โ˜… โ˜… โ˜… โ˜… โ˜… โ˜…

1 (1)