Recent PostsRecent Posts Popular TopicsPopular Topics
 Home Search Members Calendar Who's On

 Deleting Duplicate Records Rate Topic Display Mode Topic Options
Author
 Message
 Posted Friday, March 10, 2006 7:40 AM
 Forum Newbie Group: General Forum Members Last Login: Sunday, July 9, 2006 7:38 AM Points: 5, Visits: 1
Post #264847
 Posted Friday, March 10, 2006 7:42 AM
 Forum Newbie Group: General Forum Members Last Login: Monday, October 2, 2006 3:24 PM Points: 1, Visits: 1
 I was totally looking for something like this. The fact that it ended up being so elgant and simple made me feel ignorant for a moment. This is a very helpful script. It's funny that after you think about trying to do something like this for so long you end up over thinking it. Then when you provide such a simple solution, it's like 'wow, how could I not have seen this before!' Thanks again, Brian
Post #264848
 Posted Friday, March 10, 2006 7:56 AM
 SSC Veteran Group: General Forum Members Last Login: Sunday, August 17, 2014 11:51 AM Points: 226, Visits: 119
 --OR!!declare @affected tinyintSET @affected = 1SET ROWCOUNT 1WHILE @affected > 0BEGIN DELETE employee FROM employee INNER JOIN ( SELECT id, name, salary FROM employee GROUP BY id, name,salary HAVING COUNT(1)>1 ) dupes ON dupes.id = employee.id AND dupes.name = employee.name AND dupes.salary = employee.salary SET @affected = @@ROWCOUNTENDSET ROWCOUNT 0SELECT * FROM employeeORDER BY idDROP TABLE employee
Post #264861
 Posted Friday, March 10, 2006 11:34 AM
 Forum Newbie Group: General Forum Members Last Login: Friday, September 1, 2006 1:58 PM Points: 1, Visits: 1
 I'll typically use a DELETE FROM / GROUP BY combination in conjunction with a temporary table:DELETE FROM employeeWHERE id NOT IN (    SELECT MIN(id)    FROM employee    GROUP BY name, salary)...assuming that id is an identity column (if your table doesn't have an identity column then you're on your own!).I'll often adapt this approach if there are e.g. timestamp fields, such that they're not included in the GROUP BY (if you'd still want to consider the record dupe'd regardless).If you're after all those records with the highest timestamp etc. then a temporary table with an identity column works wonders.To kill off all records with dupe'd emp name, leaving the record with the highest salary...CREATE TABLE #emps (    myID INTEGER IDENTITY(1, 1),    id INTEGER,    name NCHAR(100), -- whatever    salary MONEY)INSERT INTO #empsSELECT id, name, salaryFROM employeeORDER BY salary DESCDELETE FROM employeeWHERE id NOT IN (    SELECT id FROM #emps    WHERE myID NOT IN (        SELECT MIN(myID)        FROM #emps        GROUP BY name    ))...I realise that NOT IN is not a particularly expedient operation, but have had great success with this in the past.
Post #264957
 Posted Friday, March 10, 2006 12:29 PM
 SSC-Enthusiastic Group: General Forum Members Last Login: Saturday, August 31, 2013 11:41 PM Points: 121, Visits: 34
 My personal experience is that duplicate records in the database means poor validation on both client and DB side as well as poor database design. I agree with those people who have gone the way of 'nip the evil in the bud' and not allow duplicates to get in the database.The author mentioned that this article is for novice SQL Server users: I would rather train novice users how to avoid problems in the first place rather than react to them and realize that some things were left wanting in the DB designJust my 2 cents
Post #264979
 Posted Friday, March 10, 2006 12:34 PM
 SSC Veteran Group: General Forum Members Last Login: Sunday, August 17, 2014 11:51 AM Points: 226, Visits: 119
Post #264982
 Posted Saturday, March 11, 2006 6:13 AM
 SSC Rookie Group: Administrators Last Login: Wednesday, September 28, 2016 7:01 AM Points: 31, Visits: 52
 I'm surprised that anyone working with SQL Server needs lessons in removing duplicates, and puzzled at the complexity of this cursor-based solution originally published. A better solution has already been pointed out in the forum. As to the opinion that databases should never contain duplicate entries, yes, but somehow they sometimes creep in. Like the Kernighan and Richie exception error message, 'this is impossible', which one should never see and often does. Commercial databases are often of frightening complexity and one should be always on guard for duplicates. Yes, constraints are one way of doing it, but there are times when the overhead is too great. Editor: Simple-Talk
Post #265064
 Posted Sunday, March 12, 2006 12:29 PM
 Forum Newbie Group: General Forum Members Last Login: Thursday, August 2, 2007 9:21 AM Points: 1, Visits: 1
 Why not just do a select distinct into a new table, Delete the old and then rename the new to the old.
Post #265097
 Posted Monday, March 13, 2006 7:58 AM
 SSC Veteran Group: General Forum Members Last Login: Sunday, August 17, 2014 11:51 AM Points: 226, Visits: 119
 That was mentioned above
Post #265188
 Posted Tuesday, March 14, 2006 9:30 PM
 SSC Journeyman Group: General Forum Members Last Login: Saturday, October 23, 2010 4:51 AM Points: 80, Visits: 58
 Theoretically speaking select distinct is the best way but, which are the fields you want to be distinct may change in each case.Virtually these are the problems of people who deals with , dataporting /migration not of DBAs
Post #265776

 Permissions