SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Deleting Duplicate Records


Deleting Duplicate Records

Author
Message
Mark Erickson-288856
Mark Erickson-288856
Forum Newbie
Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)

Group: General Forum Members
Points: 7 Visits: 7
Cleanest and easiest way to clean up duplicates that I have seen.
Adwaita Prasad Panda
Adwaita Prasad Panda
Grasshopper
Grasshopper (20 reputation)Grasshopper (20 reputation)Grasshopper (20 reputation)Grasshopper (20 reputation)Grasshopper (20 reputation)Grasshopper (20 reputation)Grasshopper (20 reputation)Grasshopper (20 reputation)

Group: General Forum Members
Points: 20 Visits: 59
A very good and helpful topic
Sudarsan Srinivasan
Sudarsan Srinivasan
SSC-Addicted
SSC-Addicted (456 reputation)SSC-Addicted (456 reputation)SSC-Addicted (456 reputation)SSC-Addicted (456 reputation)SSC-Addicted (456 reputation)SSC-Addicted (456 reputation)SSC-Addicted (456 reputation)SSC-Addicted (456 reputation)

Group: General Forum Members
Points: 456 Visits: 144

Hi Ramakrishnan,

The script you had provided was quite a bit useful but i have some reservations on this method. Actually i had drawn up a similar script but found that when the number of columns increase in the table, you will have to group by all the columns. is there any other method to delete the duplicate rows other than taking the stock into the temp table and again fetching it back to the physical table.


colin Robinson-345240
colin Robinson-345240
SSC Rookie
SSC Rookie (45 reputation)SSC Rookie (45 reputation)SSC Rookie (45 reputation)SSC Rookie (45 reputation)SSC Rookie (45 reputation)SSC Rookie (45 reputation)SSC Rookie (45 reputation)SSC Rookie (45 reputation)

Group: General Forum Members
Points: 45 Visits: 148

My method for duplicate removal may not suit every one. it offers the advantage if you have duplicate data but the identity column makes the data unique you can modify the select * to only the duplicated fields.

begin tran

Select Distinct * into [newtable] from [badtable]

truncate table [badtable]

insert into badtable select * from newtable

drop table [newtable]

ColinR


RyanRandall
RyanRandall
SSCrazy
SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)

Group: General Forum Members
Points: 2259 Visits: 4652

Here's another method. This one adds an identity column, uses it, and then drops it.

ALTER TABLE dbo.employee ADD tempid INT IDENTITY(1, 1)
DELETE dbo.employee WHERE tempid NOT IN (SELECT MIN(tempid) FROM dbo.employee GROUP BY id, NAME, salary)
ALTER TABLE dbo.employee DROP COLUMN tempid



Ryan Randall

Solutions are easy. Understanding the problem, now, that's the hard part.
SwePeso
SwePeso
Hall of Fame
Hall of Fame (4K reputation)Hall of Fame (4K reputation)Hall of Fame (4K reputation)Hall of Fame (4K reputation)Hall of Fame (4K reputation)Hall of Fame (4K reputation)Hall of Fame (4K reputation)Hall of Fame (4K reputation)

Group: General Forum Members
Points: 3959 Visits: 3433

Or, if you use SQL Server 2005,

DELETE t1
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY Col2 DESC) AS RecID
FROM Table1
) AS t1
WHERE RecID > 1




N 56°04'39.16"
E 12°55'05.25"
Perry Citrowske
Perry Citrowske
SSC-Enthusiastic
SSC-Enthusiastic (109 reputation)SSC-Enthusiastic (109 reputation)SSC-Enthusiastic (109 reputation)SSC-Enthusiastic (109 reputation)SSC-Enthusiastic (109 reputation)SSC-Enthusiastic (109 reputation)SSC-Enthusiastic (109 reputation)SSC-Enthusiastic (109 reputation)

Group: General Forum Members
Points: 109 Visits: 33
My question is this....



Please correct me if I'm wrong, or how to get around the issue, but how about a real-life scenario that I have:



I have been given a logical definition of a duplicate row to be a composite of 4 columns. However, there are actually 10 columns in the table. This data has duplicates in the logical composite key, but it's not necessarily a duplicate if you included the other data fields in the "group by". If I want to copy the data to the other "temp" table, I have to include all the columns in the "group by" or I will lose the other data, but in my case, I can't do that or I will miss some duplicates.
gfabbri
gfabbri
SSC Rookie
SSC Rookie (30 reputation)SSC Rookie (30 reputation)SSC Rookie (30 reputation)SSC Rookie (30 reputation)SSC Rookie (30 reputation)SSC Rookie (30 reputation)SSC Rookie (30 reputation)SSC Rookie (30 reputation)

Group: General Forum Members
Points: 30 Visits: 34
sorry I read on msdn that 'Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in the next release of SQL Server'. So this method will not work anymore, will it not?
Thanks
Gio
AJ07
AJ07
SSC Veteran
SSC Veteran (202 reputation)SSC Veteran (202 reputation)SSC Veteran (202 reputation)SSC Veteran (202 reputation)SSC Veteran (202 reputation)SSC Veteran (202 reputation)SSC Veteran (202 reputation)SSC Veteran (202 reputation)

Group: General Forum Members
Points: 202 Visits: 198
Hi ,
I have tried duplicate record deletion without using temp tables.
And according to me its faster and a very short query.

CREATE TABLE EMP
(EMPID INT,EMPNAME VARCHAR(100))

INSERT INTO EMP VALUES (1,'A')
INSERT INTO EMP VALUES (1,'A')
INSERT INTO EMP VALUES (1,'A')
INSERT INTO EMP VALUES (2,'B')
INSERT INTO EMP VALUES (2,'B')
INSERT INTO EMP VALUES (3,'C')
INSERT INTO EMP VALUES (4,'D')
INSERT INTO EMP VALUES (4,'D')
INSERT INTO EMP VALUES (4,'D')
INSERT INTO EMP VALUES (4,'D')
INSERT INTO EMP VALUES (5,'E')
INSERT INTO EMP VALUES (5,'E')
INSERT INTO EMP VALUES (6,'F')
INSERT INTO EMP VALUES (6,'F')
INSERT INTO EMP VALUES (6,'F')
INSERT INTO EMP VALUES (6,'F')
INSERT INTO EMP VALUES (6,'F')
INSERT INTO EMP VALUES (7,'G')

SELECT * FROM EMP

SELECT @@ROWCOUNT
--(Initially the rowcount must be > 0 )
WHILE @@ROWCOUNT > 0
DELETE TOP(1) FROM EMP WHERE EXISTS
(SELECT E.EMPID ,COUNT(E.EMPID) FROM EMP E
WHERE EMP.EMPID=E.EMPID GROUP BY E.EMPID HAVING COUNT(E.EMPID) > 1)

SELECT * FROM EMP


Correct if wrong ........................
Smooooth
AJ07
AJ07
SSC Veteran
SSC Veteran (202 reputation)SSC Veteran (202 reputation)SSC Veteran (202 reputation)SSC Veteran (202 reputation)SSC Veteran (202 reputation)SSC Veteran (202 reputation)SSC Veteran (202 reputation)SSC Veteran (202 reputation)

Group: General Forum Members
Points: 202 Visits: 198
Hi Peso,
i tried your code but it did not work.
Are you sure it works?

Thanks
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search