Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

deleting duplicate records from a table Expand / Collapse
Author
Message
Posted Monday, August 18, 2014 11:39 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Saturday, September 13, 2014 4:48 PM
Points: 10, Visits: 14
I have created a table Emp_Salary
Create Table Employee_Salary
(
EmployeeID Int,
FirstName Varchar(20),
Salary Money
)

and inserted some records into the table. Some of them are duplicate records. Now, I want to remove all the duplicate records present in the table.The query I wrote was:

Delete from Employee_Salary
WHERE EXISTS (select FirstName,COUNT(FirstName) from Employee_Salary
GROUP BY FirstName
HAving COUNT(FirstName) > 1)

But this is deleting all the records from the table..How can we just delete the duplicate records only?
Post #1604600
Posted Monday, August 18, 2014 11:51 AM


SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Yesterday @ 9:47 AM
Points: 272, Visits: 897
Try http://www.sqlservercentral.com/articles/T-SQL/dedupingdatainsqlserver2005/2260/
Post #1604610
Posted Monday, August 18, 2014 12:12 PM


SSCoach

SSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoach

Group: General Forum Members
Last Login: Yesterday @ 9:28 PM
Points: 17,729, Visits: 15,597
You can try this article that shows how to do that.

http://jasonbrimhall.info/?s=duplicate+cte




Jason AKA CirqueDeSQLeil
I have given a name to my pain...
MCM SQL Server


SQL RNNR

Posting Performance Based Questions - Gail Shaw
Posting Data Etiquette - Jeff Moden
Hidden RBAR - Jeff Moden
VLFs and the Tran Log - Kimberly Tripp
Post #1604625
Posted Monday, August 18, 2014 12:15 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Yesterday @ 9:48 PM
Points: 3,636, Visits: 8,151

If you try this, I'd suggest to use one of the alternative methods posted in the discussion of the article.



Luis C.
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

Forum Etiquette: How to post data/code on a forum to get the best help
Post #1604628
Posted Tuesday, August 19, 2014 6:24 PM


Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Wednesday, September 24, 2014 4:50 PM
Points: 358, Visits: 876
Something like this ?

WITH myCTE
AS (
SELECT row_number() over (partition by EmployeeID, Firstname,Salary
ORDER BY EmployeeID, FirstName, Salart ) as Colm1,*
from Employee_salary
)
DELETE From myCTE where Colm1> 1

Post #1605256
Posted Wednesday, August 27, 2014 9:19 AM


SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Yesterday @ 5:09 PM
Points: 445, Visits: 1,799
I ran into this method in an Itzik Ben-Gan article, which looks more complicated, but of course somehow manages to run a bit faster where I've tested it against a single CTE with ROW_NUMBER() involved, I guess because there's no ORDER BY used anywhere?


;WITH    dupe1
AS ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RN ,
[FirstName]
FROM dbo.Emp_Salary
),
dupe2
AS ( SELECT [FirstName] ,
MAX(RN) AS [MAX]
FROM dupe1
GROUP BY [FirstName]
HAVING COUNT(*) > 1
)
SELECT d1.*
--DELETE d1
FROM dupe1 d1
INNER JOIN dupe2 d2
ON d1.[FirstName] = d2.[FirstName]
AND d1.RN < d2.[MAX];

Post #1607888
Posted Wednesday, August 27, 2014 3:15 PM


Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Wednesday, September 24, 2014 4:50 PM
Points: 358, Visits: 876
sqldriver (8/27/2014)
I ran into this method in an Itzik Ben-Gan article, which looks more complicated, but of course somehow manages to run a bit faster where I've tested it against a single CTE with ROW_NUMBER() involved, I guess because there's no ORDER BY used anywhere?


;WITH    dupe1
AS ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RN ,
[FirstName]
FROM dbo.Emp_Salary
),
dupe2
AS ( SELECT [FirstName] ,
MAX(RN) AS [MAX]
FROM dupe1
GROUP BY [FirstName]
HAVING COUNT(*) > 1
)
SELECT d1.*
--DELETE d1
FROM dupe1 d1
INNER JOIN dupe2 d2
ON d1.[FirstName] = d2.[FirstName]
AND d1.RN < d2.[MAX];



Another consideration on the performance is that this only compares the first name, which multiple people can of course share. The OP did not respond to my suggestion on if what I posted was a good and simple solution. It seems to me that if there is a employeeID in there and no trust issues with the data, then all you have to do is
WITH myCTE
AS (
SELECT row_number() over (partition by EmployeeID
ORDER BY EmployeeID) as Colm1,*
from Employee_salary
)
DELETE From myCTE where Colm1> 1

But since this will very likely be a one time thing (else there is a bigger problem with the system ) , performance will not be as high on the must have list.
Post #1608056
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse