Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Deduping Data in SQL Server 2005 Expand / Collapse
Author
Message
Posted Monday, January 30, 2006 5:35 PM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Wednesday, November 17, 2010 3:38 AM
Points: 445, Visits: 82
Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/chawkins/dedupingdatainsqlserver2005.asp


Post #254584
Posted Tuesday, February 7, 2006 3:37 AM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Wednesday, November 17, 2010 3:38 AM
Points: 445, Visits: 82
When you are populating the record set, take out the GO between the next-to-the-last and the last insert statements. Having this penultimate GO in the set of queries will remove the scope of the @NOW variable and cause the last INSERT to fail.


Post #256304
Posted Tuesday, February 7, 2006 4:03 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Thursday, August 28, 2014 12:47 AM
Points: 7,005, Visits: 8,451

nice example

Here is another :

WITH cteEmployeeOrderedByMyRowNumber AS
(SELECT ROW_NUMBER() OVER (ORDER BY EMPID ASC, REFDATE ASC) AS MyRowNumber
, Row_NUMBER() Over (Partition By EMPID,FNAME,LNAME Order By REFDATE ASC) as PartitionRank
, *
FROM EMPLOYEE
-- WHERE 1 = 1
)
DELETE FROM cteEmployeeOrderedByMyRowNumber
where PartitionRank > 1 ;

 

just to get it in the tips of the fingers

 



Johan


Don't drive faster than your guardian angel can fly ...
but keeping both feet on the ground won't get you anywhere

- How to post Performance Problems
- How to post data/code to get the best help


- How to prevent a sore throat after hours of presenting ppt ?


"press F1 for solution", "press shift+F1 for urgent solution"


Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me
Post #256306
Posted Tuesday, February 7, 2006 8:11 AM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Tuesday, November 25, 2008 9:48 AM
Points: 57, Visits: 8

Hi...

 

I usually clean up my "mess" with "something" like this

 

delete from myTable where myID NOT in

(select min(MyID) from myTable group by myUniqueField[s])

 

But I think the best application for CTE are recursive querries...

Post #256425
Posted Tuesday, February 7, 2006 9:24 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, January 15, 2008 5:59 PM
Points: 6, Visits: 3
Is there a performance gain to using this function and technique?  Or is it just one of those "hey cool function --- let's use it"? 




aries
Post #256461
Posted Tuesday, February 7, 2006 1:14 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Monday, November 23, 2009 4:45 AM
Points: 163, Visits: 9

great examples!
 
If only duplicates need to be removed the ROW_NUMBER() may not be needed.

WITH cteEmployeeOrderedByMyRank AS
(SELECT RANK() OVER (PARTITION BY EMPID,FNAME,LNAME ORDER BY REFDATE ASC) AS PartitionRank
, *
FROM EMPLOYEE
-- WHERE 1 = 1
)
DELETE FROM cteEmployeeOrderedByMyRank
WHERE PartitionRank > 1 ;
 
It surely seems to be much faster than the cursor based apporach.




bm21
Post #256535
Posted Tuesday, February 7, 2006 4:30 PM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Sunday, April 17, 2011 1:56 AM
Points: 52, Visits: 59

Interesting demonstration of the ROW_NUMBER() function. Please say it ain’t so, Joe! - that you are not using cursors to remove duplicate rows. Even the technique of a SELECT DISTINCT into a temporary table would be a better option. As other readers have commented, there are a number of ways to remove duplicate rows. This would be my approach:

 

DELETE Employee

FROM Employee a INNER JOIN (SELECT Empid,

                                                                          FName,

                                                                          LName,

                                                                         MIN(RefDate) AS 'MinDate'

                                                          FROM Employee   

                                                         GROUP BY Empid, FName, LName) b

                                  ON a.Empid = b.Empid

                                 AND a.FName = b.FName

                                 AND a.LName = b.LName

                                 AND a.RefDate > b.MinDate

 

This would still leave the issue of James verses Jim that would need to be resolved separately. If you didn’t care about spelling variations and wanted to assume that the first entry was the correct one then this would work:

 

DELETE Employee

FROM Employee a INNER JOIN (SELECT Empid,

                                                                          MIN(RefDate) AS 'MinDate'

                                                          FROM Employee

                                                         GROUP BY Empid) b

                                  ON a.Empid = b.Empid

                                 AND a.RefDate > b.MinDate

 

I would be interested in the question of performance between the two techniques but I’d put my money on mine which I suspect has a whole lot less overhead even as a cross join than having the engine generate a row position.

 

 




Post #256587
Posted Wednesday, February 7, 2007 4:45 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Tuesday, July 8, 2014 7:17 AM
Points: 1,306, Visits: 272

Another option (that works with SQL 2000) and even in cases of all the columns having the same value (no column to differentiate the rows), is inserting the result set into a new table with an identity column (or adding an identity column to the original table). After that, it's just a matter of keeping the distinct rows as shown in the comments.

I can't remember where I read this solution, but it was either here in SQL Server Central or SQL Team foruns.

André Cardoso




Post #343065
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse