Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Finding and Deleting Duplicate Data Expand / Collapse
Author
Message
Posted Sunday, July 13, 2003 12:00 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Wednesday, February 27, 2008 9:40 PM
Points: 19, Visits: 3
Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/ccubley/findinganddeletingduplicatedata.asp


Chris Cubley
www.queryplan.com
Post #14119
Posted Friday, July 25, 2003 2:11 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Tuesday, July 15, 2008 12:43 AM
Points: 45, Visits: 5
Neatly encapsulates several techniques in one place. I will save it in my bits of useful code.




Graham Farrow
DBA
Mastercare Central Division
Post #70371
Posted Friday, July 25, 2003 3:45 AM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Thursday, September 18, 2014 5:59 AM
Points: 94, Visits: 294
Useful topic to cover but isn't the final solution over-complex?
Couldn't you just use an inequality of PaymentID's on a self-join as follows..?

DELETE Payment
WHERE PaymentID IN
(SELECT P1.PaymentID
FROM Payment P1
INNER JOIN Payment P2
ON P1.CustomerNumber = p2.CustomerNumber
AND
p1.PostedDatetime = p2.PostedDatetime
AND
p1.PaymentAmt = p2.PaymentAmt
AND
p1.PaymentID > p2.PaymentID)

Mark




Post #70372
Posted Friday, July 25, 2003 5:49 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, February 24, 2004 7:31 AM
Points: 7, Visits: 1
This is not as complex as using temporary tables and scrubbing processes and then moving to live tables.

That was my first atempt.




Post #70373
Posted Friday, July 25, 2003 7:49 AM


SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Tuesday, September 30, 2014 12:20 PM
Points: 165, Visits: 379
Excellent article ! This is something I plan to use on one of my tables later today.




Post #70374
Posted Saturday, July 26, 2003 5:53 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Thursday, October 16, 2014 8:45 AM
Points: 49, Visits: 43
Good Article! Thanks.




Post #70375
Posted Tuesday, July 29, 2003 10:18 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Thursday, August 14, 2014 9:45 PM
Points: 229, Visits: 276
I think you can use this approach too

--table structure

CREATE TABLE [a4] (
[id] [int] IDENTITY (1, 1) NOT NULL ,
[title] [varchar] (50) ,
[name] [varchar] (50) NULL ,
CONSTRAINT [PK_a4] PRIMARY KEY CLUSTERED
(
[id]
) ON [PRIMARY]
) ON [PRIMARY]
GO



--to have distinct id for the doubles

--the first id's

select distinct a.id, a.title, a.name from a4 a, a4 b where a.title = b.title and a.name = b.name and a.id < b.id




--the last id's

select distinct a.id, a.title, a.name from a4 a, a4 b where a.title = b.title and a.name = b.name and a.id > b.id




--to delete the first id's doubles and get unique rows in the original table


delete from a4 where id in (
select distinct a.id from a4 a, a4 b where a.title = b.title and a.name = b.name and a.id < b.id )

--to delete the last id's doubles and get unique rows in the original table
delete from a4 where id in (
select distinct a.id from a4 a, a4 b where a.title = b.title and a.name = b.name and a.id > b.id )





Post #70376
Posted Tuesday, July 29, 2003 1:18 PM


SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Tuesday, September 30, 2014 12:20 PM
Points: 165, Visits: 379
Just to follow up, I followed all of the sample code and everything worked great the first time. The article was well written and the logic was clear in describing why the steps were necessary. Keep up the good work !!!




Post #70377
Posted Monday, July 26, 2004 11:53 AM


Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Tuesday, June 21, 2011 10:03 AM
Points: 577, Visits: 102

This is a pretty good technique, but there are problems with it.  Using derived tables in a query can be very powerful, but they can also cause excessive use of tempdb (similar to using # tables).  Often, there are other ways of doing the same thing for cheaper.

So, if you don't have to use them, don't.  The query below is functionally equivalent to the delete statement in the article, but it costs almost half.


delete p1
From Payment p1
JOIN Payment p2
   on  p1.CustomerNumber = p2.CustomerNumber and
    p1.PostedDateTime = p2.PostedDateTime and
    p1.PaymentAmt   = p2.PaymentAmt
where p1.PaymentID < p2.PaymentID

 



Signature is NULL
Post #128244
Posted Monday, July 26, 2004 11:55 AM


Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Tuesday, June 21, 2011 10:03 AM
Points: 577, Visits: 102

Here's the full script:


if object_ID('tempdb..#Payment') is not null drop table #Payment
CREATE TABLE #Payment(
 PaymentID int identity Not nUll Primary Key,
 CustomerNumber int  NOT NULL,
 PostedDatetime datetime NOT NULL,
 PaymentAmt money  NOT NULL,
 PaymentMemo varchar(200) NOT NULL,
 CCRefNum char(10) NOT NULL
)

 

Insert #Payment Values (1, '01/01/2004', 1, '', '')
Insert #Payment Values (1, '01/01/2004', 1, '', '')
Insert #Payment Values (2, '01/02/2004', 2, '', '')
Insert #Payment Values (2, '01/02/2004', 2, '', '')
Insert #Payment Values (2, '01/02/2004', 2, '', '')
Insert #Payment Values (3, '01/03/2004', 3, '', '')
Insert #Payment Values (3, '01/03/2004', 3, '', '')
Insert #Payment Values (3, '01/03/2004', 3, '', '')
Insert #Payment Values (3, '01/03/2004', 3, '', '')

 

 

delete p1
From #Payment p1
JOIN #Payment p2
   on  p1.CustomerNumber = p2.CustomerNumber and
    p1.PostedDateTime = p2.PostedDateTime and
    p1.PaymentAmt   = p2.PaymentAmt
where p1.PaymentID < p2.PaymentID

 

DELETE FROM
 p1
FROM
 #Payment p1
INNER JOIN
 (
  SELECT
   MAX(PaymentID) AS PaymentID,
   CustomerNumber,
   PostedDatetime,
   PaymentAmt
  FROM
   #Payment
  GROUP BY
   CustomerNumber,
   PostedDatetime,
   PaymentAmt
  HAVING
   COUNT(*) > 1
  p2
 ON(
  p1.CustomerNumber = p2.CustomerNumber
  AND
  p1.PostedDatetime = p2.PostedDatetime
  AND
  p1.PaymentAmt = p2.PaymentAmt
  AND
  p1.PaymentID <> p2.PaymentID
 

 



Signature is NULL
Post #128246
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse