SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Removing Duplicate Records


Removing Duplicate Records

Author
Message
Syed-201559
Syed-201559
SSC-Enthusiastic
SSC-Enthusiastic (199 reputation)SSC-Enthusiastic (199 reputation)SSC-Enthusiastic (199 reputation)SSC-Enthusiastic (199 reputation)SSC-Enthusiastic (199 reputation)SSC-Enthusiastic (199 reputation)SSC-Enthusiastic (199 reputation)SSC-Enthusiastic (199 reputation)

Group: General Forum Members
Points: 199 Visits: 94
There is no need to create temporary tables, very helpful when deleting from large tables, you can delete the duplicate records using inner join and setting the rowcout to 1, the modified version of script.

CREATE TABLE #phonebook (
[phonenumber] [varchar] (30),
[firstname] [varchar] (30),
[lastname] [varchar] (30),
[company] [varchar] (100)
)
INSERT #phonebook SELECT '902','syed','iqbal','sm soft'
INSERT #phonebook SELECT '905','john','ksjl89','company llc'
INSERT #phonebook SELECT '909','joe','average','united'
-- Duplicate insert 1
INSERT #phonebook SELECT '902','syed','iqbal','sm soft'
INSERT #phonebook SELECT '905','john','ksjl89','company llc'
INSERT #phonebook SELECT '909','joe','average','united'
-- Duplicate insert 2
INSERT #phonebook SELECT '902','syed','iqbal','sm soft'
INSERT #phonebook SELECT '905','john','ksjl89','company llc'
INSERT #phonebook SELECT '909','joe','average','united'

--Show Duplicate Phonenumbers in Phonebook
SELECT phonenumber, COUNT(*) FROM #phonebook
GROUP BY phonenumber HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC

SET ROWCOUNT 1
SELECT @@rowcount
WHILE @@rowcount > 0
DELETE pb FROM #phonebook as pb
INNER JOIN
(SELECT phonenumber
FROM #phonebook
GROUP BY phonenumber HAVING count(*) > 1)
AS c ON c.phonenumber = pb.phonenumber
SET ROWCOUNT 0


SELECT * FROM #phonebook

DROP TABLE #phonebook


Kindest Regards,

Syed
Sr. SQL Server DBA
toniupstny
toniupstny
SSC-Addicted
SSC-Addicted (410 reputation)SSC-Addicted (410 reputation)SSC-Addicted (410 reputation)SSC-Addicted (410 reputation)SSC-Addicted (410 reputation)SSC-Addicted (410 reputation)SSC-Addicted (410 reputation)SSC-Addicted (410 reputation)

Group: General Forum Members
Points: 410 Visits: 940
I agree you do not really need the temporary table unless you want it to be able to review results before making final changes to the "live" database.

If you want to forego the temp table, below is a variation (without loops or temp tables) for removing duplicates if you have some other criteria to determine which duplicate to keep. If you have to pick off a completely identical record, then I guess you would have to do some sort of @@Rowcount function as shown in a previous posting.

If you do have criteria to pick out which record(s) to remove then the removal is a simple Delete statement. Again if the records are exact duplicates then it is a coin-toss to find which one to take out so you would need a rowcount, cursor kind of method, index of columns with dup key (as in the original article and some posts), or a select distinct into a temporary table following that method.


The Delete Statement (for duplicate phone values with differing pdate values)

DELETE p1
FROM @phonenum p1
INNER JOIN @phonenum p2
ON p1.phone = p2.phone and p1.pdate > p2.pdate



The Delete statement in test code

Declare @phonenum table (phone varchar(10), pdate int)

insert -- set up for test
into @phonenum (phone,pdate)
select '1',1 union all
select '2',2 union all
select '3',3 union all
select '3',2 union all
select '4',2 union all
select '5',1 union all
select '5',7 union all
select '5',3 union all
select '6',1


select 'original table', phone, pdate -- show before picture
from @phonenum


/* Remove the dups from the original table based on oldest date */

DELETE p1
FROM @phonenum p1
INNER JOIN @phonenum p2
ON p1.phone = p2.phone and p1.pdate > p2.pdate



select 'after delete', phone, pdate -- show after picture
from @phonenum


Toni
I cant let you do that Dave
I cant let you do that Dave
SSC-Enthusiastic
SSC-Enthusiastic (117 reputation)SSC-Enthusiastic (117 reputation)SSC-Enthusiastic (117 reputation)SSC-Enthusiastic (117 reputation)SSC-Enthusiastic (117 reputation)SSC-Enthusiastic (117 reputation)SSC-Enthusiastic (117 reputation)SSC-Enthusiastic (117 reputation)

Group: General Forum Members
Points: 117 Visits: 271
I would be hesitant about deleting duplicates in the same statement that finds them. If the table was non trivial size (ie. you didnt want to hand list the record keys to be deleted)

Any long running scan of the table looking for duplicates could cascade its lock to table level.

Generated record deletes in small transactions would have less impact if the system was live.
endo64
endo64
SSC Journeyman
SSC Journeyman (99 reputation)SSC Journeyman (99 reputation)SSC Journeyman (99 reputation)SSC Journeyman (99 reputation)SSC Journeyman (99 reputation)SSC Journeyman (99 reputation)SSC Journeyman (99 reputation)SSC Journeyman (99 reputation)

Group: General Forum Members
Points: 99 Visits: 191
And ofcourse you can create a CHECKSUM computed column which is calculated from all other fields. So you can determine all the duplicate records easily, they are all give the same checksum value.

But unfortunately CHECKSUM is 32 bit integers, so if you have too much records in your table it may give the same checksum value.

You can also use the

--create a checksum field to see duplicates
ALTER TABLE Phonebook
ADD chkValue AS checksum(first_name, last_name, phonenumber)
--select and/or delete duplicates
...
--create unique index so no more dup. can be inserted
CREATE INDEX chkUnique ON Phonebook (chkValue) WITH IGNORE_DUP_KEY

Don't forget, if you have thousands of records then there can be different rows that give the same checksum value.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search