Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Removing Duplicate Records


Removing Duplicate Records

Author
Message
endo64
endo64
Valued Member
Valued Member (55 reputation)Valued Member (55 reputation)Valued Member (55 reputation)Valued Member (55 reputation)Valued Member (55 reputation)Valued Member (55 reputation)Valued Member (55 reputation)Valued Member (55 reputation)

Group: General Forum Members
Points: 55 Visits: 191
Comments posted to this topic are about the item Removing Duplicate Records
I cant let you do that Dave
I cant let you do that Dave
SSC Journeyman
SSC Journeyman (93 reputation)SSC Journeyman (93 reputation)SSC Journeyman (93 reputation)SSC Journeyman (93 reputation)SSC Journeyman (93 reputation)SSC Journeyman (93 reputation)SSC Journeyman (93 reputation)SSC Journeyman (93 reputation)

Group: General Forum Members
Points: 93 Visits: 271
Are you sure you don't mean


INSERT INTO phonebook SELECT
[phonenumber],
[firstname] ,
[lastname] ,
[company]
FROM #tmp

DROP TABLE #Tmp



at the end
Ted Manasa
Ted Manasa
SSC Journeyman
SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)

Group: General Forum Members
Points: 85 Visits: 299
If the data set is large, and there are no FKs defined on the table, a faster method might be to SELECT INTO a physical table (which will create the table for you), drop the table with duplicate rows, then rename the table with the clean data set.

Ted
www.intermz.com | www.intermz.com/blog

---------------------------
|Ted Pin >>
endo64
endo64
Valued Member
Valued Member (55 reputation)Valued Member (55 reputation)Valued Member (55 reputation)Valued Member (55 reputation)Valued Member (55 reputation)Valued Member (55 reputation)Valued Member (55 reputation)Valued Member (55 reputation)

Group: General Forum Members
Points: 55 Visits: 191
DAvid (4/18/2008)
Are you sure you don't mean


INSERT INTO phonebook SELECT
[phonenumber],
[firstname] ,
[lastname] ,
[company]
FROM #tmp

DROP TABLE #Tmp



at the end


Sure. I fixed the artice now.
Thank you.
endo64
endo64
Valued Member
Valued Member (55 reputation)Valued Member (55 reputation)Valued Member (55 reputation)Valued Member (55 reputation)Valued Member (55 reputation)Valued Member (55 reputation)Valued Member (55 reputation)Valued Member (55 reputation)

Group: General Forum Members
Points: 55 Visits: 191
ted.pin (4/18/2008)
If the data set is large, and there are no FKs defined on the table, a faster method might be to SELECT INTO a physical table (which will create the table for you), drop the table with duplicate rows, then rename the table with the clean data set.

Ted
www.intermz.com | www.intermz.com/blog


SELECT INTO creates the table but with no indexes. Then how to remove duplicate records?
Ted Manasa
Ted Manasa
SSC Journeyman
SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)SSC Journeyman (85 reputation)

Group: General Forum Members
Points: 85 Visits: 299
Ah, you are right. You can add an IDENTITY column to the table after the SELECT INTO, which will act as the index. But by this time you might be better off adding the IDENTITY column to the source table (with dups), then INSERT INTO a new table like:

INSERT INTO Table_NoDups
(keyCol1, col1, col2)
SELECT keyCol1, col1, col2
FROM
Table_WithDups
WHERE identityCol IN
(SELECT MAX(identityCol) FROM Table_WithDups
GROUP BY keyCol1)

But by this point, I'm not sure what the performance gains would be... Wink

---------------------------
|Ted Pin >>
endo64
endo64
Valued Member
Valued Member (55 reputation)Valued Member (55 reputation)Valued Member (55 reputation)Valued Member (55 reputation)Valued Member (55 reputation)Valued Member (55 reputation)Valued Member (55 reputation)Valued Member (55 reputation)

Group: General Forum Members
Points: 55 Visits: 191
I'm using that method (creating a unique index) for already existing tables. I mean without changing the original table.

If you able to change / redesign your table you can add a unique index on multiple fields. So inserting duplicate records gives a warning like 'Duplicate key was ignored', but no error.

For example:

CREATE TABLE phonebook (
[phonenumber] [varchar] (30),
[firstname] [varchar] (30),
[lastname] [varchar] (30),
[company] [varchar] (100)
)

CREATE UNIQUE INDEX IX_Unique ON phonebook (firstname, lastname) WITH IGNORE_DUP_KEY

But do not forget that, ADO will take 'Duplicate key was ignored' warning as an error!
So you should take care of this in your application.

A VB6 example;

Const cnstError_DuplicateKeyWasIgnored As Long = 3604
'...

On Error GoTo ADO_ERROR
Call oCON.Execute(strAnInsertSql)
On Error GoTo E 'Normal error handler
'...

ADO_ERROR:
If oCON.Errors(0).NativeError = cnstError_DuplicateKeyWasIgnored Then
'Duplicate Key Was Ignored errors should be ignored
Resume Next
End If
toniupstny
toniupstny
SSC Veteran
SSC Veteran (248 reputation)SSC Veteran (248 reputation)SSC Veteran (248 reputation)SSC Veteran (248 reputation)SSC Veteran (248 reputation)SSC Veteran (248 reputation)SSC Veteran (248 reputation)SSC Veteran (248 reputation)

Group: General Forum Members
Points: 248 Visits: 940
If there is another field in the table that would determine which of the duplicate records you want to keep (such as the oldest record) you could key off that to remove the duplicates.



/* Set up the test table */

Declare @phonenum table (phone varchar(10), pdate int)

insert
into @phonenum (phone,pdate)
select 1,1 union all
select 2,2 union all
select 3,3 union all
select 3,2 union all
select 4,2 union all
select 5,1 union all
select 5,7 union all
select 5,3 union all
select 5,1 union all
select 6,1

select 'original table', phone, pdate
from @phonenum
order by phone, pdate

/* copy the non-dups from the original table based on oldest date */

select phone,pdate
into #temptab
from
(select p.phone,p.pdate
from @phonenum p
join @phonenum p2
on p.phone = p2.phone
GROUP BY p.phone,p.pdate having p.pdate = min(p2.pdate)) t


select 'temporary table - non-dups with oldest date', phone, pdate from #temptab
order by phone
drop table #temptab




If you need another tiebreaker when the dates are also the same, add that to the selection criteria. If it doesn't matter if the dates are the same, this will just put one of the records out there.

It is always good to review the data before running any updates or deletes so you can see if there are any special cases to take into account.

Toni
toniupstny
toniupstny
SSC Veteran
SSC Veteran (248 reputation)SSC Veteran (248 reputation)SSC Veteran (248 reputation)SSC Veteran (248 reputation)SSC Veteran (248 reputation)SSC Veteran (248 reputation)SSC Veteran (248 reputation)SSC Veteran (248 reputation)

Group: General Forum Members
Points: 248 Visits: 940
Oh... one more thing. If there are Null values in any of your comparison columns, then you would have to treat those separately (doing comparisons with Null is a bit tricky). For instance should a Null date be considered lower than another date?

This can lead to some intricate modifications such as this one...


select phone , pdate
into #temptab
from
(select p.phone,p.pdate
from @phonenum p
join @phonenum p2
on p.phone = p2.phone or p.phone is null
GROUP BY p.phone,p.pdate having p.pdate = min(p2.pdate) or p.pdate is null or p.phone is null) t



So you should think these kinds of situations through before making the final changes.

Toni
Peter E. Kierstead
Peter E. Kierstead
SSC-Enthusiastic
SSC-Enthusiastic (198 reputation)SSC-Enthusiastic (198 reputation)SSC-Enthusiastic (198 reputation)SSC-Enthusiastic (198 reputation)SSC-Enthusiastic (198 reputation)SSC-Enthusiastic (198 reputation)SSC-Enthusiastic (198 reputation)SSC-Enthusiastic (198 reputation)

Group: General Forum Members
Points: 198 Visits: 453
If the ratio of duplicate to unique phone numbers is "small" AND you have a unique row identifier on the table you might be better off doing something like this...

If Object_Id('tempdb.dbo.#Tmp') is not Null Drop table dbo.#Tmp
Create table dbo.#Tmp(RId Int Identity(1,1),Phone VarChar(7))
Insert dbo.#Tmp Values('1234567')
Insert dbo.#Tmp Values('9876543')
Insert dbo.#Tmp Values('1234567') -- dup
Insert dbo.#Tmp Values('9876543') -- dup

Select * from dbo.#Tmp

Delete #Tmp
from dbo.#Tmp
join dbo.#Tmp t2 on #Tmp.RId>t2.RId and #Tmp.Phone=t2.Phone

Select * from dbo.#Tmp



PeteK
I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search