Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Removing Duplicate Records Expand / Collapse
Author
Message
Posted Thursday, February 28, 2008 5:10 AM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Friday, May 23, 2014 2:40 AM
Points: 55, Visits: 149
Comments posted to this topic are about the item Removing Duplicate Records
Post #461560
Posted Friday, April 18, 2008 12:22 AM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Monday, July 14, 2014 10:33 PM
Points: 85, Visits: 260
Are you sure you don't mean

INSERT INTO phonebook SELECT
[phonenumber],
[firstname] ,
[lastname] ,
[company]
FROM #tmp

DROP TABLE #Tmp

at the end
Post #486918
Posted Friday, April 18, 2008 7:09 AM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Wednesday, May 11, 2011 1:33 PM
Points: 85, Visits: 299
If the data set is large, and there are no FKs defined on the table, a faster method might be to SELECT INTO a physical table (which will create the table for you), drop the table with duplicate rows, then rename the table with the clean data set.

Ted
www.intermz.com | www.intermz.com/blog


---------------------------
|Ted Pin >>
Post #487163
Posted Friday, April 18, 2008 7:37 AM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Friday, May 23, 2014 2:40 AM
Points: 55, Visits: 149
DAvid (4/18/2008)
Are you sure you don't mean

INSERT INTO phonebook SELECT
[phonenumber],
[firstname] ,
[lastname] ,
[company]
FROM #tmp

DROP TABLE #Tmp

at the end


Sure. I fixed the artice now.
Thank you.
Post #487182
Posted Friday, April 18, 2008 7:39 AM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Friday, May 23, 2014 2:40 AM
Points: 55, Visits: 149
ted.pin (4/18/2008)
If the data set is large, and there are no FKs defined on the table, a faster method might be to SELECT INTO a physical table (which will create the table for you), drop the table with duplicate rows, then rename the table with the clean data set.

Ted
www.intermz.com | www.intermz.com/blog


SELECT INTO creates the table but with no indexes. Then how to remove duplicate records?
Post #487183
Posted Friday, April 18, 2008 7:47 AM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Wednesday, May 11, 2011 1:33 PM
Points: 85, Visits: 299
Ah, you are right. You can add an IDENTITY column to the table after the SELECT INTO, which will act as the index. But by this time you might be better off adding the IDENTITY column to the source table (with dups), then INSERT INTO a new table like:

INSERT INTO Table_NoDups
(keyCol1, col1, col2)
SELECT keyCol1, col1, col2
FROM
Table_WithDups
WHERE identityCol IN
(SELECT MAX(identityCol) FROM Table_WithDups
GROUP BY keyCol1)

But by this point, I'm not sure what the performance gains would be... ;)


---------------------------
|Ted Pin >>
Post #487195
Posted Friday, April 18, 2008 8:30 AM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Friday, May 23, 2014 2:40 AM
Points: 55, Visits: 149
I'm using that method (creating a unique index) for already existing tables. I mean without changing the original table.

If you able to change / redesign your table you can add a unique index on multiple fields. So inserting duplicate records gives a warning like 'Duplicate key was ignored', but no error.

For example:

CREATE TABLE phonebook (
[phonenumber] [varchar] (30),
[firstname] [varchar] (30),
[lastname] [varchar] (30),
[company] [varchar] (100)
)

CREATE UNIQUE INDEX IX_Unique ON phonebook (firstname, lastname) WITH IGNORE_DUP_KEY

But do not forget that, ADO will take 'Duplicate key was ignored' warning as an error!
So you should take care of this in your application.

A VB6 example;

Const cnstError_DuplicateKeyWasIgnored As Long = 3604
'...

On Error GoTo ADO_ERROR
Call oCON.Execute(strAnInsertSql)
On Error GoTo E 'Normal error handler
'...

ADO_ERROR:
If oCON.Errors(0).NativeError = cnstError_DuplicateKeyWasIgnored Then
'Duplicate Key Was Ignored errors should be ignored
Resume Next
End If
Post #487254
Posted Friday, April 18, 2008 11:30 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Friday, November 1, 2013 7:23 AM
Points: 242, Visits: 939
If there is another field in the table that would determine which of the duplicate records you want to keep (such as the oldest record) you could key off that to remove the duplicates.


/* Set up the test table */

Declare @phonenum table (phone varchar(10), pdate int)

insert
into @phonenum (phone,pdate)
select 1,1 union all
select 2,2 union all
select 3,3 union all
select 3,2 union all
select 4,2 union all
select 5,1 union all
select 5,7 union all
select 5,3 union all
select 5,1 union all
select 6,1

select 'original table', phone, pdate
from @phonenum
order by phone, pdate

/* copy the non-dups from the original table based on oldest date */

select phone,pdate
into #temptab
from
(select p.phone,p.pdate
from @phonenum p
join @phonenum p2
on p.phone = p2.phone
GROUP BY p.phone,p.pdate having p.pdate = min(p2.pdate)) t


select 'temporary table - non-dups with oldest date', phone, pdate from #temptab
order by phone
drop table #temptab


If you need another tiebreaker when the dates are also the same, add that to the selection criteria. If it doesn't matter if the dates are the same, this will just put one of the records out there.

It is always good to review the data before running any updates or deletes so you can see if there are any special cases to take into account.

Toni
Post #487387
Posted Friday, April 18, 2008 12:16 PM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Friday, November 1, 2013 7:23 AM
Points: 242, Visits: 939
Oh... one more thing. If there are Null values in any of your comparison columns, then you would have to treat those separately (doing comparisons with Null is a bit tricky). For instance should a Null date be considered lower than another date?

This can lead to some intricate modifications such as this one...

select phone , pdate  
into #temptab
from
(select p.phone,p.pdate
from @phonenum p
join @phonenum p2
on p.phone = p2.phone or p.phone is null
GROUP BY p.phone,p.pdate having p.pdate = min(p2.pdate) or p.pdate is null or p.phone is null) t

So you should think these kinds of situations through before making the final changes.

Toni
Post #487418
Posted Friday, April 18, 2008 3:07 PM


SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, June 20, 2014 2:22 PM
Points: 190, Visits: 436
If the ratio of duplicate to unique phone numbers is "small" AND you have a unique row identifier on the table you might be better off doing something like this...

If Object_Id('tempdb.dbo.#Tmp') is not Null Drop table dbo.#Tmp
Create table dbo.#Tmp(RId Int Identity(1,1),Phone VarChar(7))
Insert dbo.#Tmp Values('1234567')
Insert dbo.#Tmp Values('9876543')
Insert dbo.#Tmp Values('1234567') -- dup
Insert dbo.#Tmp Values('9876543') -- dup

Select * from dbo.#Tmp

Delete #Tmp
from dbo.#Tmp
join dbo.#Tmp t2 on #Tmp.RId>t2.RId and #Tmp.Phone=t2.Phone

Select * from dbo.#Tmp




PeteK
I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.
Post #487509
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse