|
|
|
Valued Member
      
Group: General Forum Members
Last Login: Friday, May 03, 2013 1:09 AM
Points: 54,
Visits: 140
|
|
|
|
|
|
SSC Journeyman
      
Group: General Forum Members
Last Login: Today @ 9:32 PM
Points: 85,
Visits: 251
|
|
Are you sure you don't mean
INSERT INTO phonebook SELECT [phonenumber], [firstname] , [lastname] , [company] FROM #tmp
DROP TABLE #Tmp
at the end
|
|
|
|
|
SSC Journeyman
      
Group: General Forum Members
Last Login: Wednesday, May 11, 2011 1:33 PM
Points: 85,
Visits: 299
|
|
If the data set is large, and there are no FKs defined on the table, a faster method might be to SELECT INTO a physical table (which will create the table for you), drop the table with duplicate rows, then rename the table with the clean data set.
Ted www.intermz.com | www.intermz.com/blog
--------------------------- |Ted Pin >>
|
|
|
|
|
Valued Member
      
Group: General Forum Members
Last Login: Friday, May 03, 2013 1:09 AM
Points: 54,
Visits: 140
|
|
DAvid (4/18/2008)
Are you sure you don't mean INSERT INTO phonebook SELECT [phonenumber], [firstname] , [lastname] , [company] FROM #tmp
DROP TABLE #Tmp
at the end
Sure. I fixed the artice now. Thank you.
|
|
|
|
|
Valued Member
      
Group: General Forum Members
Last Login: Friday, May 03, 2013 1:09 AM
Points: 54,
Visits: 140
|
|
ted.pin (4/18/2008)
If the data set is large, and there are no FKs defined on the table, a faster method might be to SELECT INTO a physical table (which will create the table for you), drop the table with duplicate rows, then rename the table with the clean data set. Ted www.intermz.com | www.intermz.com/blog
SELECT INTO creates the table but with no indexes. Then how to remove duplicate records?
|
|
|
|
|
SSC Journeyman
      
Group: General Forum Members
Last Login: Wednesday, May 11, 2011 1:33 PM
Points: 85,
Visits: 299
|
|
Ah, you are right. You can add an IDENTITY column to the table after the SELECT INTO, which will act as the index. But by this time you might be better off adding the IDENTITY column to the source table (with dups), then INSERT INTO a new table like:
INSERT INTO Table_NoDups (keyCol1, col1, col2) SELECT keyCol1, col1, col2 FROM Table_WithDups WHERE identityCol IN (SELECT MAX(identityCol) FROM Table_WithDups GROUP BY keyCol1)
But by this point, I'm not sure what the performance gains would be... ;)
--------------------------- |Ted Pin >>
|
|
|
|
|
Valued Member
      
Group: General Forum Members
Last Login: Friday, May 03, 2013 1:09 AM
Points: 54,
Visits: 140
|
|
I'm using that method (creating a unique index) for already existing tables. I mean without changing the original table.
If you able to change / redesign your table you can add a unique index on multiple fields. So inserting duplicate records gives a warning like 'Duplicate key was ignored', but no error.
For example:
CREATE TABLE phonebook ( [phonenumber] [varchar] (30), [firstname] [varchar] (30), [lastname] [varchar] (30), [company] [varchar] (100) )
CREATE UNIQUE INDEX IX_Unique ON phonebook (firstname, lastname) WITH IGNORE_DUP_KEY
But do not forget that, ADO will take 'Duplicate key was ignored' warning as an error! So you should take care of this in your application.
A VB6 example;
Const cnstError_DuplicateKeyWasIgnored As Long = 3604 '...
On Error GoTo ADO_ERROR Call oCON.Execute(strAnInsertSql) On Error GoTo E 'Normal error handler '...
ADO_ERROR: If oCON.Errors(0).NativeError = cnstError_DuplicateKeyWasIgnored Then 'Duplicate Key Was Ignored errors should be ignored Resume Next End If
|
|
|
|
|
SSC Veteran
      
Group: General Forum Members
Last Login: Friday, February 22, 2013 9:04 AM
Points: 241,
Visits: 928
|
|
If there is another field in the table that would determine which of the duplicate records you want to keep (such as the oldest record) you could key off that to remove the duplicates.
/* Set up the test table */
Declare @phonenum table (phone varchar(10), pdate int) insert into @phonenum (phone,pdate) select 1,1 union all select 2,2 union all select 3,3 union all select 3,2 union all select 4,2 union all select 5,1 union all select 5,7 union all select 5,3 union all select 5,1 union all select 6,1
select 'original table', phone, pdate from @phonenum order by phone, pdate
/* copy the non-dups from the original table based on oldest date */
select phone,pdate into #temptab from (select p.phone,p.pdate from @phonenum p join @phonenum p2 on p.phone = p2.phone GROUP BY p.phone,p.pdate having p.pdate = min(p2.pdate)) t
select 'temporary table - non-dups with oldest date', phone, pdate from #temptab order by phone drop table #temptab
If you need another tiebreaker when the dates are also the same, add that to the selection criteria. If it doesn't matter if the dates are the same, this will just put one of the records out there.
It is always good to review the data before running any updates or deletes so you can see if there are any special cases to take into account.
Toni
|
|
|
|
|
SSC Veteran
      
Group: General Forum Members
Last Login: Friday, February 22, 2013 9:04 AM
Points: 241,
Visits: 928
|
|
Oh... one more thing. If there are Null values in any of your comparison columns, then you would have to treat those separately (doing comparisons with Null is a bit tricky). For instance should a Null date be considered lower than another date?
This can lead to some intricate modifications such as this one...
select phone , pdate into #temptab from (select p.phone,p.pdate from @phonenum p join @phonenum p2 on p.phone = p2.phone or p.phone is null GROUP BY p.phone,p.pdate having p.pdate = min(p2.pdate) or p.pdate is null or p.phone is null) t
So you should think these kinds of situations through before making the final changes.
Toni
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Monday, March 25, 2013 8:37 AM
Points: 163,
Visits: 275
|
|
If the ratio of duplicate to unique phone numbers is "small" AND you have a unique row identifier on the table you might be better off doing something like this...
If Object_Id('tempdb.dbo.#Tmp') is not Null Drop table dbo.#Tmp Create table dbo.#Tmp(RId Int Identity(1,1),Phone VarChar(7)) Insert dbo.#Tmp Values('1234567') Insert dbo.#Tmp Values('9876543') Insert dbo.#Tmp Values('1234567') -- dup Insert dbo.#Tmp Values('9876543') -- dup
Select * from dbo.#Tmp
Delete #Tmp from dbo.#Tmp join dbo.#Tmp t2 on #Tmp.RId>t2.RId and #Tmp.Phone=t2.Phone
Select * from dbo.#Tmp
PeteK I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.
|
|
|
|