SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


How to eliminate duplicate records from a database table?


How to eliminate duplicate records from a database table?

Author
Message
rajeshwari-967105
rajeshwari-967105
SSC Journeyman
SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)SSC Journeyman (80 reputation)

Group: General Forum Members
Points: 80 Visits: 34
Comments posted to this topic are about the item How to eliminate duplicate records from a database table?
Malcolm Leach
Malcolm Leach
SSC Veteran
SSC Veteran (213 reputation)SSC Veteran (213 reputation)SSC Veteran (213 reputation)SSC Veteran (213 reputation)SSC Veteran (213 reputation)SSC Veteran (213 reputation)SSC Veteran (213 reputation)SSC Veteran (213 reputation)

Group: General Forum Members
Points: 213 Visits: 71
I'm not sure I'd use this script on a table with 10 million rows in it due to the potential for the cursor to become unwieldy if there are a large number of duplicates. I have eliminated duplicates using this kind of approach before:

(you can cut and paste this entire script into QA/MS and run it with no modifications)

/* -----------------------------------------------------------------
Script by Malcolm Leach 6th Jan 2009, http://www.dbghost.com

Set up test database for demo
----------------------------------------------------------------- */

use master
go
if exists (select 1 from master..sysdatabases where name = 'EliminateDuplicatesDemo')
drop database EliminateDuplicatesDemo
go
create database EliminateDuplicatesDemo
go
use EliminateDuplicatesDemo
go
create table DupTable (id int, col1 varchar(50))
go
insert into DupTable values(1,'some text')
insert into DupTable values(1,'some text')
insert into DupTable values(2,'some text2')
insert into DupTable values(2,'some text2')
-- the ID is a duplicate on the following row but the value of col1 is not...
insert into DupTable values(2,'this is different but with the same id')
insert into DupTable values(3,'some text3')
insert into DupTable values(3,'some text3')
insert into DupTable values(4,'some text4')
go

select * from duptable

begin tran

/* -----------------------------------------------------------------
Extract all rows with duplicate ID's to a temp table
----------------------------------------------------------------- */

-- get all the duplicated ID's first
select id into TempDupIds from DupTable group by id having count(id) > 1

select distinct dt.id, dt.col1
into TempDupTableDupRows
from DupTable dt
inner join TempDupIds di on di.id = dt.id

-- this query will show up any rows that have duplicate IDs but different data
-- these rows will need "special attention"
select id from TempDupTableDupRows group by id having count(id) > 1

/* -----------------------------------------------------------------
Delete all the rows with duplicate ID's from the main table
----------------------------------------------------------------- */

delete from DupTable where id in (select id from TempDupIds)

/* -----------------------------------------------------------------
Put the rows back into the main table
----------------------------------------------------------------- */

insert into DupTable select * from TempDupTableDupRows

commit tran

-- show the cleaned up table (the problem row is still there so no data has been lost)
select * from duptable

Malcolm
DB Ghost - Build, compare and synchronize from source control = Database Change Management for SQL Server
www.dbghost.com
balajimu
balajimu
Forum Newbie
Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)

Group: General Forum Members
Points: 3 Visits: 14
This seems very efficient even for larger tables...... Thanks for the tips.
Jeff Woodard
Jeff Woodard
Forum Newbie
Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)

Group: General Forum Members
Points: 5 Visits: 17
This is a nice post - I was not aware that you could use the 'set rowcount' command to limit the number of rows deleted (or inserted or updated for that matter). In digging up a little more detail though, I came across the following note in SQL 2008 BOL. If you're designing code to work in future versions of SQL, you may want to avoid the 'set rowcount' command:

-------------- from SQL2008 BOL ---------------------------------------------------------------
Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in the next release of SQL Server. Do not use SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. Also, for DELETE, INSERT, and UPDATE statements that currently use SET ROWCOUNT, we recommend that you rewrite them to use the TOP syntax. For more information, see DELETE (Transact-SQL), INSERT (Transact-SQL), or UPDATE (Transact-SQL).
-------------- end snippet ----------------------------------------------------------------------
turkbuku
turkbuku
Forum Newbie
Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)

Group: General Forum Members
Points: 4 Visits: 23
with Dups as (
SELECT *,row_number() over(partition by Company_name,Address order by Company_name,Address) as RowNum
FROM dbo.companies)
Delete from Dups where rownum > 1;
raymond lew
raymond lew
SSC Journeyman
SSC Journeyman (90 reputation)SSC Journeyman (90 reputation)SSC Journeyman (90 reputation)SSC Journeyman (90 reputation)SSC Journeyman (90 reputation)SSC Journeyman (90 reputation)SSC Journeyman (90 reputation)SSC Journeyman (90 reputation)

Group: General Forum Members
Points: 90 Visits: 123
That's awesome, turkbuku. I've use this construct (rank over blah0 with Oracle's rowid, but rowid is not available in SQL server. But this 'with' contruct works as well as rowid.

Hot damn. Thank you, man.

With a bit of cut and paste...


drop table DupTable
go
create table DupTable (id int, col1 varchar(50))
go
insert into DupTable values(1,'some text')
insert into DupTable values(1,'some text')
insert into DupTable values(2,'some text2')
insert into DupTable values(2,'some text2')
-- the ID is a duplicate on the following row but the value of col1 is not...
insert into DupTable values(2,'this is different but with the same id')
insert into DupTable values(3,'some text3')
insert into DupTable values(3,'some text3')
insert into DupTable values(4,'some text4')
GO

SELECT * FROM duptable;

with Dups as
(SELECT *,row_number() OVER
(partition by id order by id,col1) as RowNum
FROM duptable)

Delete from Dups where rownum > 1;

SELECT * FROM duptable
Malcolm Leach
Malcolm Leach
SSC Veteran
SSC Veteran (213 reputation)SSC Veteran (213 reputation)SSC Veteran (213 reputation)SSC Veteran (213 reputation)SSC Veteran (213 reputation)SSC Veteran (213 reputation)SSC Veteran (213 reputation)SSC Veteran (213 reputation)

Group: General Forum Members
Points: 213 Visits: 71
Cool - love it Smile

Although - it still suffers from the problem of deleting potentially valuable data Wink

this solves it:

with Dups as
( SELECT *,row_number()
OVER
(partition by id,col1 order by id,col1) as RowNum
FROM duptable
)

Malcolm
DB Ghost - Build, compare and synchronize from source control = Database Change Management for SQL Server
www.dbghost.com
raymond lew
raymond lew
SSC Journeyman
SSC Journeyman (90 reputation)SSC Journeyman (90 reputation)SSC Journeyman (90 reputation)SSC Journeyman (90 reputation)SSC Journeyman (90 reputation)SSC Journeyman (90 reputation)SSC Journeyman (90 reputation)SSC Journeyman (90 reputation)

Group: General Forum Members
Points: 90 Visits: 123
Read all about it. Here's the link to 'Common table expressions' aka the 'with...as' construct in MS-T-SQL.

I had trouble googling for 'WITH AS'. Thought I would pass this info on.

[url=http://msdn.microsoft.com/en-us/library/ms175972.aspx][/url]
tzara
tzara
SSC-Enthusiastic
SSC-Enthusiastic (145 reputation)SSC-Enthusiastic (145 reputation)SSC-Enthusiastic (145 reputation)SSC-Enthusiastic (145 reputation)SSC-Enthusiastic (145 reputation)SSC-Enthusiastic (145 reputation)SSC-Enthusiastic (145 reputation)SSC-Enthusiastic (145 reputation)

Group: General Forum Members
Points: 145 Visits: 75
No cursors or temp tables needed.

You can always add a primary key to a table, then remove dups.

1) using CTE statement (examples above).

or

2) delete using a self-join with a derivative table (joining back on the dup fields).

then remove the 'introduced' key, re-establish your colum(s) asserting the desired primary key.

CTE makes this operation a little easier to setup, and easier to read, more trivial. Older platforms can use derivative tables. note: also possible to use a view in a similar manner (e.g. view exposes dups) but why bother?
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search