SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Delete duplicate records in a table


Delete duplicate records in a table

Author
Message
chitturiii
chitturiii
Forum Newbie
Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)

Group: General Forum Members
Points: 7 Visits: 14
Hi,

Can you please suggest me the best query to delete the duplicate records in a table with large number of records.

Thanks in advance.

--Chandra
VSSGeorge
VSSGeorge
Ten Centuries
Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)

Group: General Forum Members
Points: 1245 Visits: 1549
Hi ..

Try this one...


declare @tab table (id int,name varchar(10))
insert into @tab (id,name)
select 1,'Sumit' union all
select 1,'Sumit' union all
select 2,'Sumit2' union all
select 2,'Sumit2'

select id,name from ( select row_number() over (partition by id,name order by id )as row ,id,name from @tab )as t
where row= 1
ChrisM@Work
ChrisM@Work
SSC-Insane
SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)

Group: General Forum Members
Points: 21907 Visits: 19709
chitturiii (12/1/2010)
Hi,

Can you please suggest me the best query to delete the duplicate records in a table with large number of records.

Thanks in advance.

--Chandra


You may have to delete in batches if there are a very large number of rows to be deleted. Do you have a query which identifies the rows to delete? If not, post the structure of the table as CREATE TABLE etc and any indexes, also which columns are used to identify dupes.

“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
PKDC
PKDC
SSC Veteran
SSC Veteran (208 reputation)SSC Veteran (208 reputation)SSC Veteran (208 reputation)SSC Veteran (208 reputation)SSC Veteran (208 reputation)SSC Veteran (208 reputation)SSC Veteran (208 reputation)SSC Veteran (208 reputation)

Group: General Forum Members
Points: 208 Visits: 450
If you want to filter duplicate, just use group by clause to filter duplicate row.

select column_name, min(PK_Column_name) PrimaryKeyColumn from tablename group by column_name having count(*)>1

The above script will identify duplicate value and show minimum Primary key value. If you intend to keep one row, you can keep the minimum Primary key value row. (of course you can change the condition)

The below will delete all duplicate rows and leaving one distinct value row(minimum Primary key value row)

with Tempinfo (columename, PrimaryKeyColumn)
as
(select column_name, min(PK_Column_name) PrimaryKeyColumn from tablename group by column_name having count(*)>1)

delete a
from tablename a join Tempinfo b on a.column_name=b.column_name and a.PrimaryKeyColumn<>b.PrimaryKeyColumn
Sachin Nandanwar
Sachin Nandanwar
SSC Eights!
SSC Eights! (861 reputation)SSC Eights! (861 reputation)SSC Eights! (861 reputation)SSC Eights! (861 reputation)SSC Eights! (861 reputation)SSC Eights! (861 reputation)SSC Eights! (861 reputation)SSC Eights! (861 reputation)

Group: General Forum Members
Points: 861 Visits: 2633
Also it is a better idea to put the recover model to SIMPLE or else your log file will bloat.But make sure you take a Full and a T log backup before changing the recovery model to SIMPLE.

--------------------------------------------------------------------------------------------------
I am just an another naive wannabe DBA trying to learn SQL Server
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (116K reputation)SSC Guru (116K reputation)SSC Guru (116K reputation)SSC Guru (116K reputation)SSC Guru (116K reputation)SSC Guru (116K reputation)SSC Guru (116K reputation)SSC Guru (116K reputation)

Group: General Forum Members
Points: 116569 Visits: 41430
Sachin Nandanwar (12/3/2010)
Also it is a better idea to put the recover model to SIMPLE or else your log file will bloat.But make sure you take a Full and a T log backup before changing the recovery model to SIMPLE.


Agh... be careful now. Changing to SIMPLE breaks the backup chain. This is normally one of the worst things you can do.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Sachin Nandanwar
Sachin Nandanwar
SSC Eights!
SSC Eights! (861 reputation)SSC Eights! (861 reputation)SSC Eights! (861 reputation)SSC Eights! (861 reputation)SSC Eights! (861 reputation)SSC Eights! (861 reputation)SSC Eights! (861 reputation)SSC Eights! (861 reputation)

Group: General Forum Members
Points: 861 Visits: 2633
Jeff Moden (12/4/2010)
Sachin Nandanwar (12/3/2010)
Also it is a better idea to put the recover model to SIMPLE or else your log file will bloat.But make sure you take a Full and a T log backup before changing the recovery model to SIMPLE.


Agh... be careful now. Changing to SIMPLE breaks the backup chain. This is normally one of the worst things you can do.


But that's the reason I suggested to take a FULL backup before changing it to SIMPLE.Then delete the records.Change it to FULL.Again take a Full Backup.
Correct me if I am missing something.

--------------------------------------------------------------------------------------------------
I am just an another naive wannabe DBA trying to learn SQL Server
Hardy21
Hardy21
SSCommitted
SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)SSCommitted (1.9K reputation)

Group: General Forum Members
Points: 1866 Visits: 1399
Sachin Nandanwar (12/7/2010)
Jeff Moden (12/4/2010)
Sachin Nandanwar (12/3/2010)
Also it is a better idea to put the recover model to SIMPLE or else your log file will bloat.But make sure you take a Full and a T log backup before changing the recovery model to SIMPLE.


Agh... be careful now. Changing to SIMPLE breaks the backup chain. This is normally one of the worst things you can do.


But that's the reason I suggested to take a FULL backup before changing it to SIMPLE.Then delete the records.Change it to FULL.Again take a Full Backup.
Correct me if I am missing something.

Execute the query inside the Begin transaction - End Transaction, If you satisfy with the result then commit it otherwise rollback it. So, it will not change anything in your database in case of any error or issue. Take backup & restore require more time.

Once you change the backup type, your backup chain is no longer valid. You need to take FULL backup once again and do the differential/transaction log backup as per your backup strategy.

Thanks
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (116K reputation)SSC Guru (116K reputation)SSC Guru (116K reputation)SSC Guru (116K reputation)SSC Guru (116K reputation)SSC Guru (116K reputation)SSC Guru (116K reputation)SSC Guru (116K reputation)

Group: General Forum Members
Points: 116569 Visits: 41430
Hardy21 (12/7/2010)
Sachin Nandanwar (12/7/2010)
Jeff Moden (12/4/2010)
Sachin Nandanwar (12/3/2010)
Also it is a better idea to put the recover model to SIMPLE or else your log file will bloat.But make sure you take a Full and a T log backup before changing the recovery model to SIMPLE.


Agh... be careful now. Changing to SIMPLE breaks the backup chain. This is normally one of the worst things you can do.


But that's the reason I suggested to take a FULL backup before changing it to SIMPLE.Then delete the records.Change it to FULL.Again take a Full Backup.
Correct me if I am missing something.

Execute the query inside the Begin transaction - End Transaction, If you satisfy with the result then commit it otherwise rollback it. So, it will not change anything in your database in case of any error or issue. Take backup & restore require more time.

Once you change the backup type, your backup chain is no longer valid. You need to take FULL backup once again and do the differential/transaction log backup as per your backup strategy.


The whole point is to try to avoid sending the LOG file through the roof with a bazillion deletes and to try to accelerate the rate of the deletions. Simply adding an explicit transaction will have neither of those effects.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search