Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Deleting Duplicate Records

By Sureshkumar Ramakrishnan, (first published: 2004/12/20)

This article explains a solution which can be used for deleting non-unique duplicate records from a table. In Oracle, every row in a table has a unique identifier so deleting duplicate records can be done as each record is unique based on its rowid (rowid is the mechanism which Oracle uses to physically locate the records in a table). But as SQL Server 7.0/2000 does not have the concept of row identifier it’s difficult to delete duplicate records from a table.

I have created a Script which I have found very useful when I am faced with such a problem. In the example below I have tried to explain how the script works. First let’s create a non indexed table using the script given below:

/*********************************************************/
/* Script for Creation of Employee Table*/
CREATE TABLE [dbo].[Employee] (
            [id] [int] NULL ,
            [name] [Varchar] (50) COLLATE Latin1_General_CI_AS NULL ,
            [salary] [Numeric](18, 2) NULL 
 ) ON [PRIMARY]
GO
/*********************************************************/

Then add some records into the table using the script given below. Note that there are 5 similar records being inserted into the table

/**********************************************************/
/* Script for Insertion of duplicate records to the Table */
Insert into employee values (1,'Ram', 1000.00)
Insert into employee values (1,'Ram', 1000.00)
Insert into employee values (2,'Joe', 2000.00)
Insert into employee values (2,'Joe', 1000.00)
Insert into employee values (3,'Mary', 1000.00)
Insert into employee values (4,'Julie', 5000.00)
Insert into employee values (2,'Joe', 1000.00)
Insert into employee values (1,'Ram', 1000.00)

/**********************************************************/

The table below shows the data present in the ‘Employee’ table.

id Name Salary
1 Ram 1000

1 Ram 1000
2 Joe 2000
2 Joe 1000
3 Mary 1000
4 Julie 5000
2 Joe 1000
1 Ram 1000

The following script deletes the duplicates from ‘Employee’ table. If there are duplicate records then the script will get a count of the duplicate records and will then delete the records till the count is 1.

/*******************************************************************/
/* Script for deletion of Duplicate record from the Employee Table */
Declare @id int,
        @name varchar (50),
        @cnt int,
        @salary numeric

Declare getallrecords cursor local static For
 Select count (1), id, name, salary 
   from employee (nolock)
   group by id, name,salary having count(1)>1
 
Open getallrecords

Fetch next from getallrecords into @cnt,@id,@name,@salary
--Cursor to check with all other records
While @@fetch_status=0
 Begin
  Set @cnt= @cnt-1
  Set rowcount @cnt

  -- Deleting the duplicate records. Observe that all fields are mentioned at the where condition
  Delete from employee where id=@id and name=@name
  and salary=@salary

  Set rowcount 0

  Fetch next from getallrecords into @cnt,@id,@name,@salary
 End 

Close getallrecords
Deallocate getallrecords

*******************************************************************

The logic of the script is pretty simple; the select query retrieves all the records that are duplicates i.e. having Count greater than one. The result set is retrieved by opening a local cursor which fetches one row at a time.

Note here that the Count column is a part of the select query; this is used to identify the no of duplicate rows in the result set.

The row count has been set to (Value obtained from the Count Column – 1). SQL Server uses rowcount to stop processing the query after the specified numbers of rows are returned. The delete statement is executed only to delete the rows set by the Set rowcount command. Once the records have been deleted, the rowcount of SQL server is reset to the default value of 0.

For more details on row count visit (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_set-set_0bjo.asp)

After the above script is executed the data in the ‘Employee’ table is as shown below and note that there are no duplicates anymore:

id Name Salary
1

Ram

1000

2

Joe

1000

2

Joe

2000

3

Mary

1000

4

Julie

5000

In the above example duplicate records were deleted at a row level, as we had considered all the columns in the select query. By customizing the script you can also delete duplicate records at a column level. This Query can be extremely beneficial and time saving for data cleansing during Data Migration.

Total article views: 47066 | Views in the last 30 days: 14
 
Related Articles
SCRIPT

Deleting Duplicate Records

In Datawarehousing, a normal scenario is eleminating duplicate records/rows or deleting duplicate re...

SCRIPT

Deleting Duplicate Records

This article explains a simple solution for the common problem in MSSQL 7.0/2000, finding & deleting...

FORUM

Deleting Duplicate Record in Production

Deleting Duplicate Record in Production

SCRIPT

Yet Another Deleting Duplicate Records (in-place)

Here is another "deleting duplicate records" script, this time in-place, using a uniqueidentifier co...

FORUM

Delete duplicate records in a table

Delete duplicate records in a table

Tags
advanced querying    
t-sql    
 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones