SQL Server Central is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
Search:  
 
 

Deleting Duplicate Records

By Sureshkumar Ramakrishnan, 2004/12/20

Total article views: 44370 | Views in the last 30 days: 202

This article explains a solution which can be used for deleting non-unique duplicate records from a table. In Oracle, every row in a table has a unique identifier so deleting duplicate records can be done as each record is unique based on its rowid (rowid is the mechanism which Oracle uses to physically locate the records in a table). But as SQL Server 7.0/2000 does not have the concept of row identifier it’s difficult to delete duplicate records from a table.

I have created a Script which I have found very useful when I am faced with such a problem. In the example below I have tried to explain how the script works. First let’s create a non indexed table using the script given below:

/*********************************************************/
/* Script for Creation of Employee Table*/
CREATE TABLE [dbo].[Employee] (
            [id] [int] NULL ,
            [name] [Varchar] (50) COLLATE Latin1_General_CI_AS NULL ,
            [salary] [Numeric](18, 2) NULL 
 ) ON [PRIMARY]
GO
/*********************************************************/

Then add some records into the table using the script given below. Note that there are 5 similar records being inserted into the table

/**********************************************************/
/* Script for Insertion of duplicate records to the Table */
Insert into employee values (1,'Ram', 1000.00)
Insert into employee values (1,'Ram', 1000.00)
Insert into employee values (2,'Joe', 2000.00)
Insert into employee values (2,'Joe', 1000.00)
Insert into employee values (3,'Mary', 1000.00)
Insert into employee values (4,'Julie', 5000.00)
Insert into employee values (2,'Joe', 1000.00)
Insert into employee values (1,'Ram', 1000.00)

/**********************************************************/

The table below shows the data present in the ‘Employee’ table.

id Name Salary
1 Ram 1000

1 Ram 1000
2 Joe 2000
2 Joe 1000
3 Mary 1000
4 Julie 5000
2 Joe 1000
1 Ram 1000

The following script deletes the duplicates from ‘Employee’ table. If there are duplicate records then the script will get a count of the duplicate records and will then delete the records till the count is 1.

/*******************************************************************/
/* Script for deletion of Duplicate record from the Employee Table */
Declare @id int,
        @name varchar (50),
        @cnt int,
        @salary numeric

Declare getallrecords cursor local static For
 Select count (1), id, name, salary 
   from employee (nolock)
   group by id, name,salary having count(1)>1
 
Open getallrecords

Fetch next from getallrecords into @cnt,@id,@name,@salary
--Cursor to check with all other records
While @@fetch_status=0
 Begin
  Set @cnt= @cnt-1
  Set rowcount @cnt

  -- Deleting the duplicate records. Observe that all fields are mentioned at the where condition
  Delete from employee where id=@id and name=@name
  and salary=@salary

  Set rowcount 0

  Fetch next from getallrecords into @cnt,@id,@name,@salary
 End 

Close getallrecords
Deallocate getallrecords

*******************************************************************

The logic of the script is pretty simple; the select query retrieves all the records that are duplicates i.e. having Count greater than one. The result set is retrieved by opening a local cursor which fetches one row at a time.

Note here that the Count column is a part of the select query; this is used to identify the no of duplicate rows in the result set.

The row count has been set to (Value obtained from the Count Column – 1). SQL Server uses rowcount to stop processing the query after the specified numbers of rows are returned. The delete statement is executed only to delete the rows set by the Set rowcount command. Once the records have been deleted, the rowcount of SQL server is reset to the default value of 0.

For more details on row count visit (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_set-set_0bjo.asp)

After the above script is executed the data in the ‘Employee’ table is as shown below and note that there are no duplicates anymore:

id Name Salary
1

Ram

1000

2

Joe

1000

2

Joe

2000

3

Mary

1000

4

Julie

5000

In the above example duplicate records were deleted at a row level, as we had considered all the columns in the select query. By customizing the script you can also delete duplicate records at a column level. This Query can be extremely beneficial and time saving for data cleansing during Data Migration.

By Sureshkumar Ramakrishnan, 2004/12/20

Total article views: 44370 | Views in the last 30 days: 202
Your response
 
 
Related tags

Advanced Querying    
T-SQL    
 
Like this? Try these...

Outer Join Trouble

By Steve Jones | Category: SQL Puzzles
| 17,280 reads
Already registered?  

Free registration required

To read the rest of this article, and access thousands of other articles, we ask you to register on the site and subscribe to our newsletters.

Register

E-mail address:
Password:
Password (confirm):

  

Subscriptions

We ask you to register on the site and subscribe to our newsletters. Subscribing to our newsletters gets you:

  • ALL of our content (thousands of articles, scripts, and forum postings)
  • A daily newsletter (example)
  • A weekly news round up (example)
  • The opportunity to ask and answer questions in our forums
  • A daily Question of the Day to test and help you increase your knowledge of SQL Server.

We ask that you give the newsletter a try for a week. Over 200,000 SQL Server Professionals a day find it entertaining and useful. If not, you are welcome to unsubscribe at anytime.

Steve Jones
Editor, SQLServerCentral.com