Find and Remove Duplicate Records in SQL Server

Ginger Daniel, 2016-02-01

Having duplicate records in a database is an age-old problem that almost every organization has to deal with. Duplicates can appear because of careless data input, merging records from old systems into new systems, uploading leads from purchased lists, and multiple other reasons.

Identifying these duplicate records can also be tricky. You might have multiple people with the same first and last name. You might have one person with multiple addresses, emails, or other identifying characteristics. In most cases business rules, not repetitive values, will determine what constitutes duplicate data. Knowing your data is the key to determining whether your records are duplicates or not.

It can be a painstaking process, but we will go over some basic steps to help find and remove duplicate records in your database.

Create Duplicates

First let’s take a look at a table and purposefully insert duplicate records into it. I have selected some rows out of a table that contains customer information.

Next I will insert these rows of data into my Customers table to create duplicate rows:

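USE ABCompany
GO
-- Copy ten existing rows back into the same table to create duplicates.
-- SELECT * only works if Customers has no identity column and no
-- primary key or unique constraint; otherwise the insert will fail.
INSERT INTO [Customers]
    SELECT TOP 10 *
    FROM [Customers]
GO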

And the results are shown below:

Find Duplicates

Duplicate records in your table will most likely not be in sequential order, as shown in our example above. To find them, run this query, substituting the names of your database, table, and relevant columns.

USE YourDatabase
GO
-- COUNT(*) counts every row in the group, including rows where column2 is NULL
SELECT column1, column2, COUNT(*) AS Duplicates
FROM YourTable
GROUP BY column1, column2
HAVING COUNT(*) > 1

You will need to determine which field(s) in your table constitute a duplicate record. Again, knowing your data and your organization's business rules will determine whether you have duplicate records in your database. The results below assume that contactname is the column we are using to identify duplicate records in the table.
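For the Customers example, the template query might be filled in roughly as follows. This is a sketch that assumes contactname alone identifies a duplicate; the ORDER BY clause is an addition for readability and is not part of the original query.

```sql
USE ABCompany
GO
-- List every contactname that appears more than once,
-- with the most frequently repeated names first.
SELECT contactname, COUNT(*) AS Duplicates
FROM Customers
GROUP BY contactname
HAVING COUNT(*) > 1
ORDER BY Duplicates DESC, contactname;
```

If your business rules need more than one column to identify a duplicate, add those columns to both the SELECT list and the GROUP BY clause.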

Delete Duplicates

Now that we see there are indeed duplicate records in the table, we can delete duplicate rows with this script (again, you will substitute your database, table, and column names):

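SET NOCOUNT ON
-- Delete one row per pass until no contactname appears more than once.
-- DELETE TOP (1) is used instead of SET ROWCOUNT 1, which is deprecated
-- for DELETE statements. Which copy of each duplicate survives is arbitrary.
WHILE 1 = 1
 BEGIN
   DELETE TOP (1)
   FROM Customers
   WHERE contactname IN
        (SELECT  contactname
         FROM    Customers
         GROUP BY contactname
         HAVING  COUNT(*) > 1)
   -- Stop when the last pass found nothing left to delete
   IF @@ROWCOUNT = 0
      BREAK;
 END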
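The loop above removes one row per pass, which can be slow on large tables. A set-based alternative, not part of the original script, is to number the rows within each contactname group and delete everything after the first. The sketch below assumes SQL Server 2005 or later and that it does not matter which copy is kept; the CTE name Ranked and the column alias rn are illustrative.

```sql
-- Number the rows within each contactname group, then delete all but the first.
-- ORDER BY (SELECT NULL) means the surviving row is chosen arbitrarily.
WITH Ranked AS (
    SELECT contactname,
           ROW_NUMBER() OVER (PARTITION BY contactname
                              ORDER BY (SELECT NULL)) AS rn
    FROM Customers
)
DELETE FROM Ranked
WHERE rn > 1;
```

To control which copy survives, replace ORDER BY (SELECT NULL) with a column from your own table, such as a creation date. Either way, it is worth running the delete against a copy of the table or inside a transaction first.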

To check the results, we run the select statement again to make sure the duplicates are gone:
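If you would rather get a simple yes-or-no answer than scan a result set, the same grouping can be wrapped in an existence test. This is a minimal sketch using the Customers table and contactname column from the example; the message text is arbitrary.

```sql
-- Report whether any contactname still appears more than once
IF EXISTS (SELECT 1
           FROM Customers
           GROUP BY contactname
           HAVING COUNT(*) > 1)
    PRINT 'Duplicates remain';
ELSE
    PRINT 'No duplicates found';
```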

Conclusion

SQL Server has methods for preventing duplicate records in a database, such as enforcing entity integrity through the use of primary key constraints, unique key constraints, and triggers. However, duplicates can still occur because of a database design error or repetitive data that somehow gets past these quality control methods. The techniques described above, together with your familiarity with your data, will help you find and delete duplicate records in your databases.
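As an illustration of the unique key constraint mentioned above, a constraint could be added once the existing duplicates have been cleaned up. This sketch assumes your business rules really do require contactname to be unique, which will not hold for every organization; the constraint name UQ_Customers_contactname is hypothetical.

```sql
-- Reject future rows whose contactname already exists in the table.
-- This statement fails if the table still contains duplicate contactname values.
ALTER TABLE Customers
ADD CONSTRAINT UQ_Customers_contactname UNIQUE (contactname);
```

If a duplicate is defined by a combination of columns, list all of them inside the parentheses.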
