duplicate records

Question

duplicate records

Tara-1044200

SSCoach

Points: 15951
More actions
June 2, 2009 at 2:21 pm

#134734

How do i find duplicate records in a database of 1.3 TB where in we dont have any PK's ( this might surprise you). I am trying to pull count of duplicate records and measure how much space i can save from it and then if possible put a PK after deleting them.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

TRACEY-320982 SSChampion Points: 13653 More actions · Answer 1

do group and having count(*) > 1...wouldn't worry about space i be more interested in getting rid of duplicates. You can copy out to a temp table do the delete on duplicates and look at space sp_spaceused before and after. Many duplicate script examples just do search..sorry not got example here not on my own laptop.

GURSETHI SSCommitted Points: 1912 More actions · Answer 2

Hi,

Here's a handy query for finding duplicates in a table. Suppose you want to find all email addresses in a table that exist more than once:

SELECT email,

COUNT(email) AS NumOccurrences

FROM users

GROUP BY email

HAVING ( COUNT(email) > 1 )

You could also use this technique to find rows that occur exactly once:

SELECT email

FROM users

GROUP BY email

HAVING ( COUNT(email) = 1 )

Regards

GURSETHI

Dinjaan SSC Enthusiast Points: 146 More actions · Answer 3

THIS IS AN EXAMPLE WHICH I HAVE USED...

I HOPE THIS SCRIPT WILL HELP YOU OUT....

-- Delete duplicate records with out temporary tables.

CREATE TABLE #phonebook (

[phonenumber] [varchar] (30),

[firstname] [varchar] (30),

[lastname] [varchar] (30),

[company] [varchar] (100))

INSERT #phonebook SELECT '902','syed','iqbal','sm soft'

INSERT #phonebook SELECT '905','john','ksjl89','company llc'

-- Duplicate insert 1

INSERT #phonebook SELECT '902','syed','iqbal','sm soft'

INSERT #phonebook SELECT '905','john','ksjl89','company llc'

-- Duplicate insert 2

INSERT #phonebook SELECT '902','syed','iqbal','sm soft'

INSERT #phonebook SELECT '905','john','ksjl89','company llc'

--Show Duplicate Phonenumbers in Phonebook

SELECT phonenumber, COUNT(*) FROM #phonebook

GROUP BY phonenumber HAVING COUNT(*) > 1

ORDER BY COUNT(*) DESC

SET ROWCOUNT 1

SELECT @@rowcount

WHILE @@rowcount > 0

DELETE pb FROM #phonebook as pb

INNER JOIN

(SELECT phonenumber

FROM #phonebook

GROUP BY phonenumber HAVING count(*) > 1)

AS c ON c.phonenumber = pb.phonenumber

SET ROWCOUNT 0

SELECT * FROM #phonebook

DROP TABLE #phonebook

----------ENJOY USING SQL SERVERCENTRAL.COM-------------

FNS SSCrazy Points: 2015 More actions · Answer 4

You need to establish what makes each record unique (ie should every email address or employee number) be unique.

If there's a single column which determins this then use something like

SELECT emailaddress, COUNT(emailaddress)

FROM MyTable

GROUP BY emailaddress

HAVING COUNT(emailaddress) > 1

Deleting can be done a number of ways depending on how many records are there. DELETE TOP 1, ROWCOUNT, temporary tables etc..

If you need multiple columns to detmine your uniqueness for each record then it gets more complicated and I have been known to resort to a (grits teeth) CURSOR.

Pandian S SSChasing Mays Points: 625 More actions · Answer 5

Create Table #Data

(

FirstNameVARCHAR(50),

EMailVARCHAR(50)

)

GO

Insert #Data Values('AAAAA','AAAAA@a.com')

Insert #Data Values('BBBBB','BBBBB@a.com')

Insert #Data Values('AAAAA','AAAAA@a.com')

Insert #Data Values('CCCCC','CCCCC@a.com')

Insert #Data Values('AAAAA','AAAAA@a.com')

Insert #Data Values('CCCCC','CCCCC@a.com')

GO

Select FirstName,EMail From #Data

GO

SELECT FirstName,EMail,

CASE WHEN DENSE_RANK() OVER(PARTITION BY EMail ORDER BY NEWID()) >1 THEN 'Duplicate' ELSE 'Non Duplicate' END 'Status'

FROM #Data

GO

Drop Table #Data

Regards

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 6

1) the size of the database is actually irrelevant. You could have a 10TB database and a 2 row table that you are looking for dupes in. 🙂

2) eliminating duplicates on a table without some form of 'pointer' such as a PK or sequence is going to be horribly inefficient. Best of luck with that! :w00t:

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

SQLQuest29 SSCrazy Eights Points: 8264 More actions · Answer 7

Tara (6/2/2009)
How do i find duplicate records in a database of 1.3 TB where in we dont have any PK's ( this might surprise you). I am trying to pull count of duplicate records and measure how much space i can save from it and then if possible put a PK after deleting them.

Hi,

This will help you

http://www.sqlservercentral.com/articles/duplicates/65916/

http://chiragrdarji.wordpress.com/2007/07/23/delete-single-row-from-duplicate-rows-in-sql-server-2005-and-2000/

This will solve your problems !!! 🙂

Hope this helps,

\\K

______________________________________________________________________________________________________________________________________________________________________________________
HTH !
Kin
MCTS : 2005, 2008
Active SQL Server Community Contributor 🙂