duplicate records

  • How do i find duplicate records in a database of 1.3 TB where in we dont have any PK's ( this might surprise you). I am trying to pull count of duplicate records and measure how much space i can save from it and then if possible put a PK after deleting them.

  • do group and having count(*) > 1...wouldn't worry about space i be more interested in getting rid of duplicates. You can copy out to a temp table do the delete on duplicates and look at space sp_spaceused before and after. Many duplicate script examples just do search..sorry not got example here not on my own laptop.

  • Hi,

    Here's a handy query for finding duplicates in a table. Suppose you want to find all email addresses in a table that exist more than once:

    SELECT email,

    COUNT(email) AS NumOccurrences

    FROM users

    GROUP BY email

    HAVING ( COUNT(email) > 1 )

    You could also use this technique to find rows that occur exactly once:

    SELECT email

    FROM users

    GROUP BY email

    HAVING ( COUNT(email) = 1 )

    Regards

    GURSETHI

  • THIS IS AN EXAMPLE WHICH I HAVE USED...

    I HOPE THIS SCRIPT WILL HELP YOU OUT....

    -- Delete duplicate records with out temporary tables.

    CREATE TABLE #phonebook (

    [phonenumber] [varchar] (30),

    [firstname] [varchar] (30),

    [lastname] [varchar] (30),

    [company] [varchar] (100))

    INSERT #phonebook SELECT '902','syed','iqbal','sm soft'

    INSERT #phonebook SELECT '905','john','ksjl89','company llc'

    -- Duplicate insert 1

    INSERT #phonebook SELECT '902','syed','iqbal','sm soft'

    INSERT #phonebook SELECT '905','john','ksjl89','company llc'

    -- Duplicate insert 2

    INSERT #phonebook SELECT '902','syed','iqbal','sm soft'

    INSERT #phonebook SELECT '905','john','ksjl89','company llc'

    --Show Duplicate Phonenumbers in Phonebook

    SELECT phonenumber, COUNT(*) FROM #phonebook

    GROUP BY phonenumber HAVING COUNT(*) > 1

    ORDER BY COUNT(*) DESC

    SET ROWCOUNT 1

    SELECT @@rowcount

    WHILE @@rowcount > 0

    DELETE pb FROM #phonebook as pb

    INNER JOIN

    (SELECT phonenumber

    FROM #phonebook

    GROUP BY phonenumber HAVING count(*) > 1)

    AS c ON c.phonenumber = pb.phonenumber

    SET ROWCOUNT 0

    SELECT * FROM #phonebook

    DROP TABLE #phonebook

    ----------ENJOY USING SQL SERVERCENTRAL.COM-------------

  • You need to establish what makes each record unique (ie should every email address or employee number) be unique.

    If there's a single column which determins this then use something like

    SELECT emailaddress, COUNT(emailaddress)

    FROM MyTable

    GROUP BY emailaddress

    HAVING COUNT(emailaddress) > 1

    Deleting can be done a number of ways depending on how many records are there. DELETE TOP 1, ROWCOUNT, temporary tables etc..

    If you need multiple columns to detmine your uniqueness for each record then it gets more complicated and I have been known to resort to a (grits teeth) CURSOR.

  • Create Table #Data

    (

    FirstNameVARCHAR(50),

    EMailVARCHAR(50)

    )

    GO

    Insert #Data Values('AAAAA','AAAAA@a.com')

    Insert #Data Values('BBBBB','BBBBB@a.com')

    Insert #Data Values('AAAAA','AAAAA@a.com')

    Insert #Data Values('CCCCC','CCCCC@a.com')

    Insert #Data Values('AAAAA','AAAAA@a.com')

    Insert #Data Values('CCCCC','CCCCC@a.com')

    GO

    Select FirstName,EMail From #Data

    GO

    SELECT FirstName,EMail,

    CASE WHEN DENSE_RANK() OVER(PARTITION BY EMail ORDER BY NEWID()) >1 THEN 'Duplicate' ELSE 'Non Duplicate' END 'Status'

    FROM #Data

    GO

    Drop Table #Data

    Regards

  • 1) the size of the database is actually irrelevant. You could have a 10TB database and a 2 row table that you are looking for dupes in. 🙂

    2) eliminating duplicates on a table without some form of 'pointer' such as a PK or sequence is going to be horribly inefficient. Best of luck with that! :w00t:

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Tara (6/2/2009)


    How do i find duplicate records in a database of 1.3 TB where in we dont have any PK's ( this might surprise you). I am trying to pull count of duplicate records and measure how much space i can save from it and then if possible put a PK after deleting them.

    Hi,

    This will help you

    http://www.sqlservercentral.com/articles/duplicates/65916/

    http://chiragrdarji.wordpress.com/2007/07/23/delete-single-row-from-duplicate-rows-in-sql-server-2005-and-2000/

    This will solve your problems !!! 🙂

    Hope this helps,

    \\K

    ______________________________________________________________________________________________________________________________________________________________________________________
    HTH !
    Kin
    MCTS : 2005, 2008
    Active SQL Server Community Contributor 🙂

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply