Finding duplicate entries

  • I'm not sure when this was posted or whether it got solved; I happened to see the thread while browsing. But I haven't seen anyone suggest the HAVING clause. I always use this to find duplicate records. If the problem still exists, try this:

    select
      fld1,
      fld2,
      fld3
    from
      tbl
    group by
      fld1,
      fld2,
      fld3
    having
      Count(*) > 1

    Jeswanth

  • And if you modify it slightly, adding a qty column, you'll get a count of how many duplicate records there are in each group.

    select
      fld1,
      fld2,
      fld3,
      count(*) as qty
    from
      tbl
    group by
      fld1,
      fld2,
      fld3
    having
      Count(*) > 1
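The two queries above can be exercised end to end. Here is a minimal sketch using Python's `sqlite3`; the table and column names (`tbl`, `fld1`..`fld3`) follow the hypothetical names used in the posts above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (fld1 TEXT, fld2 TEXT, fld3 TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [("a", "b", "c"),
     ("a", "b", "c"),   # duplicate of the first row
     ("x", "y", "z"),
     ("a", "b", "c")])  # and a third copy

# GROUP BY the candidate-key columns; HAVING keeps only the groups
# that occur more than once, and COUNT(*) reports how many times.
rows = conn.execute("""
    SELECT fld1, fld2, fld3, COUNT(*) AS qty
      FROM tbl
     GROUP BY fld1, fld2, fld3
    HAVING COUNT(*) > 1
""").fetchall()
print(rows)  # [('a', 'b', 'c', 3)]
```

Only the duplicated key combination comes back, together with its occurrence count, exactly as in the `qty` variant of the query.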

  • Thank you, I will give it a try.

  • Hi

    There is a nice wizard in MS Access allowing you to find duplicates in a table. After a little modification it works on SQL Server as well.

    Here is what I have made:

    SELECT Test.aTime AS [aTime Field], Test.avalue AS [avalue Field],
           Test.company AS [company Field], Count(Test.aTime) AS NumberOfDups
    FROM Test
    GROUP BY Test.aTime, Test.avalue, Test.company
    HAVING (((Count(Test.aTime))>1) AND ((Count(Test.company))>1));

    Result:

    aTime Field              avalue Field  company Field  NumberOfDups
    2005-04-02 00:00:00.000  bu002         ABC Company    2

    Give it a try and let me know if this is what you are looking for.

    Rene

    By the way: I used the CREATE and INSERT statements from Mike's post to get a table with some data in it.

  • Or plain vanilla:

    select aTime, aValue, Company, count(company) as [kaching!]
    from test
    group by aTime, aValue, Company
    having count(company) > 1
    order by count(company) desc

    RESULT:

    aTime                    avalue  company      kaching!
    -----------------------  ------  -----------  --------
    2005-04-02 00:00:00.000  bu002   ABC Company  2

    Cheers,

    DB


    The system's fine with no users logged in. Can we keep it that way?

  • Very interesting solution here... elegantly simple, yet perfect.

    If I may, I am wondering how your query could be modified to enable me to see each of the duplicate entries in order to determine visually which one to delete?

    Also, I need to verify that each student in a table has only one student ID number... by searching the lastname, firstname and birthdate fields for duplicates where the student ID number is not duplicated, thus identifying a student who has managed to get two or more numbers. Could I do that?

    Finally, I need to find any cases where more than one student has the same student ID number... for example, they have different firstnames, lastnames and birthdates and are obviously different kids, but the same ID number.

    A brain buster for me... I'm hoping you have a thought! 

    Thanks in advance for your help!

     

     

  • Tom, this is an old post now; you'd be much better off starting a new thread with your questions. You'll get much wider readership that way, since this post only comes to the notice of those who've already participated in it and thus continue getting emails (if they haven't "unsubscribed" from it, that is).







    **ASCII stupid question, get a stupid ANSI !!!**

  • Tom, ya got lucky... as Sushila stated, most don't read down a thread this deep, especially when Sushila and Mike are having a cyber reunion.

    Anyway, here's a self-contained example of how to do just what you ask... it does require some unique row identifier, though. Obviously, the columns you're looking at for dupes don't qualify as a unique identifier or a primary key because, well, they've been duplicated.

    --===== This temp table will simulate your real table
         IF OBJECT_ID('TempDB..#yourtable') IS NOT NULL
            DROP TABLE #yourtable
     CREATE TABLE #yourtable
            (
            ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
            Date DATETIME,
            Unit VARCHAR(10),
            Company_Name VARCHAR(20)
            )
    --===== Load the temp table with your example data
     INSERT INTO #yourtable
            (Date,Unit,Company_Name)
    SELECT '04/02/05','bu002','ABC Company' UNION ALL
    SELECT '11/30/04','bu002','ABC Company' UNION ALL
    SELECT '04/02/05','bu002','ABC Company' UNION ALL
    SELECT '05/12/05','bu002','ABC Company' UNION ALL
    SELECT '11/30/04','bu002','ABC Company'
    --===== List whole rows of all duplicates
     SELECT t3.*
       FROM #yourtable t3,
            (--Derived "d" table finds IDs of all duplicated records
             SELECT DISTINCT t1.ID
               FROM #yourtable t1,
                    #yourtable t2
              WHERE t1.Date = t2.Date
                AND t1.Unit = t2.Unit
                AND t1.Company_Name = t2.Company_Name
                AND t1.ID <> t2.ID
            )d --End derived table "d"
      WHERE t3.ID = d.ID
      ORDER BY t3.Company_Name,
               t3.Unit,
               t3.Date

    I have to tell you that most would look at this and think it's incredibly slow... not so. It'll find 400 dupes in a million-row table in about a minute on a lowly 1.8GHz single-CPU box.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


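Jeff's self-join approach can be sketched outside T-SQL as well. Here is a minimal Python `sqlite3` translation (assumptions: sqlite has no IDENTITY, so an `INTEGER PRIMARY KEY` plays the role of the unique ID, and the dates are kept as text for simplicity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# sqlite stand-in for the #yourtable temp table from Jeff's post.
conn.execute("""
    CREATE TABLE yourtable (
        ID   INTEGER PRIMARY KEY,   -- stands in for IDENTITY(1,1)
        Date TEXT,
        Unit TEXT,
        Company_Name TEXT)
""")
conn.executemany(
    "INSERT INTO yourtable (Date, Unit, Company_Name) VALUES (?, ?, ?)",
    [("2005-04-02", "bu002", "ABC Company"),   # ID 1
     ("2004-11-30", "bu002", "ABC Company"),   # ID 2
     ("2005-04-02", "bu002", "ABC Company"),   # ID 3, dup of 1
     ("2005-05-12", "bu002", "ABC Company"),   # ID 4
     ("2004-11-30", "bu002", "ABC Company")])  # ID 5, dup of 2

# Derived table "d": the self-join on the dup columns with a differing
# ID collects the ID of every row that has at least one twin.
rows = conn.execute("""
    SELECT t3.*
      FROM yourtable t3,
           (SELECT DISTINCT t1.ID
              FROM yourtable t1, yourtable t2
             WHERE t1.Date = t2.Date
               AND t1.Unit = t2.Unit
               AND t1.Company_Name = t2.Company_Name
               AND t1.ID <> t2.ID) d
     WHERE t3.ID = d.ID
     ORDER BY t3.Date, t3.ID
""").fetchall()
print([r[0] for r in rows])  # IDs of the duplicated rows: [2, 5, 1, 3]
```

Whole rows come back, not just the key columns, so each duplicate can be inspected (and, via its ID, deleted) individually.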

  • An alternative, which may or may not be faster; I don't know. But it doesn't require a unique ID.

     

      SELECT distinct t.Date, t.Unit, t.Company_Name /* or simply select t.* */
       FROM #yourtable t inner join
            (--Derived "d" table finds all duplicated records
               SELECT Date, Unit, Company_Name
               from #yourtable
               group by Date, Unit, Company_Name
               having count(*) > 1
            ) d
         on t.Date = d.Date
        AND t.Unit = d.Unit
        AND t.Company_Name = d.Company_Name
      ORDER BY t.Company_Name,
               t.Unit,
               t.Date
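This GROUP BY-plus-join variant can likewise be sketched in Python's `sqlite3` (same hypothetical data as in Jeff's example; note that no unique ID column is needed here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same hypothetical data, but with no unique ID column at all.
conn.execute("CREATE TABLE yourtable (Date TEXT, Unit TEXT, Company_Name TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [("2005-04-02", "bu002", "ABC Company"),
     ("2004-11-30", "bu002", "ABC Company"),
     ("2005-04-02", "bu002", "ABC Company"),  # dup
     ("2005-05-12", "bu002", "ABC Company"),
     ("2004-11-30", "bu002", "ABC Company")]) # dup

# Derived table "d" finds the duplicated key combinations with
# GROUP BY/HAVING; the inner join pulls back the matching rows.
rows = conn.execute("""
    SELECT DISTINCT t.Date, t.Unit, t.Company_Name
      FROM yourtable t
     INNER JOIN (SELECT Date, Unit, Company_Name
                   FROM yourtable
                  GROUP BY Date, Unit, Company_Name
                 HAVING COUNT(*) > 1) d
        ON t.Date = d.Date
       AND t.Unit = d.Unit
       AND t.Company_Name = d.Company_Name
     ORDER BY t.Date
""").fetchall()
print(rows)  # the two duplicated key combinations, one row each
```

Because of the DISTINCT, each duplicated combination appears once; without a unique identifier the individual copies can't be told apart, which is the trade-off against the self-join version above.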

  • Jeff, don't forget to tell Tom that he also managed to benefit from your expertise at a time when you were more relaxed than usual and not biting people's heads off... don't know about Vladan, but you sure scared me...

    Just kidding Jeff...couldn't resist it!








  • No Problem ... my head is very large... so beware the choking hazard!!!!

    LOL 

     

     

  • This seems to work exactly right when I ran your test, but when I tried plugging my table/field names in ... I ran into trouble...

    My field names are Permnum,Lastname,Firstname,Birthdate

    Table name is "ASTUALL"

    Here is my modification of your code..

     SELECT *
       FROM #astuall t3,
            (--Derived "d" table finds IDs of all duplicated records
             SELECT DISTINCT t1.ID
               FROM #astuall t1,
                    #astuall t2
              WHERE t1.Firstname = t2.Firstname
                AND t1.Lastname = t2.Lastname
                AND t1.Birthdate = t2.Birthdate
                AND t1.Permnum <> t2.Permnum
            )d --End derived table "d"
      WHERE t3.ID = d.ID
      ORDER BY t3.Permnum,
               t3.Lastname,
               t3.Firstname

    And the error message I got..

    (5 row(s) affected)
    Server: Msg 208, Level 16, State 1, Line 23
    Invalid object name '#astuall'.
    Server: Msg 208, Level 16, State 1, Line 23
    Invalid object name '#astuall'.
    Server: Msg 208, Level 16, State 1, Line 23
    Invalid object name '#astuall'.

    thanks!

     

     

     

  • Isn't this the exact same question as on your other thread? Can we let this old thread rest in peace now, and continue the discussion on your new thread?

Viewing 13 posts - 31 through 42 (of 42 total)
