Display Duplicates

  • I have a Contact db.

    It has the usual Columns:

    Company, FirstName, LastName, Address, City, ST, etc..

    When I SELECT DISTINCT on Company, I get 27,017 records returned, but the Entire Listing is 28,722 records.

    The difference would seem to be Duplicates, right?

    But When I run :

    SELECT COMPANY

    FROM [CPACONTACTS].[dbo].[CPACONTACTS2]

    GROUP BY COMPANY

    HAVING COUNT(*) > 1

    I get 741 records returned. I get 0 records returned when displaying Null Values. 27,017 + 741 = 27,758 records. I am still missing 964 from some where (28,722-27,758), but I have no idea why.

    Any Clues?

    thanks

    Spatio

  • clue....

    SELECT COMPANY, COUNT(*)

    FROM [CPACONTACTS].[dbo].[CPACONTACTS2]

    GROUP BY COMPANY

    ORDER BY COUNT(*) DESC

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Try this:

    SELECT c1.*

    FROM [CPACONTACTS].[dbo].[CPACONTACTS2] c1

    INNER JOIN (SELECT COMPANY

    FROM [CPACONTACTS].[dbo].[CPACONTACTS2]

    GROUP BY COMPANY

    HAVING COUNT(*) > 1) c2

    ON c1.COMPANY = c2.COMPANY

    ORDER BY c1.COMPANY

    EDIT: added ORDER BY clause to my original code. Makes it easier to see the dups.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • At a guess, you have 741 duplicated rows, 73 triplicated, and 1 quadruplicated.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (7/23/2014)


    At a guess, you have 741 duplicated rows, 73 triplicated, and 1 quadruplicated.

    Give the man a cookie for doing the actual math work! πŸ˜€

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Brandie Tarvin (7/23/2014)


    ChrisM@Work (7/23/2014)


    At a guess, you have 741 duplicated rows, 73 triplicated, and 1 quadruplicated.

    Give the man a cookie for doing the actual math work! πŸ˜€

    No, he didn't do the maths work, it was a guess, just as he said. His figures account for 890 extra records, not for 1705. I suspect he was just encouraging the OP (who apparently didn't know that 2 is not the only integer greater than 1) to try the code suggested by you or the code suggested by JL or something like

    SELECT COMPANY, COUNT(*)

    FROM [CPACONTACTS].[dbo].[CPACONTACTS2]

    GROUP BY COMPANY HAVING COUNT(*) > 1

    ORDER BY COUNT(*) DESC

    which is what I would run in the OP's position.

    Tom

  • TomThomson (7/23/2014)


    Brandie Tarvin (7/23/2014)


    ChrisM@Work (7/23/2014)


    At a guess, you have 741 duplicated rows, 73 triplicated, and 1 quadruplicated.

    Give the man a cookie for doing the actual math work! πŸ˜€

    No, he didn't do the maths work, it was a guess, just as he said. His figures account for 890 extra records, not for 1705. I suspect he was just encouraging the OP (who apparently didn't know that 2 is not the only integer greater than 1) to try the code suggested by you or the code suggested by JL or something like

    SELECT COMPANY, COUNT(*)

    FROM [CPACONTACTS].[dbo].[CPACONTACTS2]

    GROUP BY COMPANY HAVING COUNT(*) > 1

    ORDER BY COUNT(*) DESC

    which is what I would run in the OP's position.

    Exactly - and to reinforce the point that the 741 rows returned by the original dupe-detecting query are not singletons. One of the solutions would be 547 duplicates, 165 triplicates and 29 quadruplicates. So I'll take the cookie anyway πŸ™‚ thanks Brandie.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Why are duplicate companies a problem? I would expect at least a few cases where you have more than one contact at a particular company.

  • Chris Wooding (7/24/2014)


    Why are duplicate companies a problem? I would expect at least a few cases where you have more than one contact at a particular company.

    Or a situation where contacts change due to promotions, firings, or just regular employee churn.

    Definitely a good question to consider.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply