Showing all records that appear more than once

  • Hi everyone
    I'm trying to write some SQL that will allow me to pull all records where the NUMBER column appears more than once. I need to see all the other columns too, whatever they contain e.g.
    NUMBER  COURSE  START  AWARD  GRADE    TYPE
    1                Biology      1988      BA          1st            Course
    1                Biology      1988      BA          1st            Course
    2                Maths        1975      MA          2:1           Course
    3                English      1987      BA          2:1            Course
    3                                  1987                                      Course
    So, in this example I would expect only NUMBERs 1 and 3 to be returned with the additional COURSE, START, AWARD and GRADE columns also showing. The idea behind this is to ensure that I can see all database records that have either duplicate information in them (e.g. NUMBER 1), or where part of the information is duplicated e.g. NUMBER 3.
    Any help would be much appreciated.
    Many thanks
    Jon
  • Jon

    Use COUNT(*) with a PARTITION BY clause to show the number of rows per NUMBER.  If you want to return only those rows with NUMBER greater than 1, you'll need to use a CTE or subquery, since you can't have COUNT(*) in your WHERE clause.

    John

  • OK, thanks John.

    I think that's above my SQL knowledge unfortunately. I'll do it the manual way 🙂

    Many thanks
    Jon

  • j.clay 47557 - Wednesday, October 10, 2018 2:48 AM

    OK, thanks John.

    I think that's above my SQL knowledge unfortunately. I'll do it the manual way 🙂

    Many thanks
    Jon

    Take the time to learn something new because you're going to need to do it again. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Very true and I will definitely learn. Deadlines are looming this week unfortunately 🙁


  • SELECT tn.*
    FROM (
        SELECT NUMBER
        FROM dbo.table_name
        GROUP BY NUMBER
        HAVING COUNT(*) > 1
    ) AS NUMBER_DUPS
    INNER JOIN dbo.table_name tn ON tn.NUMBER = NUMBER_DUPS.NUMBER
    ORDER BY NUMBER, COURSE

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher - Wednesday, October 10, 2018 11:13 AM


    SELECT tn.*
    FROM (
        SELECT NUMBER
        FROM dbo.table_name
        GROUP BY NUMBER
        HAVING COUNT(*) > 1
    ) AS NUMBER_DUPS
    INNER JOIN dbo.table_name tn ON tn.NUMBER = NUMBER_DUPS.NUMBER
    ORDER BY NUMBER, COURSE

    I suspect the following will perform better, but it's hard to test with no sample data.

    WITH Counts AS
    (
     SELECT *, COUNT(*) OVER(PARTITION BY Number) AS cnt
     FROM dbo.table_name
    )
    SELECT *
    FROM Counts
    WHERE cnt > 1

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Wednesday, October 10, 2018 12:02 PM

    ScottPletcher - Wednesday, October 10, 2018 11:13 AM


    SELECT tn.*
    FROM (
        SELECT NUMBER
        FROM dbo.table_name
        GROUP BY NUMBER
        HAVING COUNT(*) > 1
    ) AS NUMBER_DUPS
    INNER JOIN dbo.table_name tn ON tn.NUMBER = NUMBER_DUPS.NUMBER
    ORDER BY NUMBER, COURSE

    I suspect the following will perform better, but it's hard to test with no sample data.

    WITH Counts AS
    (
     SELECT *, COUNT(*) OVER(PARTITION BY Number) AS cnt
     FROM dbo.table_name
    )
    SELECT *
    FROM Counts
    WHERE cnt > 1

    Drew

    But then you don't list the first row of every NUMBER, right?

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher - Wednesday, October 10, 2018 12:56 PM

    drew.allen - Wednesday, October 10, 2018 12:02 PM

    ScottPletcher - Wednesday, October 10, 2018 11:13 AM


    SELECT tn.*
    FROM (
        SELECT NUMBER
        FROM dbo.table_name
        GROUP BY NUMBER
        HAVING COUNT(*) > 1
    ) AS NUMBER_DUPS
    INNER JOIN dbo.table_name tn ON tn.NUMBER = NUMBER_DUPS.NUMBER
    ORDER BY NUMBER, COURSE

    I suspect the following will perform better, but it's hard to test with no sample data.

    WITH Counts AS
    (
     SELECT *, COUNT(*) OVER(PARTITION BY Number) AS cnt
     FROM dbo.table_name
    )
    SELECT *
    FROM Counts
    WHERE cnt > 1

    Drew

    But then you don't list the first row of every NUMBER, right?

    Again, this is why we ask for expected results.  You would simply add a ROW_NUMBER to the CTE and select based on the row number = 1.  It will still probably perform faster than the self join.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • ScottPletcher - Wednesday, October 10, 2018 12:56 PM

    drew.allen - Wednesday, October 10, 2018 12:02 PM

    ScottPletcher - Wednesday, October 10, 2018 11:13 AM


    SELECT tn.*
    FROM (
        SELECT NUMBER
        FROM dbo.table_name
        GROUP BY NUMBER
        HAVING COUNT(*) > 1
    ) AS NUMBER_DUPS
    INNER JOIN dbo.table_name tn ON tn.NUMBER = NUMBER_DUPS.NUMBER
    ORDER BY NUMBER, COURSE

    I suspect the following will perform better, but it's hard to test with no sample data.

    WITH Counts AS
    (
     SELECT *, COUNT(*) OVER(PARTITION BY Number) AS cnt
     FROM dbo.table_name
    )
    SELECT *
    FROM Counts
    WHERE cnt > 1

    Drew

    But then you don't list the first row of every NUMBER, right?

    Sorry, I looked back at the OP and realized that you were asking something else.

    Yes, it does include the first row.  A frame is only applied to a windowed function if there is an ORDER BY clause.  Since there was no ORDER BY clause specified in the windowed function, the COUNT is for the entire partition, so the count will be the same for all records in the partition (including the first).  This query will only exclude records where there is exactly one record in the partition.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Thursday, October 11, 2018 8:26 AM

    ScottPletcher - Wednesday, October 10, 2018 12:56 PM

    drew.allen - Wednesday, October 10, 2018 12:02 PM

    ScottPletcher - Wednesday, October 10, 2018 11:13 AM


    SELECT tn.*
    FROM (
        SELECT NUMBER
        FROM dbo.table_name
        GROUP BY NUMBER
        HAVING COUNT(*) > 1
    ) AS NUMBER_DUPS
    INNER JOIN dbo.table_name tn ON tn.NUMBER = NUMBER_DUPS.NUMBER
    ORDER BY NUMBER, COURSE

    I suspect the following will perform better, but it's hard to test with no sample data.

    WITH Counts AS
    (
     SELECT *, COUNT(*) OVER(PARTITION BY Number) AS cnt
     FROM dbo.table_name
    )
    SELECT *
    FROM Counts
    WHERE cnt > 1

    Drew

    But then you don't list the first row of every NUMBER, right?

    Sorry, I looked back at the OP and realized that you were asking something else.

    Yes, it does include the first row.  A frame is only applied to a windowed function if there is an ORDER BY clause.  Since there was no ORDER BY clause specified in the windowed function, the COUNT is for the entire partition, so the count will be the same for all records in the partition (including the first).  This query will only exclude records where there is exactly one record in the partition.

    Drew

    Nice way to do it. I've not seen this method before now.

  • Jonathan AC Roberts - Thursday, October 11, 2018 9:07 AM

    drew.allen - Thursday, October 11, 2018 8:26 AM

    ScottPletcher - Wednesday, October 10, 2018 12:56 PM

    drew.allen - Wednesday, October 10, 2018 12:02 PM

    ScottPletcher - Wednesday, October 10, 2018 11:13 AM


    SELECT tn.*
    FROM (
        SELECT NUMBER
        FROM dbo.table_name
        GROUP BY NUMBER
        HAVING COUNT(*) > 1
    ) AS NUMBER_DUPS
    INNER JOIN dbo.table_name tn ON tn.NUMBER = NUMBER_DUPS.NUMBER
    ORDER BY NUMBER, COURSE

    I suspect the following will perform better, but it's hard to test with no sample data.

    WITH Counts AS
    (
     SELECT *, COUNT(*) OVER(PARTITION BY Number) AS cnt
     FROM dbo.table_name
    )
    SELECT *
    FROM Counts
    WHERE cnt > 1

    Drew

    But then you don't list the first row of every NUMBER, right?

    Sorry, I looked back at the OP and realized that you were asking something else.

    Yes, it does include the first row.  A frame is only applied to a windowed function if there is an ORDER BY clause.  Since there was no ORDER BY clause specified in the windowed function, the COUNT is for the entire partition, so the count will be the same for all records in the partition (including the first).  This query will only exclude records where there is exactly one record in the partition.

    Drew

    Nice way to do it. I've not seen this method before now.

    Interesting.  And, true, it will still list all rows. 

    But as to efficiency, doesn't SQL still have to do a separate query to get the COUNT?  Does it actually save any I/O?

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply