SQL to return only rows that aren't distinct

  • Anybody know an easy way to do this. I have some rows that aren't distinct by the key I'm using, and I want to return just those so I can figure out how I can differentiate between them.

    Thanks.

  • It'll depend on the structure of the table(s) involved.

    Usually, you can identify the rows by some key column(s), by grouping by those and selecting the ones that have a Count(*) that's higher than 1. Then you can join to that to get the full data of each row.

    If you post table definitions, I can get more specific. If you need that.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • by are not distinct you mean duplicates?

    you can use row_number() function or group by...having count(*) > 1

    SELECT * FROM

    (select row_number() over (partition by ColumnList Order By ColumnList) AS RW,

    ColumnList FROM YourTable

    )

    WHERE RW > 1

    --or

    SELECT ColumnList

    From YourTable

    Group By ColumnList

    HAVING count(*) > 1

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thank you guys

  • Instead of doing a where count(*) > 1 it is often faster to check for duplicates by looking for any rows for which you can find at least one other row with the same characteristics. For this to work you first need something to identify each row uniquely by. If your table does not have a unique key (yet), you can add an artificial one by using the row_number() ranking function.

    In practice the trick is to create a common table expression (cte) that assigns each row it's unique number (by use of row_number()), then do a self-join on that cte to find any rows that have the same values but do not have the same row number.

    with cte as (

    select row_number() over (order by (select 1)) as nr,

    t.col1,

    t.col2,

    from dbo.yourtable t

    )

    select t1.*

    from cte t1

    where exists (

    select top (1) *

    from cte t2

    where t2.col1 = t1.col1

    and t2.col2 = t1.col2

    and not t2.nr = t1.nr

    )

    Note: the top (1) is not required for proper functionality. I've just found that this sometimes speeds up the use of exists().

    Note: if your intend is to remove all but one of the duplicate rows, then row_number() is your friend too: Have row_number() generate a new range from 1 up to the number of duplicates by using the "partition by"-clause on it and then simply delete from the cte all rows that have a row_number() higher than 1.



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply