Finding duplicates based on count on different fields

  • Hello everyone

    0

    I have a Contact table which has columns like ContactID, Nationalidnumber, Firstname, Birthdate, Mobilephone, Emailaddress, CreatedOn and so on. I want to find duplicate contactIDs where two or more rows has:

    same Nationalidnumber AND

    same (Birthdate + Mobilephone) combination AND

    same (Birthdate + Emailaddress) combination AND

    same (Firstname + Mobilephone) combination AND

    same (Firstname + Emailaddress) combination Can someone help me with how can I solve this query?

    Regards PP

     

  • In conclusion I would recommend a crosstab-type query.  Conventions of language don't always map to relational operators.  And often predictably so.  Language conventions that is.    Are you sure you're intending AND in all the places?  Is it 1 big condition or 4 separate compound conditions?  My guess is 4

    (same Nationalidnumber AND same (Birthdate + Mobilephone) combination) OR
    (same Nationalidnumber AND same (Birthdate + Emailaddress) combination) OR
    (same Nationalidnumber AND same (Firstname + Mobilephone) combination) OR
    (same Nationalidnumber AND same (Firstname + Emailaddress))

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Or it could be 5 or ?

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Not a lot of details.  Maybe something like this:

    SELECT 
    NationalNumber
    FROM Contact
    GROUP BY NationalNumber
    HAVING /*COUNT(*) > 1 AND*/
    ((COUNT(DISTINCT Birthdate) > 1 AND COUNT(DISTINCT Mobilephone) > 1) OR
    (COUNT(DISTINCT Birthdate) > 1 AND COUNT(DISTINCT Emailaddress) > 1) OR
    (COUNT(DISTINCT Firstname) > 1 AND COUNT(DISTINCT Mobilephone) > 1) OR
    (COUNT(DISTINCT Firstname) > 1 AND COUNT(DISTINCT Emailaddress) > 1))
    ORDER BY NationalNumber

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • @scottpletcher: the data types for firstname and emailaddress fields are text. Will count work on them?

  • @steve-2 Collins: Actually I need to find rows which satisfy all the conditions mentioned that if any two records from contact have same nationalidnumber, same birthdate, same mobilephone, same first name and same emailaddress then it is a duplicate and should be shown in the report. So I thought the operator between them should be AND and not OR? or am I wrong?

  • ;WITH cte_count_dups AS (
    SELECT
    *, COUNT(*) OVER(PARTITION BY Nationalnumber, Birthdate, Mobilphone, Emailaddress, Firstname) AS dup_count
    FROM Contact
    )
    SELECT *
    FROM cte_count_dups
    WHERE dup_count > 1

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • You'll need to CAST any text column(s) to varchar(max) in the PARTITION BY.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply