Find duplicate client records and suggest one record to keep

  • I'm trying to get a list of duplicate records and suggest one to keep.

    The first column returns the unique client numbers (custno) and the second column (keepno) should be the Max value of that group.

    I'm struggling to get the keepno value.

    The following code I've done so far finds a lists the duplicates okay (based on name and zip code) but I don't know how to get that keepno. It's probably a two step procedure?

    SELECT a.custno, RTRIM(a.firstname)+' '+RTRIM(a.lastname) AS name,

    company, RTRIM(address1) as address1, add1.zip

    FROM cusfil a

    JOIN address add1 ON a.custno=add1.custno

    WHERE RTRIM(a.firstname)+RTRIM(a.lastname)+RTRIM(add1.zip) IN

    (

    SELECT RTRIM(b.firstname)+RTRIM(b.lastname)+RTRIM(add2.zip)

    FROM cusfil b

    JOIN address add2 ON b.custno=add2.custno

    WHERE add2.curraddr=1 AND

    RTRIM(b.lastname)<>''

    GROUP BY b.firstname,b.lastname,add2.zip

    HAVING COUNT(b.firstname+b.lastname+add2.zip)>1

    )

    AND add1.curraddr=1 AND

    RTRIM(a.lastname)<>''

    ORDER BY a.lastname,a.firstname,add1.zip

    example results:

    custnokeepnonamecompanyaddress1zip

    "8001""8001""John Doe""""10 Happy St""1114"

    "15""8001""John Doe""""10 Happy St""1114"

    "7456""7456""Fred Finstone""""121 Bedrock Pl""203"

    "147""7456""Fred Finstone""""121 Bedrock Pl""203"

  • i've done this at my previous place and found assignment of a scoring algorithm the best method.

    you'll need to build the score based on what you think is most important and once a total score is applied to each row you can assign a row number based on the order of the score largest to smallest

    Hope this helps

  • here's an example :

    SELECT a.custno

    , RTRIM(a.firstname) + ' ' + RTRIM(a.lastname) AS name

    , company

    , RTRIM(address1) AS address1

    , add1.zip

    , ROW_NUMBER() OVER( PARTITION BY a.CustNo ORDER BY

    --just a simple sum based on amount of data populated

    (CASE WHEN RTRIM(a.firstname) + ' ' + RTRIM(a.lastname) IS NOT NULL THEN 1 ELSE 0 END

    + CASE WHEN company IS NOT NULL THEN 1 ELSE 0 END

    + CASE WHEN RTRIM(address1) IS NOT NULL THEN 1 ELSE 0 END

    + CASE WHEN add1.zip IS NOT NULL THEN 1 ELSE 0 END ) Desc

    ) AS DupeOrder

    FROM cusfil a

    JOIN address add1 ON a.custno = add1.custno

    WHERE RTRIM(a.firstname) + RTRIM(a.lastname) + RTRIM(add1.zip) IN (

    SELECT RTRIM(b.firstname) + RTRIM(b.lastname) + RTRIM(add2.zip)

    FROM cusfil b

    JOIN address add2 ON b.custno = add2.custno

    WHERE add2.curraddr = 1

    AND RTRIM(b.lastname) <> ''

    GROUP BY b.firstname

    , b.lastname

    , add2.zip

    HAVING COUNT(b.firstname + b.lastname + add2.zip) > 1)

    AND add1.curraddr = 1

    AND RTRIM(a.lastname) <> ''

    ORDER BY a.lastname

    , a.firstname

    , add1.zip

  • Brilliant!! never thought to use PARTITION BY 🙂

    Here's what it looks like ...

    A big thank you for your time.

    SELECT a.custno

    , RTRIM(a.firstname) + ' ' + RTRIM(a.lastname) AS name

    , company

    , RTRIM(address1) AS address1

    , add1.zip

    , MAX(a.custno) OVER( PARTITION BY RTRIM(a.firstname) + RTRIM(a.lastname) + RTRIM(add1.zip)

    ) AS keepno

    FROM cusfil a

    JOIN address add1 ON a.custno = add1.custno

    WHERE RTRIM(a.firstname) + RTRIM(a.lastname) + RTRIM(add1.zip) IN (

    SELECT RTRIM(b.firstname) + RTRIM(b.lastname) + RTRIM(add2.zip)

    FROM cusfil b

    JOIN address add2 ON b.custno = add2.custno

    WHERE add2.curraddr = 1

    AND RTRIM(b.lastname) <> ''

    GROUP BY b.firstname

    , b.lastname

    , add2.zip

    HAVING COUNT(b.firstname + b.lastname + add2.zip) > 1)

    AND add1.curraddr = 1

    AND RTRIM(a.lastname) <> ''

    ORDER BY a.lastname

    , a.firstname

    , add1.zip

  • Your very welcome.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply