Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Find duplicate client records and suggest one record to keep


Find duplicate client records and suggest one record to keep

Author
Message
peterlim05
peterlim05
Grasshopper
Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)

Group: General Forum Members
Points: 10 Visits: 202
I'm trying to get a list of duplicate records and suggest one to keep.
The first column returns the unique client numbers (custno) and the second column (keepno) should be the Max value of that group.

I'm struggling to get the keepno value.

The following code I've done so far finds a lists the duplicates okay (based on name and zip code) but I don't know how to get that keepno. It's probably a two step procedure?


SELECT a.custno, RTRIM(a.firstname)+' '+RTRIM(a.lastname) AS name,
company, RTRIM(address1) as address1, add1.zip
FROM cusfil a
JOIN address add1 ON a.custno=add1.custno
WHERE RTRIM(a.firstname)+RTRIM(a.lastname)+RTRIM(add1.zip) IN
(
SELECT RTRIM(b.firstname)+RTRIM(b.lastname)+RTRIM(add2.zip)
FROM cusfil b
JOIN address add2 ON b.custno=add2.custno
WHERE add2.curraddr=1 AND
RTRIM(b.lastname)<>''
GROUP BY b.firstname,b.lastname,add2.zip
HAVING COUNT(b.firstname+b.lastname+add2.zip)>1
)
AND add1.curraddr=1 AND
RTRIM(a.lastname)<>''
ORDER BY a.lastname,a.firstname,add1.zip



example results:

custno keepno name company address1 zip
"8001" "8001" "John Doe" "" "10 Happy St" "1114"
"15" "8001" "John Doe" "" "10 Happy St" "1114"
"7456" "7456" "Fred Finstone" "" "121 Bedrock Pl" "203"
"147" "7456" "Fred Finstone" "" "121 Bedrock Pl" "203"
Terry300577
Terry300577
SSCommitted
SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)

Group: General Forum Members
Points: 1839 Visits: 508
i've done this at my previous place and found assignment of a scoring algorithm the best method.

you'll need to build the score based on what you think is most important and once a total score is applied to each row you can assign a row number based on the order of the score largest to smallest

Hope this helps
Terry300577
Terry300577
SSCommitted
SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)

Group: General Forum Members
Points: 1839 Visits: 508
here's an example :

SELECT  a.custno
, RTRIM(a.firstname) + ' ' + RTRIM(a.lastname) AS name
, company
, RTRIM(address1) AS address1
, add1.zip
, ROW_NUMBER() OVER( PARTITION BY a.CustNo ORDER BY

--just a simple sum based on amount of data populated
(CASE WHEN RTRIM(a.firstname) + ' ' + RTRIM(a.lastname) IS NOT NULL THEN 1 ELSE 0 END
+ CASE WHEN company IS NOT NULL THEN 1 ELSE 0 END
+ CASE WHEN RTRIM(address1) IS NOT NULL THEN 1 ELSE 0 END
+ CASE WHEN add1.zip IS NOT NULL THEN 1 ELSE 0 END ) Desc
) AS DupeOrder
FROM cusfil a
JOIN address add1 ON a.custno = add1.custno
WHERE RTRIM(a.firstname) + RTRIM(a.lastname) + RTRIM(add1.zip) IN (
SELECT RTRIM(b.firstname) + RTRIM(b.lastname) + RTRIM(add2.zip)
FROM cusfil b
JOIN address add2 ON b.custno = add2.custno
WHERE add2.curraddr = 1
AND RTRIM(b.lastname) <> ''
GROUP BY b.firstname
, b.lastname
, add2.zip
HAVING COUNT(b.firstname + b.lastname + add2.zip) > 1)
AND add1.curraddr = 1
AND RTRIM(a.lastname) <> ''
ORDER BY a.lastname
, a.firstname
, add1.zip


peterlim05
peterlim05
Grasshopper
Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)Grasshopper (10 reputation)

Group: General Forum Members
Points: 10 Visits: 202
Brilliant!! never thought to use PARTITION BY :-)

Here's what it looks like ...

A big thank you for your time.

SELECT  a.custno
, RTRIM(a.firstname) + ' ' + RTRIM(a.lastname) AS name
, company
, RTRIM(address1) AS address1
, add1.zip
, MAX(a.custno) OVER( PARTITION BY RTRIM(a.firstname) + RTRIM(a.lastname) + RTRIM(add1.zip)
) AS keepno
FROM cusfil a
JOIN address add1 ON a.custno = add1.custno
WHERE RTRIM(a.firstname) + RTRIM(a.lastname) + RTRIM(add1.zip) IN (
SELECT RTRIM(b.firstname) + RTRIM(b.lastname) + RTRIM(add2.zip)
FROM cusfil b
JOIN address add2 ON b.custno = add2.custno
WHERE add2.curraddr = 1
AND RTRIM(b.lastname) <> ''
GROUP BY b.firstname
, b.lastname
, add2.zip
HAVING COUNT(b.firstname + b.lastname + add2.zip) > 1)
AND add1.curraddr = 1
AND RTRIM(a.lastname) <> ''
ORDER BY a.lastname
, a.firstname
, add1.zip


Terry300577
Terry300577
SSCommitted
SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)

Group: General Forum Members
Points: 1839 Visits: 508
Your very welcome.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search