Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Find duplicate client records and suggest one record to keep Expand / Collapse
Author
Message
Posted Monday, March 25, 2013 5:39 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, May 28, 2014 9:10 AM
Points: 7, Visits: 69
I'm trying to get a list of duplicate records and suggest one to keep.
The first column returns the unique client numbers (custno) and the second column (keepno) should be the Max value of that group.

I'm struggling to get the keepno value.

The following code I've done so far finds a lists the duplicates okay (based on name and zip code) but I don't know how to get that keepno. It's probably a two step procedure?

SELECT a.custno, RTRIM(a.firstname)+' '+RTRIM(a.lastname) AS name, 
company, RTRIM(address1) as address1, add1.zip
FROM cusfil a
JOIN address add1 ON a.custno=add1.custno
WHERE RTRIM(a.firstname)+RTRIM(a.lastname)+RTRIM(add1.zip) IN
(
SELECT RTRIM(b.firstname)+RTRIM(b.lastname)+RTRIM(add2.zip)
FROM cusfil b
JOIN address add2 ON b.custno=add2.custno
WHERE add2.curraddr=1 AND
RTRIM(b.lastname)<>''
GROUP BY b.firstname,b.lastname,add2.zip
HAVING COUNT(b.firstname+b.lastname+add2.zip)>1
)
AND add1.curraddr=1 AND
RTRIM(a.lastname)<>''
ORDER BY a.lastname,a.firstname,add1.zip

example results:

custno keepno name company address1 zip
"8001" "8001" "John Doe" "" "10 Happy St" "1114"
"15" "8001" "John Doe" "" "10 Happy St" "1114"
"7456" "7456" "Fred Finstone" "" "121 Bedrock Pl" "203"
"147" "7456" "Fred Finstone" "" "121 Bedrock Pl" "203"
Post #1434840
Posted Monday, March 25, 2013 5:53 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Wednesday, August 20, 2014 8:05 AM
Points: 1,729, Visits: 491
i've done this at my previous place and found assignment of a scoring algorithm the best method.

you'll need to build the score based on what you think is most important and once a total score is applied to each row you can assign a row number based on the order of the score largest to smallest

Hope this helps
Post #1434844
Posted Monday, March 25, 2013 6:05 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Wednesday, August 20, 2014 8:05 AM
Points: 1,729, Visits: 491
here's an example :

SELECT  a.custno
, RTRIM(a.firstname) + ' ' + RTRIM(a.lastname) AS name
, company
, RTRIM(address1) AS address1
, add1.zip
, ROW_NUMBER() OVER( PARTITION BY a.CustNo ORDER BY

--just a simple sum based on amount of data populated
(CASE WHEN RTRIM(a.firstname) + ' ' + RTRIM(a.lastname) IS NOT NULL THEN 1 ELSE 0 END
+ CASE WHEN company IS NOT NULL THEN 1 ELSE 0 END
+ CASE WHEN RTRIM(address1) IS NOT NULL THEN 1 ELSE 0 END
+ CASE WHEN add1.zip IS NOT NULL THEN 1 ELSE 0 END ) Desc
) AS DupeOrder
FROM cusfil a
JOIN address add1 ON a.custno = add1.custno
WHERE RTRIM(a.firstname) + RTRIM(a.lastname) + RTRIM(add1.zip) IN (
SELECT RTRIM(b.firstname) + RTRIM(b.lastname) + RTRIM(add2.zip)
FROM cusfil b
JOIN address add2 ON b.custno = add2.custno
WHERE add2.curraddr = 1
AND RTRIM(b.lastname) <> ''
GROUP BY b.firstname
, b.lastname
, add2.zip
HAVING COUNT(b.firstname + b.lastname + add2.zip) > 1)
AND add1.curraddr = 1
AND RTRIM(a.lastname) <> ''
ORDER BY a.lastname
, a.firstname
, add1.zip

Post #1434852
Posted Monday, March 25, 2013 10:06 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, May 28, 2014 9:10 AM
Points: 7, Visits: 69
Brilliant!! never thought to use PARTITION BY

Here's what it looks like ...

A big thank you for your time.

SELECT  a.custno
, RTRIM(a.firstname) + ' ' + RTRIM(a.lastname) AS name
, company
, RTRIM(address1) AS address1
, add1.zip
, MAX(a.custno) OVER( PARTITION BY RTRIM(a.firstname) + RTRIM(a.lastname) + RTRIM(add1.zip)
) AS keepno
FROM cusfil a
JOIN address add1 ON a.custno = add1.custno
WHERE RTRIM(a.firstname) + RTRIM(a.lastname) + RTRIM(add1.zip) IN (
SELECT RTRIM(b.firstname) + RTRIM(b.lastname) + RTRIM(add2.zip)
FROM cusfil b
JOIN address add2 ON b.custno = add2.custno
WHERE add2.curraddr = 1
AND RTRIM(b.lastname) <> ''
GROUP BY b.firstname
, b.lastname
, add2.zip
HAVING COUNT(b.firstname + b.lastname + add2.zip) > 1)
AND add1.curraddr = 1
AND RTRIM(a.lastname) <> ''
ORDER BY a.lastname
, a.firstname
, add1.zip

Post #1435000
Posted Monday, March 25, 2013 10:18 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Wednesday, August 20, 2014 8:05 AM
Points: 1,729, Visits: 491
Your very welcome.
Post #1435010
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse