Ok... So this is where the real magic happens...
-- Use nested sets to create the groups
EXEC dbo.CreateNestedSets-- This proc is simply a dynamic version of Jeff Modens "Hierarchies on Steroids #1: Convert an Adjacency List to Nested Sets" script.
@TableName = '#BestMatches',
@ChildColName = 'ChildID',
@ParentColName = 'ParentID',
@LeftColName = 'Lft',
@RightColName = 'Rgt',
@DeBug = 0;
DELETE #BestMatches WHERE ParentID IS NULL;-- We no longer need the master parent, so it gets deleted.
There's lot's of good information about how nested sets work so I won't get into it here but I would encourage you to (at the very least) read Jeff's article Hierarchies on Steroids #1: Convert an Adjacency List to Nested Sets[/url], as it provided the basis for the dbo.CreateNestedSets proc.
And... Love him or hate him... Joe Celko has some awesome articles on the subject...
Now that the Lft & Rgt values have been set, and the master parent deleted, the #BestMatches table should look like this... (ORDER BY Lft)
ParentID ChildID BestScore Lft Rgt
----------- ----------- ----------- ----------- -----------
0 1 100 2 9
1 2 13 3 8
2 3 11 4 7
3 4 13 5 6
0 5 100 10 17
5 6 9 11 12
5 7 12 13 14
5 8 13 15 16
0 9 100 18 25
9 10 10 19 20
9 11 12 21 22
9 12 14 23 24
0 13 100 26 29
13 14 14 27 28
0 15 100 30 33
15 16 12 31 32
0 17 100 34 41
17 18 13 35 38
18 19 12 36 37
17 20 9 39 40
0 21 100 42 49
21 22 13 43 44
21 23 8 45 46
21 24 15 47 48
If you're not at all familiar with nested set hierarchies or how they work, take a few minutes to read the following Wikipedia article https://en.wikipedia.org/wiki/Nested_set_model, so that you have a basic understanding of how it works.
---------------------------------------------------
The last & final step is to use the data from #BestMatches to update the #People table...
UPDATE p SET
p.GroupID = bm.GroupID,
p.MatchQuality = bm.BestScore
FROM
#People p
JOIN (
SELECT
Groups.ChildID AS GroupID,
bm2.ChildID,
bm2.BestScore
FROM (-- Derived table that pulls out just the "Parent" rows. Their PersonID will become the groups GroupID.
SELECT
bm1.ChildID,
bm1.Lft,
bm1.Rgt
FROM
#BestMatches bm1
WHERE
bm1.ParentID = 0
) Groups
JOIN #BestMatches bm2-- This is where the "Child" rows are brought into scope...
ON bm2.Lft BETWEEN Groups.Lft AND Groups.Rgt-- This where the power of nested sets really shines! Any row that has a Lft value (or right, both work) that's between
-- the Grpous.Lft and Groups.Rgt is a member of that group. It's as simple as that.
) bm
ON p.PersonID = bm.ChildID;
Now that #People has been updated, lets have a look...
PersonID FirstName LastName SSN Address_1 Address_2 City State Zip GroupID MatchQuality
----------- -------------------- -------------------- --------- ------------------------------ ------------------------------ ------------------------------ ----- ----- ----------- ------------
1 Bob Jones 333224444 258 Grants Trail #3 Bowling Green KY 45459 1 100
2 Robert Jones 333224444 258 Grants Trail Ste 3 Bowling Green KY 45459 1 13
3 Rob Jones 333224444 258 Grants Tr. NULL Bowling Green KY 45459 1 11
4 R. Jones 333224444 258 Grants Tr. Ste 3 Bowling Green KY 45459 1 13
5 Louis Armstrong 258321478 123 Humming Bird Ln. NULL Jacksonville FL 32209 5 100
6 L. Armstrng 258321478 123 Humming Bird Lane NULL Jacksonville FL 32207 5 9
7 Louie Armstrong 258321478 123 Humming Bird Lain NULL Jacksonville FL 32209 5 12
8 Louis Armstong 258321478 123 Humming Bird Ln. NULL Jacksonville FL 32209 5 13
9 Mark Adams 321456987 555 Baymeadows Dr. Unit 42 Ashville NC 33304 9 100
10 M. Adams 321456987 555 Bay Meadows Dr. # 42 Ashvlle NC 33304 9 10
11 Mark Adams 321456987 555 Baymeadows Drive Unit 42 Ashville NC 33305 9 12
12 Mark Adams 321456987 555 Baymeadows Dr. Unit 42 Ashville NC 33306 9 14
13 Bob Jones 555229999 4227 Some Place Ln. #3 Bowling Green KY 45459 13 100
14 Bob Jones 555229999 4227 Some Place Ln. # 3 Bowling Green KY 45459 13 14
15 Bob Jones 555292999 4227 SomePlace Ln Unit 3 Bowling Green KY 45459 15 100
16 Bob Jones 555292999 4227 Some Place Ln. Ste 3 Bowling Green KY 45459 15 12
17 Louis Armstrong 147852369 123 Baldwin Ave. NULL Jacksonville FL 32209 17 100
18 Louis Armstrong 147852369 123 Baldwin Ave NULL Jacksonville FL 32209 17 13
19 Louis Armstrong 147852369 123 Baldwin Anenue NULL Jachsonville FL 32209 17 12
20 Louis Armstrong 147852639 123 Baldwin Ave. NULL Jacksonville FL 32209 17 9
21 Mark Adams 654789321 524 Main St. Unit 42 Bakersville NC 33304 21 100
22 Mark Adam 654789321 524 Main St. Unit 42 Bakersville NC 33304 21 13
23 Mark Adams 654789231 524 Main St. Unit 42 Barkersville NC 33304 21 8
24 Mark Adams 654789321 524 Main St. Unit 42 Bakersville NC 33304 21 15
Scroll all the way over to the right to see how everything was matched up and the quality of the match.
Also note... That row 15 split away (taking row 16 with it) for form a new group rather than joining rows 13 & 14.
To the human eye, it's apparent that 13, 14, 15 & 16 are the same person, but the transposition in the SSN and other inconsistencies were enough to make SQL Server "think" they are two separate individuals.
I did that intentionally to illustrate the necessity of analyzing your data and setting score values and the threshold accordingly.
Anyway, I hold this helps.
If you're still having problems let me know. If Erika hasn't blown my house down, I'll try to help.