• Ok... So this is where the real magic happens...

    -- Use nested sets to create the groups

    EXEC dbo.CreateNestedSets-- This proc is simply a dynamic version of Jeff Modens "Hierarchies on Steroids #1: Convert an Adjacency List to Nested Sets" script.

    @TableName = '#BestMatches',

    @ChildColName = 'ChildID',

    @ParentColName = 'ParentID',

    @LeftColName = 'Lft',

    @RightColName = 'Rgt',

    @DeBug = 0;

    DELETE #BestMatches WHERE ParentID IS NULL;-- We no longer need the master parent, so it gets deleted.

    There's lot's of good information about how nested sets work so I won't get into it here but I would encourage you to (at the very least) read Jeff's article Hierarchies on Steroids #1: Convert an Adjacency List to Nested Sets[/url], as it provided the basis for the dbo.CreateNestedSets proc.

    And... Love him or hate him... Joe Celko has some awesome articles on the subject...

    Now that the Lft & Rgt values have been set, and the master parent deleted, the #BestMatches table should look like this... (ORDER BY Lft)

    ParentID ChildID BestScore Lft Rgt

    ----------- ----------- ----------- ----------- -----------

    0 1 100 2 9

    1 2 13 3 8

    2 3 11 4 7

    3 4 13 5 6

    0 5 100 10 17

    5 6 9 11 12

    5 7 12 13 14

    5 8 13 15 16

    0 9 100 18 25

    9 10 10 19 20

    9 11 12 21 22

    9 12 14 23 24

    0 13 100 26 29

    13 14 14 27 28

    0 15 100 30 33

    15 16 12 31 32

    0 17 100 34 41

    17 18 13 35 38

    18 19 12 36 37

    17 20 9 39 40

    0 21 100 42 49

    21 22 13 43 44

    21 23 8 45 46

    21 24 15 47 48

    If you're not at all familiar with nested set hierarchies or how they work, take a few minutes to read the following Wikipedia article https://en.wikipedia.org/wiki/Nested_set_model, so that you have a basic understanding of how it works.

    ---------------------------------------------------

    The last & final step is to use the data from #BestMatches to update the #People table...

    UPDATE p SET

    p.GroupID = bm.GroupID,

    p.MatchQuality = bm.BestScore

    FROM

    #People p

    JOIN (

    SELECT

    Groups.ChildID AS GroupID,

    bm2.ChildID,

    bm2.BestScore

    FROM (-- Derived table that pulls out just the "Parent" rows. Their PersonID will become the groups GroupID.

    SELECT

    bm1.ChildID,

    bm1.Lft,

    bm1.Rgt

    FROM

    #BestMatches bm1

    WHERE

    bm1.ParentID = 0

    ) Groups

    JOIN #BestMatches bm2-- This is where the "Child" rows are brought into scope...

    ON bm2.Lft BETWEEN Groups.Lft AND Groups.Rgt-- This where the power of nested sets really shines! Any row that has a Lft value (or right, both work) that's between

    -- the Grpous.Lft and Groups.Rgt is a member of that group. It's as simple as that.

    ) bm

    ON p.PersonID = bm.ChildID;

    Now that #People has been updated, lets have a look...

    PersonID FirstName LastName SSN Address_1 Address_2 City State Zip GroupID MatchQuality

    ----------- -------------------- -------------------- --------- ------------------------------ ------------------------------ ------------------------------ ----- ----- ----------- ------------

    1 Bob Jones 333224444 258 Grants Trail #3 Bowling Green KY 45459 1 100

    2 Robert Jones 333224444 258 Grants Trail Ste 3 Bowling Green KY 45459 1 13

    3 Rob Jones 333224444 258 Grants Tr. NULL Bowling Green KY 45459 1 11

    4 R. Jones 333224444 258 Grants Tr. Ste 3 Bowling Green KY 45459 1 13

    5 Louis Armstrong 258321478 123 Humming Bird Ln. NULL Jacksonville FL 32209 5 100

    6 L. Armstrng 258321478 123 Humming Bird Lane NULL Jacksonville FL 32207 5 9

    7 Louie Armstrong 258321478 123 Humming Bird Lain NULL Jacksonville FL 32209 5 12

    8 Louis Armstong 258321478 123 Humming Bird Ln. NULL Jacksonville FL 32209 5 13

    9 Mark Adams 321456987 555 Baymeadows Dr. Unit 42 Ashville NC 33304 9 100

    10 M. Adams 321456987 555 Bay Meadows Dr. # 42 Ashvlle NC 33304 9 10

    11 Mark Adams 321456987 555 Baymeadows Drive Unit 42 Ashville NC 33305 9 12

    12 Mark Adams 321456987 555 Baymeadows Dr. Unit 42 Ashville NC 33306 9 14

    13 Bob Jones 555229999 4227 Some Place Ln. #3 Bowling Green KY 45459 13 100

    14 Bob Jones 555229999 4227 Some Place Ln. # 3 Bowling Green KY 45459 13 14

    15 Bob Jones 555292999 4227 SomePlace Ln Unit 3 Bowling Green KY 45459 15 100

    16 Bob Jones 555292999 4227 Some Place Ln. Ste 3 Bowling Green KY 45459 15 12

    17 Louis Armstrong 147852369 123 Baldwin Ave. NULL Jacksonville FL 32209 17 100

    18 Louis Armstrong 147852369 123 Baldwin Ave NULL Jacksonville FL 32209 17 13

    19 Louis Armstrong 147852369 123 Baldwin Anenue NULL Jachsonville FL 32209 17 12

    20 Louis Armstrong 147852639 123 Baldwin Ave. NULL Jacksonville FL 32209 17 9

    21 Mark Adams 654789321 524 Main St. Unit 42 Bakersville NC 33304 21 100

    22 Mark Adam 654789321 524 Main St. Unit 42 Bakersville NC 33304 21 13

    23 Mark Adams 654789231 524 Main St. Unit 42 Barkersville NC 33304 21 8

    24 Mark Adams 654789321 524 Main St. Unit 42 Bakersville NC 33304 21 15

    Scroll all the way over to the right to see how everything was matched up and the quality of the match.

    Also note... That row 15 split away (taking row 16 with it) for form a new group rather than joining rows 13 & 14.

    To the human eye, it's apparent that 13, 14, 15 & 16 are the same person, but the transposition in the SSN and other inconsistencies were enough to make SQL Server "think" they are two separate individuals.

    I did that intentionally to illustrate the necessity of analyzing your data and setting score values and the threshold accordingly.

    Anyway, I hold this helps.

    If you're still having problems let me know. If Erika hasn't blown my house down, I'll try to help.