• craig.bobchin (9/1/2015)


    Jason,

    That works great. I've been fooling around with the thresholds to try and get the best match before I present it to the business. I also made a couple of other minot tweaks to handle the 'n/a' that can appear in any of the matching fields.

    Takes about 12 minutes to run against our full table of 30k records.

    One thing I got to thinking about is as we add more records to the dimMember table, and have to run this script against the table daily, member's groupId has the potential to change based on new/updated member data.

    I'm curious as to your take on this aspect. I'm beginning to feel it's turtles all the way down.

    I'm strongly advocating that the company get MDS in house so we can avoid this issue.

    Glad you got it working!

    You concerns about scalability are well founded. Any time you're comparing every row in a table to every other row in the table it's not going scale very well. The use of a triangular join is a lot better than a full Cartesian join but fair bit of the pain is still there.

    If you're going to be doing this on a regular basis, I'd modify the script so that only new rows are being scored. There's no good reason to rescore a row that's currently a member of a good group. Assuming you have a fairly consistent number of new members added each day, scoring only new members should allow the solution to scale MUCH better than rescoring every row every day.

    I don't know what "MDS" is, unless you're talking about Melissa Data... I've heard good things about it but I have no personal experience with it. If you go that rout, please post your impressions.