• I wanted to add that one of the big strengths of this algorithm is that it will detect similarities even if the words are scrambled in different orders or if one string has an additional word in the beginning. Because it is only looking at 2 character sequences, it doesn't care about positioning within the string. This is very helpful when dealing with names, which can often be out of order, such as when the data entry person switches the first and last name, or they put a double last name (common among Hispanics) in the middle name, then the last name, and then the next person does it the opposite way.

    I have seen so many bizarre data entry issues, where the same person was entered into the database quite differently each time, but my algorithm was able to find them.