• Jesse McLain (2/13/2009):
I agree with your admonition against reinventing the wheel, as Windiff does a great job. But my original intention was to provide quantitative matching results, not necessarily a visual representation of the matching.

    I've found in the past that it's sometimes necessary to reinvent the wheel to add an extra bit of functionality that a tool doesn't provide. It depends on how much cost is involved.

    This sort of "fuzzy matching" is exactly what's involved in name/address deduplication in direct marketing.

    I too have had to write routines to match and format addresses when loading data from legacy business systems into newer replacements. Since these routines are usually needed only for a short period, I've found that the last step often comes down to a manual/visual check. Obviously, finding duplicates is much more important when it's finance data than when it's a mailshot!
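    As a rough illustration of the "match and format, then check by eye" approach, here is a minimal sketch in Python. It is not fuzzy matching in the strict sense: it normalizes each address (lower-case, punctuation stripped, a few words abbreviated) and groups addresses whose normalized forms are identical, producing candidate duplicates for the manual check. The function names and the tiny ABBREVIATIONS table are hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny abbreviation table; real address
# standardization relies on much larger reference data.
ABBREVIATIONS = {"street": "st", "road": "rd", "avenue": "ave", "apartment": "apt"}


def normalize_address(address: str) -> str:
    """Lower-case, drop punctuation, and abbreviate common words."""
    words = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def find_duplicates(addresses: list[str]) -> list[list[str]]:
    """Group addresses that normalize to the same string; only groups with
    more than one member are returned, as candidates for a manual check."""
    groups = defaultdict(list)
    for addr in addresses:
        groups[normalize_address(addr)].append(addr)
    return [group for group in groups.values() if len(group) > 1]


print(find_duplicates(["12 High Street", "12 high st.", "14 High Road"]))
```

    Here "12 High Street" and "12 high st." both normalize to "12 high st" and are flagged together, while "14 High Road" is left alone. Anything subtler (typos, transposed words) would need a genuine similarity measure on top of this.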

    Using the current method of capturing lines of code from the spds, we would have to parse those lines into "words" based on white space, etc. Taking the example of the SELECT you made, we would have to standardize the JOIN sequence and data nicknames, compare the metadata of the results, etc. I'm not sure I would want to go in that direction, but it's possible.
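    A minimal sketch of the "split into words and score the match" idea, assuming each procedure's text is available as a plain string. The names tokenize and match_ratio are hypothetical. It splits on white space and also treats punctuation (commas, dots, parentheses, "=") as separate tokens, upper-cases everything, and uses Python's difflib.SequenceMatcher to turn the token comparison into a score between 0 and 1. It does nothing about alias names or JOIN order, which is exactly the harder part described above.

```python
import difflib
import re


def tokenize(sql: str) -> list[str]:
    """Split SQL text into upper-cased 'words': runs of word characters,
    plus each punctuation character as its own token. White space is dropped."""
    return re.findall(r"\w+|[^\w\s]", sql.upper())


def match_ratio(proc_a: str, proc_b: str) -> float:
    """Similarity in [0, 1] between two procedure bodies, based on the
    matching token subsequences found by difflib.SequenceMatcher."""
    matcher = difflib.SequenceMatcher(
        None, tokenize(proc_a), tokenize(proc_b), autojunk=False
    )
    return matcher.ratio()


a = "SELECT a.id, b.name FROM t1 a JOIN t2 b ON a.id = b.id"
b = "select a.id,b.name from t1 a join t2 b on a.id=b.id"
print(match_ratio(a, b))
```

    Because case and spacing are normalized away, the two statements above score 1.0, whereas a line-based diff would report them as different. Swapping a single column name in a four-token statement drops the score to 0.75.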

    Actually, with tools like Flex and Bison it might not be too hard to build something. The metadata comparison might get quite complicated (consider re-ordered equijoins) and, of course, may still fail to identify functionally equivalent code.
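    To show what "re-ordered equijoins" means for the comparison, here is a toy sketch in Python, assuming the ON clause has already been extracted as a string and consists only of AND-ed equality predicates. The function name canonical_join_condition is hypothetical, and this string handling is no substitute for a real parser such as one generated with Flex and Bison. It relies on two facts: equality is symmetric, and AND is commutative, so both the sides of each predicate and the predicates themselves can be sorted into a canonical order.

```python
import re


def canonical_join_condition(condition: str) -> str:
    """Normalize an ON clause made only of AND-ed equalities, so that
    'a.id = b.id AND a.x = b.y' and 'B.Y = A.X and b.id=a.id' compare equal.
    OR, functions, nested parentheses, etc. are out of scope."""
    predicates = []
    for pred in re.split(r"\s+AND\s+", condition.strip(), flags=re.IGNORECASE):
        left, sep, right = pred.partition("=")
        # Reject anything that is not a plain equality (e.g. <, >=, <>, !=).
        if not sep or any(ch in pred for ch in "<>!"):
            raise ValueError(f"not a simple equality predicate: {pred!r}")
        # Equality is symmetric: put the two sides in alphabetical order.
        sides = sorted(side.strip().lower() for side in (left, right))
        predicates.append(" = ".join(sides))
    # AND is commutative: put the predicates in alphabetical order too.
    return " AND ".join(sorted(predicates))


print(canonical_join_condition("B.Y = A.X and b.id=a.id"))
```

    This prints "a.id = b.id AND a.x = b.y", the same string produced for the original ordering. Even so, it only handles one narrow kind of equivalence; moving a predicate from the ON clause to the WHERE clause, or renaming aliases, would still defeat it.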

    Derek