|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: Monday, May 13, 2013 2:04 AM
Points: 1,342,
Visits: 1,946
|
|
Regarding your problem of stored procedure comparison, Windiff is still available (just Google "windiff download", one of the places is Microsoft) so I'd have just used that (and regularly do!).
Of course,if you start writing your own tool, so you can ignore comments, then you are starting on a complex path since, ideally, you need to build a complete parser to check thatselect a,b,c from tbl is equivalent toSELECT d.a , d.b , d.c FROM tbl AS d or even thatselect a.a, b.b from a join b on a.c=b.c is the same asselect a.a, b.b from b join a on b.c=a.c And that's before comments have been introduced!
Sometimes the solution is just to use a simpler tool like Windiff along with the mk.1 Eyeball as used by an experience SQL writer. :D
Derek
|
|
|
|
|
Old Hand
      
Group: General Forum Members
Last Login: Monday, May 06, 2013 8:47 AM
Points: 304,
Visits: 552
|
|
Derek Dongray (2/13/2009)
Regarding your problem of stored procedure comparison, Windiff is still available (just Google "windiff download", one of the places is Microsoft) so I'd have just used that (and regularly do!). ... ... Sometimes the solution is just to use a simpler tool like Windiff along with the mk.1 Eyeball as used by an experience SQL writer. :D
I agree with your admonition against reinventing the wheel, as Windiff does a great job. But my original intention was to provide quantitative matching results, not necessarily the visual representation of the matching. I imagined this as being part of a larger tool to compare databases (for example, between development, testing, and production versions of a database, or instances of independently maintained databases that began as identical copies), which would involve automatic comparisons of many objects.
The more subtle differences you mentioned above make the problem even more interesting, and in my comparison of the two spds of 1000+ lines each, I found many instances of such. You're right, the complexity goes upwards from there. This sort of "fuzzy matching" is exactly what's involved in name/address deduplication in direct marketing. For example, how do we match the two addresses:
Jon G. Smith 100 E. Main St Baltimore, MD
J. Smyth 100 East Main Street Balt. Maryland
These sorts of problems have been tackled (in fact, I've written parsing, scoring, and matching algorithms myself to handle it). I think the SQL parser would be of similar complexity. Using the current method of capturing lines of code from the spds, we would have to parse those lines into "words" based on white space, etc. Taking the example of the SELECT you made, we would have to standardize the JOIN sequence and data nicknames, compare the metadata of the results, etc. I'm not sure I would want to go in that direction, but it's possible.
Jesse McLain jesse@jessemclain.com www.jessemclain.com www.jessesql.blogspot.com
|
|
|
|
|
SSCarpal Tunnel
       
Group: General Forum Members
Last Login: Tuesday, June 11, 2013 5:01 AM
Points: 4,816,
Visits: 1,343
|
|
this question really got me. Amazing question. Really good one....:)
|
|
|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: Monday, May 13, 2013 2:04 AM
Points: 1,342,
Visits: 1,946
|
|
Jesse McLain (2/13/2009)I agree with your admonition against reinventing the wheel, as Windiff does a great job. But my original intention was to provide quantitative matching results, not necessarily the visual representation of the matching. I've found in the past that sometimes it's necessary to reinvent the wheel to add an extra bit of functionality that a tool doesn't provide. It depends how much cost is involved.This sort of "fuzzy matching" is exactly what's involved in name/address deduplication in direct marketing. I too have had to write routines to match and format addresses when loading data from legacy business systems into newer replacements. Since, usually, these are only going to be used over a short period, I've found that the last step often comes down to a manual/visual check. Obviously, when it's finance data, finding duplicates is much more important thatn when it's a mailshot!Using the current method of capturing lines of code from the spds, we would have to parse those lines into "words" based on white space, etc. Taking the example of the SELECT you made, we would have to standardize the JOIN sequence and data nicknames, compare the metadata of the results, etc. I'm not sure I would want to go in that direction, but it's possible. Actually, with tools like Flex and Bison it might not be too hard to build something.The metadata comparison might get quite complicated (consider re-ordered equijoins) and, of course, may still fail to identify functionally equivalent code.
Derek
|
|
|
|