Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««23456

Complicated Comments Expand / Collapse
Author
Message
Posted Friday, February 13, 2009 4:43 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Tuesday, February 18, 2014 7:14 AM
Points: 1,344, Visits: 1,983
Regarding your problem of stored procedure comparison, Windiff is still available (just Google "windiff download", one of the places is Microsoft) so I'd have just used that (and regularly do!).

Of course,if you start writing your own tool, so you can ignore comments, then you are starting on a complex path since, ideally, you need to build a complete parser to check that
select a,b,c from tbl

is equivalent to
SELECT
d.a ,
d.b ,
d.c
FROM
tbl AS d

or even that
select a.a, b.b from a join b on a.c=b.c

is the same as
select a.a, b.b from b join a on b.c=a.c

And that's before comments have been introduced!

Sometimes the solution is just to use a simpler tool like Windiff along with the mk.1 Eyeball as used by an experience SQL writer. :D


Derek
Post #656416
Posted Friday, February 13, 2009 9:34 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Thursday, September 4, 2014 11:21 AM
Points: 305, Visits: 566
Derek Dongray (2/13/2009)
Regarding your problem of stored procedure comparison, Windiff is still available (just Google "windiff download", one of the places is Microsoft) so I'd have just used that (and regularly do!).
...
...
Sometimes the solution is just to use a simpler tool like Windiff along with the mk.1 Eyeball as used by an experience SQL writer. :D


I agree with your admonition against reinventing the wheel, as Windiff does a great job. But my original intention was to provide quantitative matching results, not necessarily the visual representation of the matching. I imagined this as being part of a larger tool to compare databases (for example, between development, testing, and production versions of a database, or instances of independently maintained databases that began as identical copies), which would involve automatic comparisons of many objects.

The more subtle differences you mentioned above make the problem even more interesting, and in my comparison of the two spds of 1000+ lines each, I found many instances of such. You're right, the complexity goes upwards from there. This sort of "fuzzy matching" is exactly what's involved in name/address deduplication in direct marketing. For example, how do we match the two addresses:

Jon G. Smith
100 E. Main St
Baltimore, MD

J. Smyth
100 East Main Street
Balt. Maryland

These sorts of problems have been tackled (in fact, I've written parsing, scoring, and matching algorithms myself to handle it). I think the SQL parser would be of similar complexity. Using the current method of capturing lines of code from the spds, we would have to parse those lines into "words" based on white space, etc. Taking the example of the SELECT you made, we would have to standardize the JOIN sequence and data nicknames, compare the metadata of the results, etc. I'm not sure I would want to go in that direction, but it's possible.


Jesse McLain
jesse@jessemclain.com
www.jessemclain.com
www.jessesql.blogspot.com
Post #656754
Posted Sunday, February 15, 2009 7:34 AM
SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Wednesday, September 10, 2014 3:19 AM
Points: 5,363, Visits: 1,391
this question really got me. Amazing question. Really good one....:)


Post #657325
Posted Monday, February 16, 2009 2:55 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Tuesday, February 18, 2014 7:14 AM
Points: 1,344, Visits: 1,983
Jesse McLain (2/13/2009)I agree with your admonition against reinventing the wheel, as Windiff does a great job. But my original intention was to provide quantitative matching results, not necessarily the visual representation of the matching.
I've found in the past that sometimes it's necessary to reinvent the wheel to add an extra bit of functionality that a tool doesn't provide. It depends how much cost is involved.
This sort of "fuzzy matching" is exactly what's involved in name/address deduplication in direct marketing.
I too have had to write routines to match and format addresses when loading data from legacy business systems into newer replacements. Since, usually, these are only going to be used over a short period, I've found that the last step often comes down to a manual/visual check. Obviously, when it's finance data, finding duplicates is much more important thatn when it's a mailshot!
Using the current method of capturing lines of code from the spds, we would have to parse those lines into "words" based on white space, etc. Taking the example of the SELECT you made, we would have to standardize the JOIN sequence and data nicknames, compare the metadata of the results, etc. I'm not sure I would want to go in that direction, but it's possible.
Actually, with tools like Flex and Bison it might not be too hard to build something.The metadata comparison might get quite complicated (consider re-ordered equijoins) and, of course, may still fail to identify functionally equivalent code.


Derek
Post #657543
« Prev Topic | Next Topic »

Add to briefcase «««23456

Permissions Expand / Collapse