Tool to search, list out distinct .sql files

Question

Post reply

Tool to search, list out distinct .sql files

curious_sqldba

SSC-Dedicated

Points: 36494
More actions
August 6, 2013 at 12:31 pm

#280883

Is there any tool which can scan through each script and spit out list of scripts which are same?

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply

Lowell SSC Guru Points: 323504 More actions · Answer 1

WinMerge, which is free, can compare folders vs folders or files vs files, and identifiy differences:

http://winmerge.org/

will that do what you want?

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

curious_sqldba SSC-Dedicated Points: 36494 More actions · Answer 2

Lowell (8/6/2013)
WinMerge, which is free, can compare folders vs folders or files vs files, and identifiy differences:
http://winmerge.org/
will that do what you want?

Actually i do use winmerge, basically i have 10 scripts, out of these there is possibility that many scripts might have same code. I just need to know which one's are same, instead of me doing a 1 to many comparison for each script.

Jeff Moden SSC Guru Points: 1003872 More actions · Answer 3

curious_sqldba (8/6/2013)
Lowell (8/6/2013)
WinMerge, which is free, can compare folders vs folders or files vs files, and identifiy differences:
http://winmerge.org/
will that do what you want?
Actually i do use winmerge, basically i have 10 scripts, out of these there is possibility that many scripts might have same code. I just need to know which one's are same, instead of me doing a 1 to many comparison for each script.

Although there are better tools for this, it is possible to check for duplication and even produce a count of which scripts are "identical" using T-SQL as a "hammer".

The first question would be, what do you mean "might have same code"? If you mean they are absolutely identical including any white space, control characters (Cr, Lf, Tab, etc), and casing, then you could load the scripts into a table, use HASHBYTES to produce a hashcode for each, and then compare the hashcodes. If you're using a case-insensitive collation, then casing won't matter.

If the code has differences in white space or control characters, then you'll need to replace the control characters with spaces and de-duplicate the spaces prior to the HASHBYTES conversion.

Also keep in mind that the above process does not strip out comments. If the comments differ, so will the HASHBYTES conversion. To remove the comments would be very much more difficult to do in T-SQL. It could be done but then you'd have to evaluate how long it would take to develop such a thing compared to doing the task manually. Even such a simple difference as someone including a single semi-colon where the others have not will cause a difference in the HASHBYTES. Of course, you could treat semi-colons as if they were control characters in this case.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)