Tool to search, list out distinct .sql files

  • Is there any tool which can scan through each script and spit out list of scripts which are same?

  • WinMerge, which is free, can compare folders vs folders or files vs files, and identifiy differences:

    http://winmerge.org/

    will that do what you want?

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Lowell (8/6/2013)


    WinMerge, which is free, can compare folders vs folders or files vs files, and identifiy differences:

    http://winmerge.org/

    will that do what you want?

    Actually i do use winmerge, basically i have 10 scripts, out of these there is possibility that many scripts might have same code. I just need to know which one's are same, instead of me doing a 1 to many comparison for each script.

  • curious_sqldba (8/6/2013)


    Lowell (8/6/2013)


    WinMerge, which is free, can compare folders vs folders or files vs files, and identifiy differences:

    http://winmerge.org/

    will that do what you want?

    Actually i do use winmerge, basically i have 10 scripts, out of these there is possibility that many scripts might have same code. I just need to know which one's are same, instead of me doing a 1 to many comparison for each script.

    Although there are better tools for this, it is possible to check for duplication and even produce a count of which scripts are "identical" using T-SQL as a "hammer".

    The first question would be, what do you mean "might have same code"? If you mean they are absolutely identical including any white space, control characters (Cr, Lf, Tab, etc), and casing, then you could load the scripts into a table, use HASHBYTES to produce a hashcode for each, and then compare the hashcodes. If you're using a case-insensitive collation, then casing won't matter.

    If the code has differences in white space or control characters, then you'll need to replace the control characters with spaces and de-duplicate the spaces prior to the HASHBYTES conversion.

    Also keep in mind that the above process does not strip out comments. If the comments differ, so will the HASHBYTES conversion. To remove the comments would be very much more difficult to do in T-SQL. It could be done but then you'd have to evaluate how long it would take to develop such a thing compared to doing the task manually. Even such a simple difference as someone including a single semi-colon where the others have not will cause a difference in the HASHBYTES. Of course, you could treat semi-colons as if they were control characters in this case.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply