Detecting Differences in Files on Servers

  • Ed Wagner

    SSC Guru

    Points: 286958

  • Eirikur Eiriksson

    SSC Guru

    Points: 182347

    Thanks Ed for this fine piece, certain that it will be a very handy reference.

    😎

  • Carlo Romagnano

    SSC-Insane

    Points: 21760

    Eirikur Eiriksson (7/30/2015)


    Thanks Ed for this fine piece, certain that it will be a very handy reference.

    😎

    +1

    Here is another DOS command to list files (run it from a batch file); each column is separated by "*":

    @for /R %1 %%I in (%2) do @echo %%~tzI*%%~fI*%%~dpI*%%~nI*%%~xI*>>C:\DiskSize\main.log

    Here %1 is the folder and %2 is the file pattern, such as "*", "*.htm" or "*.js".
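
    For anyone who wants the same kind of listing without batch syntax, here is a rough Python sketch. It is not a translation of the command above: the function name list_files, the timestamp format, and the field order (modified time, size, full path, folder, base name, extension) are my own assumptions for illustration.

```python
import fnmatch
import os
import time


def list_files(root, pattern="*"):
    """Return one '*'-delimited line per file under root whose name matches pattern."""
    lines = []
    for folder, _subdirs, names in os.walk(root):
        for name in sorted(fnmatch.filter(names, pattern)):
            full = os.path.join(folder, name)
            info = os.stat(full)
            # Assumed timestamp format; the batch command uses the system locale instead.
            stamp = time.strftime("%Y-%m-%d %H:%M", time.localtime(info.st_mtime))
            stem, ext = os.path.splitext(name)
            # Fields: modified time, size in bytes, full path, folder, base name, extension
            fields = [stamp, str(info.st_size), full, folder + os.sep, stem, ext]
            lines.append("*".join(fields))
    return lines
```

    Calling list_files with a folder and a pattern such as "*.htm" returns the lines as a list, which can then be written to a log file or loaded into a table.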

  • iposner

    Ten Centuries

    Points: 1161

    I think that's quite a complex solution to an unnecessarily complex problem. I don't see why a report is needed at all. It seems to me that the reporting is only necessary because the existing file replication mechanism across servers is unreliable. To my mind, the core issue is ensuring that the file tree is identical across servers.

    If the aim is to ensure that parts of the directory tree contain files that are the same across servers, there are a number of other ways to crack the nut:

    1) Where you have hundreds of thousands of small files, it's much faster to replace the entire target folder structure by using GnuWin32 tar on the source tree and un-tar'ing it on the destination. This works MUCH faster than just copying the tree across as no confirmation is needed between each copy. If the files are small and highly numerous, it works faster than comparing date-stamps too. It works especially well with compressed images as tar does not attempt to compress the (uncompressible) images again. This method works well across a high-bandwidth, low-latency connection such as a datacentre LAN.

    2) Another mechanism is to use the GnuWin32 md5sum command on the files in a directory, which creates a digest of md5sums. This can be copied to the target folder and, using the -c (check) flag, the contents of the folder checked against the digest. The plus here is that the actual file contents are checked without having to move much data across the network, so this option is good where the files are large, and it ensures that the file contents are identical. The downside is that md5sum does not have a recursive option, so the digest has to be created per folder, which makes it awkward to use on a directory tree. It will also be slower at creating and comparing the digest than date-stamp comparisons are. The upside is that this method works particularly well where the bandwidth and latency between servers are poor.

    3) The built-in FC (file compare) command also does file comparisons, but being a single process comparing actual file contents across a LAN, it will be slow.

    4) PowerShell has the Compare-Object command, which can be placed in a script to loop through the folder structure. Again, it compares file contents. Alternatively, a PowerShell script can be written to perform a date-stamp and file-size comparison, similar to the article.

    5) Finally, trusty robocopy can be used to perform a selective copy of changed files only.
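
    As a rough illustration of option 2 without the per-folder limitation, the digest can be built recursively in a few lines of Python. This is only a sketch under my own assumptions: the function names file_md5, build_digest and check_digest are made up for this post, the digest is kept as an in-memory dictionary rather than an md5sum-format file, and MD5 is used only to mirror md5sum (it detects accidental differences, not deliberate tampering).

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Hash one file in chunks so large files are not read into memory at once."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def build_digest(root):
    """Map each file's path (relative to root, '/'-separated) to its MD5 hex digest."""
    digest = {}
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            full = os.path.join(folder, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            digest[rel] = file_md5(full)
    return digest


def check_digest(root, expected):
    """Compare the tree at root with a digest built on another server.

    Returns three sorted lists of relative paths: files missing from root,
    files whose contents differ, and files present only under root.
    """
    actual = build_digest(root)
    missing = sorted(set(expected) - set(actual))
    extra = sorted(set(actual) - set(expected))
    changed = sorted(p for p in expected if p in actual and expected[p] != actual[p])
    return missing, changed, extra
```

    The idea is the same as with md5sum -c: build the digest on the source server, ship only that small mapping to the target, and run the check there, so no file contents cross the network.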

  • Robert Sterbal-482516

    SSCrazy

    Points: 2784

    It would be really neat to see this done with PowerShell.

  • Luis Cazares

    SSC Guru

    Points: 183546

    Today I was wondering when your article would come out, then I opened the newsletter and it was there.

    I'll give it a read after a cup of coffee.

    Thank you for sharing.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

  • Ed Wagner

    SSC Guru

    Points: 286958

    I forgot to add it to the article, but many thanks to Jeff Moden for being my reviewer, giving me good advice and encouraging me to write it in the first place.

  • Ed Wagner

    SSC Guru

    Points: 286958

    Luis Cazares (7/30/2015)


    Today I was wondering when your article would come out, then I opened the newsletter and it was there.

    I'll give it a read after a cup of coffee.

    Thank you for sharing.

    Thanks, Luis. Actually, today's the first time I've been online all week. I'm in northern Michigan on vacation and the internet runs slowly up here where I can find it. I did, however, notice you had your article published. I'm looking forward to getting back online later and reading it.

  • akljfhnlaflkj

    SSC Guru

    Points: 76202

    Thanks for the article.

  • David Data

    SSCrazy

    Points: 2965

    This does seem a complex way to do it - though it does let you log the results in an SQL table if that's important to you.

    Personally, I use two Windows tools to check or ensure two file trees are identical:

    - ExamDiff, which will compare individual files, directories, or entire trees, and

    - SyncBack SE, which will copy sets of files, including entire file trees on the same or different disks. Among other options, it can MIRROR (ensure destination tree is identical) or BACKUP (new/changed files copied but files on destination only are not removed), and can compare files by size and date, or by full content using a cryptographic hash.

    Both programs have free versions, so you can try them out without commitment. I liked both so much I paid the (not very high) price to get the Pro versions. ExamDiff understands lots of file formats including Word, Excel, PDF and ZIP as well as text files. It can be run from the command line if you need to.

  • kyyb7

    Newbie

    Points: 7

    I use Beyond Compare for tasks like this.

  • Andy Warren

    SSC Guru

    Points: 119676

    I think the comment about forcing them to match is interesting. Ed, is there a reason to NOT do that in your case?

    There's merit to putting the list of files into a table. I run a number of daily checks for various problems (an NDF restored to the wrong folder, for example) and it's been nice to figure that stuff out by just writing the query in T-SQL (max comfort level). It's been handy for various ad hoc questions about the file system over time.
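
    To show the kind of query I mean, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. Everything in it is an assumption for illustration: the files table, its column names, the function name find_differences, and the rule that each (server, path) pair appears only once are not taken from the article.

```python
import sqlite3


def find_differences(rows):
    """Load (server, path, size_bytes, modified) rows and report mismatched paths.

    Assumes each (server, path) pair appears at most once in rows.
    Returns (path, reason) tuples ordered by path.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (server TEXT, path TEXT, size_bytes INTEGER, modified TEXT)"
    )
    conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?)", rows)
    # A path is flagged when some server lacks it, or when servers disagree
    # on its size or modified date.
    query = """
        SELECT path,
               CASE WHEN COUNT(*) < (SELECT COUNT(DISTINCT server) FROM files)
                    THEN 'missing on a server'
                    ELSE 'size or date differs'
               END AS reason
        FROM files
        GROUP BY path
        HAVING COUNT(*) < (SELECT COUNT(DISTINCT server) FROM files)
            OR COUNT(DISTINCT size_bytes) > 1
            OR COUNT(DISTINCT modified) > 1
        ORDER BY path
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

    Once the file lists are in a table, a new question about the file system is just another query rather than another script.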

  • Jeff Moden

    SSC Guru

    Points: 994558

    Ed Wagner (7/30/2015)


    I forgot to add it to the article, but many thanks to Jeff Moden for being my reviewer, giving me good advice and encouraging me to write it in the first place.

    I don't feel slighted in the least, Ed. In fact, IIRC, since it's your very first article, I believe I suggested just leaving any credits for those things off. This is a complex article for a "beginning writer" and you did a great job.

    Shifting gears a bit, I agree that there are many ways to skin this particular cat. But to emphasize the background that Ed tried to portray in his article: Ed basically had no budget for this problem, and he had to solve it quickly because of management demands based, supposedly, on some urgent customer-related demands, so there wasn't much time to explore different avenues even if they were free. When I say "quickly", I mean virtually "overnight". The reporting was also a critical feature according to management. The article is a great testament to the idea that "Before you can think outside the box, you must first realize... you're in a box". 🙂 Since the company was already using UltraEdit for other things, it also shows some great innovation without adding to the proverbial "Tower of Babel" that so many companies suffer from.

    Even better than that, management didn't think he could actually pull it off without buying something else and they certainly didn't think he could pull it off virtually "overnight" even if he were to buy something to help.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think it's expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBA's job. 😉


  • Ed Wagner

    SSC Guru

    Points: 286958

    Thanks, Jeff. I definitely consider myself a beginning writer.

    Yes, it was an immediate request. With no budget and very limited time, I used the tools I had at my disposal. As I started it, I considered many ways to approach the problem. I think that, in addition to the individual techniques, the overall article was a way of combining the tools I had available to solve the problem.

    In the end, that's what I really hope people take away from the article. Using the tools you have, you can combine them to accomplish your goal.
