Directory comparison

I'm using Ubuntu 10.10 and I need something to compare two directories, in terms of both files and subdirectories.

There are about 1.5 TB of files, so I need to know whether any file present in one directory is missing from the other. One of the directories is actually a network share, if that makes any difference.

Just to be clear, I don't need to compare file contents. I only need to check that the subdirectory structures, and the files in them, are the same.
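A minimal Python sketch of a names-only comparison, assuming both directories are reachable as local paths (the network share would have to be mounted first). It walks each tree with `os.walk`, collects every file and subdirectory path relative to its root, and reports the paths that exist on only one side. File contents are never opened, so only directory listings are read. The names `relative_paths` and `compare_dirs` are hypothetical, not taken from any tool mentioned in the thread.

```python
import os
import sys


def relative_paths(root):
    """Return the set of file and directory paths under root, relative to root."""
    paths = set()
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        for name in dirnames + filenames:
            # normpath turns "./name" (entries directly under root) into "name"
            paths.add(os.path.normpath(os.path.join(rel_dir, name)))
    return paths


def compare_dirs(left, right):
    """Return (only_in_left, only_in_right) as sorted lists of relative paths."""
    a = relative_paths(left)
    b = relative_paths(right)
    return sorted(a - b), sorted(b - a)


if __name__ == "__main__" and len(sys.argv) == 3:
    only_left, only_right = compare_dirs(sys.argv[1], sys.argv[2])
    for path in only_left:
        print(f"Only in {sys.argv[1]}: {path}")
    for path in only_right:
        print(f"Only in {sys.argv[2]}: {path}")
```

Saved under a hypothetical name such as `compare_dirs.py`, it would be run as `python3 compare_dirs.py /path/to/first /path/to/second`; no output means both trees contain the same names.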

I'd also like to be able to generate a hash or CRC for the directory based on its contents.
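One way to get a single identifier for a directory's structure, sketched in Python under stated assumptions rather than as any tool's actual behaviour: build a sorted list of every path under the root (directories marked with a trailing `/`) and take one MD5 over that list. Because the paths are relative and sorted, two copies of the same tree at different locations should hash identically. The hash changes when any file or subdirectory is added, removed, or renamed, but not when file contents change. `structure_hash` is a hypothetical name.

```python
import hashlib
import os


def structure_hash(root):
    """MD5 of the sorted list of relative paths under root (names only, no file contents)."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        for name in dirnames:
            # Trailing "/" keeps a directory distinct from a file with the same name
            entries.append(os.path.normpath(os.path.join(rel_dir, name)) + "/")
        for name in filenames:
            entries.append(os.path.normpath(os.path.join(rel_dir, name)))
    md5 = hashlib.md5()
    # Sorting makes the result independent of the order os.walk returns entries in
    for entry in sorted(entries):
        # surrogateescape round-trips filenames that are not valid UTF-8
        md5.update(entry.encode("utf-8", "surrogateescape"))
        md5.update(b"\n")
    return md5.hexdigest()
```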

Does anyone have any suggestions on what to use?
  1. Not sure about the second part of your question, but here's a link that should help with the comparison.
  2. That link looks great. I'll try it once I have access to my system.

    With regard to the second part, what I meant is being able to generate a unique identifier based on a directory and all of its contents. I've found this, so I think I'll be able to do what I'm looking for with an MD5 hash via the md5sum command. My motivation is that I can then compare the hashes of the two directories to check whether there are any differences. It won't tell me what those differences are, though, and that's what the first part is for.
  3. Best answer
    Yeah, I got what you meant, but I just wasn't too sure how it could be done. Your link appears to provide a means of getting a set of md5 hashes for all the files in the directory, but not a single hash for the directory as a whole.

    But it gives me an idea. Why not pipe the output of the "tree" command into md5sum? That way you get a hash of the directory listing. That's going to change if any of the directory contents change, but not if the contents of any of the files do. If I understand correctly, that's what you want.

    If you actually want to watch the contents of the files as well, you could pipe the output of the command that you linked to through md5sum. That would change for any change in the directory structure or in the contents of any of the files.
  4. Whoops! You are completely right. The link I provided generates multiple md5 hashes.

    Hashing the tree seems like a good method to check structures.

    Doing another md5 of the concatenated collection of the md5 hashes seems as good a method as any to check structure and file changes.

    Thanks, I'll give this a shot later.
  5. I gave this a go and for 1.5TB the double MD5 hash method just isn't scalable!

    The tree hash worked out well, though. Thanks.
  6. Best answer selected by Rusting In Peace.
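The "double MD5" idea from replies 3 and 4 can be sketched in Python as below. This is an illustration under assumptions, not the command from the link in reply 2 (which isn't reproduced in the thread): hash each regular file's contents, form one `<md5>  <relative path>` line per file in the two-space layout md5sum prints, sort by path, and hash those lines once more. Every byte of every file is read, which is why this approach is slow on 1.5 TB, as reply 5 found. `file_md5` and `content_hash` are hypothetical names.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """MD5 of one file, read in 1 MiB chunks so large files never sit fully in memory."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def content_hash(root):
    """MD5 over sorted "<file md5>  <relative path>" lines for every regular file under root."""
    records = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip broken symlinks, FIFOs, sockets, etc.; only regular files (or links to them)
            if not os.path.isfile(full):
                continue
            records.append((os.path.relpath(full, root), file_md5(full)))
    outer = hashlib.md5()
    # Sort by relative path so the result does not depend on directory listing order
    for rel, digest in sorted(records):
        outer.update(f"{digest}  {rel}\n".encode("utf-8", "surrogateescape"))
    return outer.hexdigest()
```

Empty directories contribute nothing to this hash, so a names-only structure check is still needed if those matter.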