How to compare files in two folders using cmp?

I recently copied 400GB of data from an NTFS drive to an ext4 drive. I want to verify that the copy is 100% identical to the original.

I wanted to use cmp, but it only compares two files at a time. The directory that was copied contains many subdirectories and all sorts of files (not just text).

So I guess what I want to know is:

  1. Is cmp the right command to verify that two files are identical, regardless of type (e.g. video, document, executable, zip, encrypted file)?

  2. How would you write a quick shell script to compare two folders?

So far I have thought of two ways:

  1. Supply the directories of each:
    Source: /media/ntfsdrive/data
    New Copy: /media/ext4drive/data

But this seems difficult.

  2. Run find . in each directory (source and the new copy). The listings should be exactly the same if you output each to a file and do a diff.
    Then somehow run cmp for each file listed.

This seems difficult too, but somewhat easier. I have no shell programming experience though, so does anyone know a quick way of doing this? Any help would be highly appreciated. Thank you!

Yes, 'cmp' does a byte-by-byte comparison of two files, so it works regardless of file type.
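
To make that concrete, here is a minimal sketch of testing two files from a script by looking only at cmp's exit status. The function name same_file is made up for this illustration; -s is cmp's standard "silent" option.

```shell
# same_file FILE1 FILE2
# Succeeds (exit status 0) only if both files have identical contents.
# "same_file" is an illustrative name, not a standard command.
same_file() {
    # -s: print nothing, only set the exit status
    # (0 = identical, 1 = different, greater than 1 = error such as a missing file)
    cmp -s "$1" "$2"
}
```

It could then be used as, for example, `same_file original copy && echo identical`, where original and copy stand for any two paths.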

One way would be ...

# Build first the lists to be compared:

	find /media/ntfsdrive/data/ /media/ext4drive/data/ -type f  |
               awk '/\/media\/ntfsdrive\/data\// { print > "list_1" }
                    /\/media\/ext4drive\/data\// { print > "list_2" }' 
	
# Do the actual comparison with 'cmp' (assuming no whitespace in the filenames):

	 awk '{ getline s < "list_2"; print "cmp", $0, s }' list_1 | sh

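Note that the filenames in one list have to be in exactly the same order as their counterparts in the other list for 'cmp' to be given matching pairs. 'find' does not guarantee any particular order, so in practice both lists should be sorted first, and both trees must contain the same set of files.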
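
One way to check that ordering requirement before running any 'cmp' is to list each tree with relative paths, sort both listings the same way, and compare them. This is only a sketch: the function names list_files and missing_files are made up here, mktemp is assumed to be available, and filenames are assumed not to contain newlines.

```shell
# list_files DIR
# Print every regular file under DIR as a path relative to DIR, sorted
# byte-wise so that two listings line up entry for entry.
# (Illustrative name; assumes no newlines in filenames.)
list_files() {
    ( cd "$1" && find . -type f | LC_ALL=C sort )
}

# missing_files SRC DST
# Print the files that exist in only one of the two trees.
# Names found only in SRC start at column 1; names found only in DST
# are indented with a tab. Prints nothing if both listings match.
missing_files() {
    list1=$(mktemp) && list2=$(mktemp) || return 2
    list_files "$1" > "$list1"
    list_files "$2" > "$list2"
    # comm -3 drops the lines common to both sorted lists
    LC_ALL=C comm -3 "$list1" "$list2"
    rm -f "$list1" "$list2"
}
```

With the paths from the question this would be called as `missing_files /media/ntfsdrive/data /media/ext4drive/data`. It only compares names, not contents, so it complements 'cmp' rather than replacing it.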

find plus md5sum (see their man pages) should do the trick:

cd /media/ntfsdrive/data && find . -type f -exec md5sum {} + | ( cd /media/ext4drive/data && md5sum -c --quiet )
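
A variation on the same idea, in case the checksums should be kept for a later re-check: write them to a file first, then verify the copy against that file. This is a sketch assuming GNU md5sum; the function names and the manifest file are placeholders, not anything standard.

```shell
# make_manifest SRC MANIFEST
# Record an MD5 checksum for every regular file under SRC, using paths
# relative to SRC so the list can be checked against another tree.
# (Illustrative name.)
make_manifest() {
    ( cd "$1" && find . -type f -exec md5sum {} + ) > "$2"
}

# verify_manifest DST MANIFEST
# Re-check the copy under DST against the recorded checksums.
# --quiet prints only the files that fail; the exit status is 0 only
# if every listed file exists and matches. (Illustrative name.)
verify_manifest() {
    ( cd "$1" && md5sum -c --quiet ) < "$2"
}
```

For the trees in the question that would be `make_manifest /media/ntfsdrive/data manifest.md5` followed by `verify_manifest /media/ext4drive/data manifest.md5`. Like the one-liner, this does not notice extra files that exist only in the copy.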

If both directory trees are intended to be identical (no files in one that are not in the other):

diff -r /media/ntfsdrive/data /media/ext4drive/data

If both trees are identical, diff will be silent and not generate any output; its exit status will be zero.
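
In a script, that exit status can be tested directly. A minimal sketch follows; the function name trees_identical is made up, and -q (report only whether files differ) is assumed to be supported, as it is in GNU diff.

```shell
# trees_identical SRC DST
# Succeeds only if both directory trees have the same files with the
# same contents. (Illustrative name, not a standard command.)
trees_identical() {
    # -r: recurse into subdirectories
    # -q: only report which files differ, not how they differ
    diff -rq "$1" "$2"
    # The function returns diff's exit status:
    # 0 = no differences, 1 = differences found, 2 = trouble
}
```

For example: `trees_identical /media/ntfsdrive/data /media/ext4drive/data && echo "copy verified"`. Files that exist on only one side are reported too and count as a difference.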

Regards,
Alister


Another option using find and cmp, which ignores any extra files in the destination tree (so long as each file was copied correctly we're happy):

find /media/ntfsdrive/data -type f -exec sh -c '
    for f; do
        cmp "$f" /media/ext4drive"${f#/media/ntfsdrive}"
    done
' sh {} +
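
The same loop can be wrapped in a function that takes both directories as arguments and reports an overall result. This is a sketch under some assumptions: verify_copy is a made-up name, and it relies on find returning a non-zero exit status when an -exec ... + invocation fails, which GNU find does.

```shell
# verify_copy SRC DST
# Compare every regular file under SRC with its counterpart under DST.
# Prints one line per file that differs or is missing in DST, and
# returns non-zero if there was at least one. Extra files in DST are
# ignored. (Illustrative name.)
verify_copy() {
    src=${1%/} dst=${2%/}   # drop a trailing slash, if any
    find "$src" -type f -exec sh -c '
        src=$1 dst=$2; shift 2
        status=0
        for f; do
            # Replace the SRC prefix of each path with DST
            cmp -s "$f" "$dst${f#"$src"}" || {
                printf "MISMATCH: %s\n" "$f"
                status=1
            }
        done
        exit "$status"
    ' sh "$src" "$dst" {} +
}
```

For example: `verify_copy /media/ntfsdrive/data /media/ext4drive/data && echo "all files match"`. Passing the directories as arguments, rather than embedding them in the script text, keeps the quoting simple when paths contain spaces.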

Regards,
Alister

Alister, I wanted to use the diff command because it can work recursively, but isn't that only for text files? I mean, it says it does a line-by-line comparison, whereas cmp compares byte by byte. I am dealing with a lot of non-text files as well.

It can compare binary files too, though it may not actually generate a diff if they differ; it may simply print the two filenames and the word "differ" to let you know, along with returning a non-zero exit status.