Huge File Comparison

naveenn08 · February 17, 2010, 11:52pm

Hi, I need to compare two fixed-length files and write any differences to a separate file. Every difference has to be captured, line by line. Ideally my files should not have any differences, but if there are any, they must be captured without missing one. My files are also very large, say more than 2 GB.

Please help me with code in either awk or shell script that does this huge file comparison and performs well.

Regards,
Naveen

linuxpenguin · February 18, 2010, 2:19am

Have you tried the diff command?

diff file1 file2

I have not tried it on files as large as 2 GB, but I don't think it will behave very differently. If that does not work, I suggest you split the two files using the split command and run diff on the smaller pieces. I have used split on files as large as 10 GB and it took barely seconds to split them.
Another advantage of split is that you can run diff on more than one pair of files at the same time.
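
A rough sketch of the split-and-diff idea, in case it helps. It assumes both files have the same number of lines (which fixed-length files that are supposed to be identical normally would), so that chunk N of one file lines up with chunk N of the other. The function name split_diff and the default chunk size are only placeholders for illustration. Note that the line numbers diff reports are relative to each chunk, not to the original file.

```shell
#!/bin/sh
# Sketch only: split two files into aligned chunks and diff the pairs in parallel.
# Assumption: both files have the same number of lines, so chunks line up.
# split_diff and the 1000000-line default are hypothetical choices.
split_diff() {
    f1=$1; f2=$2; out=$3
    lines=${4:-1000000}              # lines per chunk
    tmp=$(mktemp -d) || return 1

    split -l "$lines" "$f1" "$tmp/a."
    split -l "$lines" "$f2" "$tmp/b."

    # Start one diff per pair of chunks in the background.
    for a in "$tmp"/a.*; do
        [ -f "$a" ] || continue      # no chunks at all (empty input)
        suffix=${a##*.}              # aa, ab, ac, ...
        b="$tmp/b.$suffix"
        [ -f "$b" ] || b=/dev/null   # file2 ran out of lines early
        diff "$a" "$b" > "$tmp/d.$suffix" &
    done
    wait                             # let every background diff finish

    # Collect the per-chunk results in chunk order.
    : > "$out"
    for d in "$tmp"/d.*; do
        [ -f "$d" ] && cat "$d" >> "$out"
    done
    rm -rf "$tmp"
}

# Example: split_diff file1 file2 differences.txt 1000000
```

This starts all the diff processes at once, so on a machine with few CPUs or slow disks a larger chunk size (fewer chunks) may work better.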

Hope that helps.

naveenn08 · February 18, 2010, 3:07am

I know I can use the diff command for this purpose. Since the files are going to be very large, I thought of using bdiff too, but before that I wanted to check whether there are any other options that make this comparison faster and more accurate.

Regards,
Naveen

binlib · February 18, 2010, 5:51pm

cmp (with option -l) may be what you want.
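
For example, cmp -l prints one line per differing byte: the byte offset (decimal, starting at 1) followed by the two byte values in octal. Since the records are fixed length, the offset can be turned into a record number with a little awk. This is only a sketch: the function name differing_records is made up, and the record length has to be supplied by you and must include the newline.

```shell
#!/bin/sh
# Sketch only: list the record numbers that differ between two fixed-length files.
# Assumption: every record is exactly reclen bytes, newline included.
# differing_records is a hypothetical helper name.
differing_records() {
    f1=$1; f2=$2; reclen=$3
    # cmp -l output: <offset> <octal byte in f1> <octal byte in f2>
    cmp -l "$f1" "$f2" |
        awk -v len="$reclen" '{
            rec = int(($1 - 1) / len) + 1       # 1-based record number
            if (rec != last) { print rec; last = rec }
        }'
}

# Example: differing_records file1 file2 81 > changed_records.txt
```

A record number N can then be pulled out of either file with something like sed -n 'Np' file1. If one file is shorter than the other, cmp reports an EOF message on stderr after listing the differences in the common part.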

methyl · February 19, 2010, 7:40am

If the files are normally the same I'd run the checksum program "cksum" on each file first and compare the results. This is the quickest way to prove whether two files are identical.
If they are not identical then actually run a command to compare them.
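
Something along these lines, as a minimal sketch. The function name compare_if_needed is just an illustration, and diff is used for the second step only as an example; any comparison command could go there. Keep in mind that cksum is a 32-bit CRC, so matching checksums are strong evidence that the files are the same rather than a strict proof.

```shell
#!/bin/sh
# Sketch only: checksum both files first, run the full comparison only on a mismatch.
# compare_if_needed is a hypothetical helper name.
compare_if_needed() {
    f1=$1; f2=$2; out=$3
    # Reading from stdin makes cksum print only "<checksum> <byte count>",
    # so the two results can be compared as plain strings.
    sum1=$(cksum < "$f1")
    sum2=$(cksum < "$f2")
    if [ "$sum1" = "$sum2" ]; then
        echo "checksums match"
        return 0
    fi
    # Checksums differ: capture the actual differences.
    diff "$f1" "$f2" > "$out"
    echo "differences written to $out"
    return 1
}

# Example: compare_if_needed file1 file2 differences.txt
```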

Whether you can use standard Unix commands to compare files larger than 2 GB depends on your operating system and version.

We seem to be assuming that these are standard Unix text files. Are they text files? If not, what software was used to create the files?