I have 2 large files (.dat), around 70 GB, with 12 columns, and the data is not sorted in either file. I need your input on the best optimized method/command to compare them and redirect the non-matching lines to a third file (diff.dat).
What constitutes "non-matching" lines?
Should the entire line match, or only some key fields in file1 and file2?
You have to be clearer with your requirement statements.
Also, please use code tags when posting code/data samples.
Sounds about right.
Just remember - whatever you do, comparing 60G files will be slow...
Test this on smaller chunks first to see whether you're getting the desired results.
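I'd be tempted to look at comm -3 ${file1} ${file2}. This suppresses the lines common to ${file1} and ${file2}, leaving only the lines unique to each file. Note that comm expects both inputs to be sorted in the same collation order. GNU comm has a --nocheck-order option, but it only turns off the order check; it does not make the comparison correct on unsorted data, so sort both files first.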
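A rough sketch of the sort-then-comm approach, assuming whole-line matching (if only key fields matter, this will not be enough). The function name unmatched_lines and the temporary file names are made up for illustration; comm needs sorted input, so both files are sorted into a temporary directory first.

```shell
# Hypothetical helper: write the lines that appear in only one of two
# unsorted files to the output file given as the third argument.
unmatched_lines() {
    f1=$1 f2=$2 out=$3
    # Temporary directory for the sorted copies (created under $TMPDIR, or /tmp).
    tmpdir=$(mktemp -d) || return 1
    # Use the same locale for sort and comm so they agree on ordering;
    # the C locale compares bytes and is usually the fastest.
    LC_ALL=C sort -o "$tmpdir/a" "$f1" &&
    LC_ALL=C sort -o "$tmpdir/b" "$f2" &&
    # -3 drops lines common to both files; lines unique to the
    # second file are prefixed with a TAB in the output.
    LC_ALL=C comm -3 "$tmpdir/a" "$tmpdir/b" > "$out"
    status=$?
    rm -rf "$tmpdir"
    return $status
}
```

Usage would be something like: unmatched_lines file1.dat file2.dat diff.dat. With files of this size the sorted copies and sort's own scratch files need a lot of disk, so point TMPDIR at a filesystem with enough free space before running it, and try it on small samples first.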