Thanks for your time on this, it's much appreciated.
1) Do both files have exactly the same number of records and are you just looking for records which have changed? Does the order of the output into file3 matter?
File1 has 1803077 records
file2 has 1795370 records
2) If there can be more or fewer records in file2 than file1, does the order of the output into file3 matter?
I would prefer the 1st row in file3 to come from file1, the 2nd row from file2, and so on.
Are you also interested in records which exist in file1 but do not exist in file2?
Yes, and vice versa. It would be good if we could copy those records to different files, say recordsonlyonfile1.txt and recordsonlyonfile2.txt.
3) What percentage of differences do you expect? (This is really a performance question because some approaches would involve multiple lookups).
There are huge changes in the file; it could be over 50%.
4) If this proves too difficult for shell programming, do you have a mainstream database engine?
I have an Informix database, but I am not sure it would help me, as there is no unique key in the records.
---------- Post updated at 15:05 ---------- Previous update was at 14:20 ----------
One shell approach if the order of the output does not matter.
Tried with two files of approx 5 million records and 500 MB each. It took about 5 minutes to run, and the output shows only the records in file2 that have no exact match in file1. Actual performance will depend on how fast your computer is and how much memory you can give to sort.
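#!/bin/ksh
# Sort both files so that comm can compare them line by line.
sort file1 > sortfile1
sort file2 > sortfile2
# -13 hides lines only in file1 and lines common to both, leaving lines only in file2.
comm -13 sortfile1 sortfile2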
When sorting large files, be sure to set $TMPDIR to a directory with free space of at least twice the size of the file being sorted.
[/quote]
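Building on the sort/comm approach quoted above, here is a minimal sketch of how the two "only on one side" files could be produced. The function name split_unique and the .sorted suffix are made up for illustration; the output names are the ones suggested earlier in the thread. It assumes records are compared as whole lines, and it forces LC_ALL=C so that sort and comm agree on the ordering.

```shell
#!/bin/sh
# Sketch only: split_unique is a hypothetical helper, not a standard command.
# It writes lines found only in the first file to recordsonlyonfile1.txt
# and lines found only in the second file to recordsonlyonfile2.txt.
split_unique() {
    f1=$1
    f2=$2
    # sort uses $TMPDIR for its temporary files; fall back to /tmp if unset.
    TMPDIR=${TMPDIR:-/tmp}
    export TMPDIR
    # Byte-order collation keeps sort and comm consistent with each other.
    LC_ALL=C sort "$f1" > "$f1.sorted"
    LC_ALL=C sort "$f2" > "$f2.sorted"
    # -23 suppresses columns 2 and 3: only lines unique to the first file remain.
    LC_ALL=C comm -23 "$f1.sorted" "$f2.sorted" > recordsonlyonfile1.txt
    # -13 suppresses columns 1 and 3: only lines unique to the second file remain.
    LC_ALL=C comm -13 "$f1.sorted" "$f2.sorted" > recordsonlyonfile2.txt
}
```

It would be called as `split_unique file1 file2`. Note that the output comes out in sorted order, not in the original file order, so this does not by itself give the alternating file1/file2 layout asked for in file3.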