The code below works with only a few records, but when I run it against a large file of about 2 million records, the result is the same as file1.
Question:
Why does it work with a few records but not with more than 2 million records?
Is the command timing out?
What could be the solution to this?
comm and sort stay stable on large data, whereas grep gets slower the more pattern lines it has to hold and may fail outright if loading file2 pushes it past a 4 GB address-space limit. grep also has to run its regex machinery at some stage, which is wasted work on plain data, if not an outright correctness risk (metacharacters in the data could match unintended lines); fgrep / grep -F is faster and more stable on large data.
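A minimal sketch of the sort + comm approach, and the fixed-string grep variant. The file names and the tiny sample data are illustrative; the assumed task is finding lines present in file2 but missing from file1:

```shell
# Illustrative sample data; in practice file1/file2 are the real inputs.
printf 'a\nb\nc\n' > file1
printf 'b\nc\nd\ne\n' > file2

# sort spills to temporary files on disk, so it scales to millions
# of lines without exhausting memory.
sort file1 -o file1.sorted
sort file2 -o file2.sorted

# comm streams through both sorted files in a single pass.
# -13 suppresses lines unique to file1 and lines common to both,
# leaving only lines unique to file2.
comm -13 file1.sorted file2.sorted > only_in_file2.txt

# Fixed-string alternative: -F skips the regex engine, -x matches
# whole lines only, -v inverts the match, -f file1 reads the
# patterns from file1. No sorting needed, but file1 is held in memory.
grep -Fxv -f file1 file2 > only_in_file2.grep.txt
```

Both outputs should agree here; the comm version is the one that keeps memory flat on very large inputs.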
awk and bash can do hash lookups, which have no speed problem with large files and can skip the sorting step, but they still have to hold one file in memory.
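A sketch of the awk hash-lookup idiom, again with illustrative file names and sample data; the assumed task is the same set difference as above:

```shell
# Illustrative sample data; in practice file1/file2 are the real inputs.
printf 'a\nb\nc\n' > file1
printf 'b\nc\nd\ne\n' > file2

# NR==FNR is true only while reading the first file: load its lines
# into the hash seen[] and skip to the next record. For the second
# file, print any line not found in the hash. No sorting required;
# memory is proportional to file1 only.
awk 'NR==FNR { seen[$0]=1; next } !($0 in seen)' file1 file2 > only_in_file2.txt
```

This is typically the fastest option when file1 fits comfortably in memory, since each lookup is O(1) and both files are read exactly once.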
Since diff does not assume the inputs are ordered, it will search around for missing lines, even if half-heartedly, which may not scale well performance-wise. It should hold up on large files, though.
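For completeness, GNU diff can also be coaxed into emitting just the lines added in the second file via its line-format options; this is a sketch assuming GNU diff and pre-sorted inputs (sample data is illustrative):

```shell
# Illustrative sample data, pre-sorted so diff's alignment is exact.
printf 'a\nb\nc\n' > file1
printf 'b\nc\nd\ne\n' > file2

# Suppress deleted and unchanged lines; print added lines verbatim
# (%L is the line's text). GNU diff only.
diff --old-line-format='' --unchanged-line-format='' \
     --new-line-format='%L' file1 file2 > only_in_file2.txt
```

On unsorted inputs diff's alignment heuristics can mispair lines, so sort first if an exact set difference is needed.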
>> because, as I said, it's 2 million records.
2 million records are going to take time, depending on the horsepower of the system, unless you come up with better code that speeds it up.