Finding first difference between two files

Mojing · February 9, 2013, 3:24am

Hi!
I'd like to know whether there is a command that finds the first difference between two large files, prints that line from both files, and then stops, so that no time is wasted reading the rest.

I don't think looping through the files line by line will be efficient enough, but that's the only thing I can think of...

Even better: besides the differing line, it would be nice if it could print the preceding line too.

Thanks!!!!!

Jotne · February 9, 2013, 4:09am

Did you try diff?

diff file1 file2 | head -n 2

This shows where the first difference is and prints it.
diff has lots of options to experiment with.

Mojing · February 9, 2013, 4:15am

Hmm, I see, but it will still go through the whole file, which is about a 1GB text file...
Thanks, though.

---------- Post updated at 04:15 AM ---------- Previous update was at 04:14 AM ----------

I think cmp will do. I'll just take the line number it reports and pull out the text around it.
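
The cmp idea could be sketched roughly as follows. This is only an illustration, not a tested solution for 1GB inputs: the function name first_diff_cmp is made up here, and it assumes cmp reports a difference on stdout as "FILE1 FILE2 differ: byte N, line M", so that the last field is the line number. If one file is merely a prefix of the other, cmp reports EOF on stderr instead and this sketch prints nothing.

```shell
# first_diff_cmp FILE1 FILE2   (hypothetical helper name)
# Print the first differing line, plus the line before it, from both files.
first_diff_cmp() {
    # cmp stops at the first differing byte; the last field of its
    # message is the number of the line containing that byte.
    line=$(cmp "$1" "$2" 2>/dev/null | awk '{print $NF}')
    # Nothing on stdout: files are identical, or one is a prefix of the other.
    [ -n "$line" ] || return 0
    # Start one line earlier, unless the difference is on line 1.
    start=$((line > 1 ? line - 1 : 1))
    for f in "$1" "$2"; do
        echo "== $f, lines $start-$line"
        # "q" makes sed stop reading once the target line has been printed.
        sed -n "${start},${line}p;${line}q" "$f"
    done
}
```

Called as first_diff_cmp file1 file2, neither cmp nor sed reads past the first differing line, although sed does re-read each file from the start up to that line.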

Jotne · February 9, 2013, 4:26am

I am 100% sure you can do this easily with awk, but I have not worked much with arrays.
There you can add an exit as soon as a difference is found.

Scrutinizer · February 9, 2013, 4:52am

A quick start would be:

awk '(getline p < f) > 0 && p != $0 {print "Line " NR ":" RS $0 RS p; exit}' f=file2 file1

But you would need to add provisions for the case where file1 has more lines than file2, or vice versa.
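
One possible way to add those provisions is sketched below. It is an assumption-laden illustration rather than a finished tool: the wrapper name first_diff and the "only in" wording are invented here. The main rule reports when the second file runs out first, and the END block checks whether the second file still has lines left after the first one ends.

```shell
# first_diff FILE1 FILE2   (hypothetical helper name)
# Report the first differing line, including the case where one file is longer.
first_diff() {
    awk -v f="$2" '
        {
            # Read the matching line of the second file; r <= 0 means
            # it has ended (0) or could not be read (-1).
            r = (getline p < f)
            if (r <= 0) {
                print "Line " NR ": only in " FILENAME
                print $0
                found = 1
                exit
            }
            if (p != $0) {
                print "Line " NR ":"
                print $0
                print p
                found = 1
                exit
            }
        }
        END {
            # First file ended without a difference: any line still
            # readable from the second file is extra.
            if (!found && (getline p < f) > 0) {
                print "Line " (NR + 1) ": only in " f
                print p
            }
        }
    ' "$1"
}
```

Note that awk interprets backslash escapes in a value passed with -v, so a file name containing backslashes would need different handling.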

Mojing · February 9, 2013, 4:58am

I have a new issue actually,
So what I want to do is compare two files such as:

file1.txt
A B C D E F G
X 34234 324234
A B C D E F Z
A B C D E F Z
X 34234 0
...

file2.txt
A B C D E F Z
X 34234 324234
A B C D E F Z
A B C D E F E
X 34234 1
...

I want it to ignore differences in lines starting with A and only compare lines starting with X, for example.
I know that I could filter out all the A lines, but I need to keep them in the files because I have to look back at the A line preceding the X line that differs.
So the output should say that the two files differ at line 5, not at line 1.

I was thinking of something like

cmp file1 file2, ignoring lines that start with a given pattern

Thanks!!

Scrutinizer · February 9, 2013, 5:46am

With a quick minor adaptation:

awk '(getline p < f) > 0 && /^X/ && p != $0 {print "Line " NR ":" RS $0 RS p; exit}' f=file2 file1

But now there are more exceptions to consider. For example, is the number of A lines allowed to differ between the files, and can there be more X lines in file1 than in file2, or vice versa? So the script would need to be improved.
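
For the simplest case, where both files have the same layout so that line N of file1 always corresponds to line N of file2, a sketch that also prints the preceding line from each file might look like this. The function name first_diff_prefix, its third argument, and the "<" / ">" markers are assumptions made for illustration; differing numbers of A or X lines are not handled.

```shell
# first_diff_prefix FILE1 FILE2 PREFIX   (hypothetical helper name)
# Compare only lines of FILE1 that start with PREFIX against the same
# line number in FILE2; print the first mismatch with its preceding line.
first_diff_prefix() {
    awk -v f="$2" -v pre="$3" '
        # Read the matching line of FILE2 for every record, so the two
        # files stay in step; stop if FILE2 has ended or is unreadable.
        (getline p < f) <= 0 { exit }

        # index() == 1 means the line starts with the literal prefix.
        index($0, pre) == 1 && p != $0 {
            print "Line " NR ":"
            if (NR > 1) print "< " prev1   # preceding line in FILE1
            print "< " $0
            if (NR > 1) print "> " prev2   # preceding line in FILE2
            print "> " p
            exit
        }

        # Remember this pair of lines for the next record.
        { prev1 = $0; prev2 = p }
    ' "$1"
}
```

With the two sample files from the post above (without the "..." lines), first_diff_prefix file1.txt file2.txt X reports line 5 and shows line 4 of each file as context.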