Edited: compare two files and print mismatch

kingpeejay · June 21, 2009, 4:04pm

Using a Unix shell script, how can I compare two files and print the lines that do not match? Below are the requirements:

The two files do not have the same number of lines.
The difference/mismatch can be in the second or third column.
The comparison is not between line 1 of file 1 and line 1 of file 2. Rather, each line of file 1 is compared with the line of file 2 that starts with the same first word.

To demonstrate:
FILE 1:
abc 123 678
def 456 901
ghi 789 234
jkl 012 567
mno 345 890
FILE 2:
def 456 901
abc 124 678
mno 345 890
ghi 789 244
OUTPUT FILE:
"from file 1"
abc 123 678
ghi 789 234
"from file 2"
abc 124 678
ghi 789 244

I hope someone can help me with this. Thanks!

King_Kalyan · June 21, 2009, 6:40pm

Try this..

awk '{ if (FNR==NR) {arr[$1]=$0;next}
if (($1 in arr) && ($0!=arr[$1])) { f1[$1]=arr[$1]; f2[$1]=$0; next} }
END { print "from file 1";  for (i in f1) {print f1[i]}; print "from file 2"; for (i in f2) {print f2[i]}  } ' file1 file2 > file3

Assumption: the first word is unique within a given file. Please let me know if you also need to handle duplicates, so that I can try that.
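
A variant of the same idea, as a rough sketch only: awk does not guarantee the order of "for (i in array)", so this version remembers the order of the keys in file1 and prints the mismatches in that order. The helper name compare_by_key and the scratch directory are made up for illustration, and the first word is still assumed to be unique within each file.

```shell
#!/bin/sh
# Sample data from the first post, written to a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'abc 123 678' 'def 456 901' 'ghi 789 234' \
              'jkl 012 567' 'mno 345 890' > "$tmp/file1"
printf '%s\n' 'def 456 901' 'abc 124 678' 'mno 345 890' \
              'ghi 789 244' > "$tmp/file2"

# compare_by_key is a hypothetical helper name, not a standard tool.
# Assumes the first word is unique within each file.
compare_by_key() {
    awk '
        # First file: store each line under its first word, remember key order.
        FNR == NR { line1[$1] = $0; order[++n] = $1; next }
        # Second file: store each line under its first word.
        { line2[$1] = $0 }
        END {
            print "from file 1"
            for (i = 1; i <= n; i++) {
                k = order[i]
                if ((k in line2) && line1[k] != line2[k]) print line1[k]
            }
            print "from file 2"
            for (i = 1; i <= n; i++) {
                k = order[i]
                if ((k in line2) && line1[k] != line2[k]) print line2[k]
            }
        }
    ' "$1" "$2"
}

compare_by_key "$tmp/file1" "$tmp/file2"
```

Keys that exist in only one file (jkl here) are skipped, the same as in the one-liner above.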

kingpeejay · June 22, 2009, 11:56pm

Hi King! The code you gave me doesn't work.
It doesn't print the differing lines, though I am sure that there are mismatched lines in the two files.

Thanks!

rakeshawasthi · June 23, 2009, 12:08am

What have you tried?

kingpeejay · June 23, 2009, 12:21am

I tried the diff command, but I learned that diff compares line by line.
Meanwhile, cat file1 file2 | sort | uniq -u > file3 yields:

ABC 123
ABC 321
DEF 412
DEF 124

And when I used it in my script, it yielded an odd number of lines.

rakeshawasthi · June 23, 2009, 12:51am

Is this what you want...

sort file1 > file1_tmp
sort file2 > file2_tmp
sdiff file1_tmp file2_tmp | grep '|'

output:

abc 123 678                                                     |  abc 124 678
ghi 789 234                                                     |  ghi 789 244

kingpeejay · June 23, 2009, 1:03am

output should be:

"FROM FILE 1"
ABC 123
ABC 321

"FROM FILE2"
DEF 412
DEF 124

rakeshawasthi · June 23, 2009, 1:04am

Why don't you give it a try... I have given you the output, you just have to format it.
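
If it helps, the formatting step could look roughly like this. It is only a sketch: it assumes GNU sdiff is installed, that no data line contains a '|' character, and that the lines are short enough not to be truncated by sdiff's default width. The function name format_sdiff is made up for this example.

```shell
#!/bin/sh
# Sample data from the first post, sorted into temporary files.
tmp=$(mktemp -d)
printf '%s\n' 'abc 123 678' 'def 456 901' 'ghi 789 234' \
              'jkl 012 567' 'mno 345 890' | sort > "$tmp/file1_tmp"
printf '%s\n' 'def 456 901' 'abc 124 678' 'mno 345 890' \
              'ghi 789 244' | sort > "$tmp/file2_tmp"

# format_sdiff is a hypothetical helper name.
# It splits each "left | right" line and prints the two sides in groups.
format_sdiff() {
    sdiff "$1" "$2" | grep '|' | awk -F'|' '
        # Strip the padding (spaces and tabs) that sdiff puts around each side.
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        { left[NR] = trim($1); right[NR] = trim($2) }
        END {
            print "from file 1"
            for (i = 1; i <= NR; i++) print left[i]
            print "from file 2"
            for (i = 1; i <= NR; i++) print right[i]
        }'
}

format_sdiff "$tmp/file1_tmp" "$tmp/file2_tmp"
```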

kingpeejay · June 23, 2009, 2:43am

I got it! I used the 'sort'. Thanks thanks!

rakeshawasthi · June 23, 2009, 4:23am

Great...

King_Kalyan · June 23, 2009, 4:35pm

Did you check the content of file3? I redirected the output to file3.
I tested the code with the inputs you have given and I could match your output also..

If you want the result to be printed on the screen then remove the "> file3" part from the code and run it..

kingpeejay · June 24, 2009, 6:22am

@King_Kalyan: It worked! But I had to sort first. Thanks for the help.

shaliniyadav · July 1, 2009, 7:55am

rakeshawasthi:

Is this what you want...

sort file1 > file1_tmp
sort file2 > file2_tmp
sdiff file1_tmp file2_tmp | grep '|'

output:

abc 123 678                                                     |  abc 124 678
ghi 789 234                                                     |  ghi 789 244

This code works when a difference is found, but what about records that are missing from one file? Those are also differences, right?
For example:

File1

abc 123 678
abc 112 111
xyz 100 000

File2

abc 123 678
abc 112 112

Output of above code will be:

abc 112 111 | abc 112 112

But the output should be:

abc 112 111 | abc 112 112
xyz 100 000 |

Please help with that code. xyz 100 000 is also a difference, as it is missing from the 2nd file.

rakeshawasthi · July 1, 2009, 8:54am

hmm...
that was limited to my understanding of king's problem.
You could also try this: no sorting, no formatting...

$ grep -v -f file1 file2
abc 112 112
$ grep -v -f file2 file1
abc 112 111
xyz 100 000
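
One caveat, with a hedged sketch: without extra options, grep treats every line of the pattern file as a regular expression and matches it anywhere in a line, so a short line in one file can hide a longer line in the other. Adding -F (fixed strings) and -x (whole-line match) should avoid that. The helper name only_in is made up here.

```shell
#!/bin/sh
# Sample data from the post above, written to a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'abc 123 678' 'abc 112 111' 'xyz 100 000' > "$tmp/file1"
printf '%s\n' 'abc 123 678' 'abc 112 112' > "$tmp/file2"

# only_in is a hypothetical helper: print lines of $1 that do not occur in $2.
# -F: patterns are fixed strings, -x: match whole lines only,
# -v: invert the match, -f: read the patterns from a file.
only_in() {
    grep -F -x -v -f "$2" "$1"
}

only_in "$tmp/file2" "$tmp/file1"   # lines found only in file2
only_in "$tmp/file1" "$tmp/file2"   # lines found only in file1
```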

shaliniyadav · July 2, 2009, 8:00am

rakeshawasthi:

hmm...
that was limited to my understanding of king's problem.
You could also try this: no sorting, no formatting...
$ grep -v -f file1 file2
abc 112 112
$ grep -v -f file2 file1
abc 112 111
xyz 100 000

@rakesh,

Nope, I was looking for the same format as before. The sdiff code worked fine except for records that are missing.
And it's not file1 against file2 or vice versa; both files have to be compared. See the example below:
File1

abc 1 1 1
abc 2 2 2
abc 3 3 3
abc 5 5 5

File2

abc 1 1 1
abc 2 1 2
abc 3 3 3
abc 4 4 4

Output:

abc 2 2 2 | abc 2 1 2
| abc 4 4 4
abc 5 5 5 |

rakeshawasthi · July 2, 2009, 9:24am

Not sure...
Maybe compareIt can do it; I have never worked with it.
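
A plain awk sketch might also cover this case. It rests on an assumption that is not stated in the question: that the first two fields together identify a record (that is how "abc 2 2 2" pairs with "abc 2 1 2" in the example). The helper name full_compare is made up.

```shell
#!/bin/sh
# Sample data from the example above, written to a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'abc 1 1 1' 'abc 2 2 2' 'abc 3 3 3' 'abc 5 5 5' > "$tmp/file1"
printf '%s\n' 'abc 1 1 1' 'abc 2 1 2' 'abc 3 3 3' 'abc 4 4 4' > "$tmp/file2"

# full_compare is a hypothetical helper name.
# Assumption: fields 1 and 2 together form a unique key in each file.
full_compare() {
    awk '
        { key = $1 FS $2 }
        FNR == NR { left[key] = $0; next }   # first file
        { right[key] = $0 }                  # second file
        END {
            # Each output row is prefixed with its key and a tab for sorting.
            for (k in left) {
                if (!(k in right))            print k "\t" left[k] " |"
                else if (left[k] != right[k]) print k "\t" left[k] " | " right[k]
            }
            for (k in right)
                if (!(k in left))             print k "\t" "| " right[k]
        }
    ' "$1" "$2" | LC_ALL=C sort | cut -f2-   # sort by key, then drop the key
}

full_compare "$tmp/file1" "$tmp/file2"
```

Records present in both files with identical content are not printed; a record missing on one side gets an empty column on that side.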

kingpeejay · July 19, 2009, 8:52am

hi all!

I just want to ask for help regarding this. The requirements are the same as in the original problem; the only difference is that the comparison is only on the 2nd and 3rd columns.

thanks!
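
For what it's worth, a possible sketch of that variant: lines are still matched on the first word, but only the 2nd and 3rd columns decide whether they count as different. The sample data below (with an extra 4th column) and the helper name compare_cols23 are invented for illustration, and the first word is assumed to be unique within each file.

```shell
#!/bin/sh
# Made-up sample data: the 4th column differs on some lines but is ignored.
tmp=$(mktemp -d)
printf '%s\n' 'abc 123 678 x' 'def 456 901 y' 'ghi 789 234 z' > "$tmp/file1"
printf '%s\n' 'def 456 901 changed' 'abc 124 678 x' 'ghi 789 234 other' > "$tmp/file2"

# compare_cols23 is a hypothetical helper name.
# Assumes the first word is unique within each file.
compare_cols23() {
    awk '
        # First file: keep columns 2-3 and the full line, keyed by first word.
        FNR == NR { cols1[$1] = $2 FS $3; line1[$1] = $0; order[++n] = $1; next }
        # Second file: record the line only if columns 2-3 differ.
        ($1 in cols1) && cols1[$1] != ($2 FS $3) { line2[$1] = $0 }
        END {
            print "from file 1"
            for (i = 1; i <= n; i++) if (order[i] in line2) print line1[order[i]]
            print "from file 2"
            for (i = 1; i <= n; i++) if (order[i] in line2) print line2[order[i]]
        }
    ' "$1" "$2"
}

compare_cols23 "$tmp/file1" "$tmp/file2"
```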