gawk HELP

I have to compare records in two files. I believe it can be done with gawk/awk, but I have not been able to work out how. Please help.

File 1

ABAAAAAB BC asa sa
ABAAABAA BC bsa sm
ABBBBAAA BC bxz sa
ABAAABAB BC csa sa
ABAAAAAA BC dsa sm
ABBBBAAB BC dxz sa

File 2

ABAAAAAB BC aas ba
ABAAAAAB BC asa sa
ABAAABAA BC ban sm
ABBBBAAA BC bxz sa
ABAAABAB BC csa sa
ABAAAAAA BC dsa sm
ABBBBAAB BC dxz sa
ABBBBAAB BC fxz sa

How the files should be compared:

  1. Take the 3rd and 4th fields of a record in the first file (e.g. asa sa in file 1, line 1) and look for them in the 3rd and 4th fields of the second file.
  2. If the 3rd and 4th fields of the first file's record match the 3rd and 4th fields of some record in the second file, compare the rest of the line, e.g. compare ABAAAAAB BC in file 1 with ABAAAAAB BC in file 2 for asa sa.
  3. If there is a mismatch, the output should report that a mismatch has occurred; otherwise it should print No Error.

Please help in this regard

-Sandeep

nawk -f sandeep.awk file2 file1

sandeep.awk:

FNR==NR {arr[$3 FS $4] = $1 FS $2; next}
{
   if ( !($3 FS $4) in arr )
      print "MEGAmismatch"
   else if ( ($1 FS $2) != (arr[$3 FS $4]) )
      print "mismatch"
}

The rest is left up to the OP to figure out - not tested.

It is not printing MEGAmismatch at all; there seems to be some problem.

Sorry - as I said, 'not tested'. Here is a corrected version:

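# First file on the command line (file2): remember fields 1-2, keyed by fields 3-4.
FNR==NR {arr[$3 FS $4] = $1 FS $2; next}
# Second file (file1): look up each record's fields 3-4 in that table.
{
   # Key never seen in the first file: report it. Otherwise compare fields 1-2.
   if ( !( ($3 FS $4) in arr) )
      printf("[%s]: MEGAmismatch of [%s]\n", FNR, $3 OFS $4)
   else if ( ($1 FS $2) != (arr[$3 FS $4]) )
      printf("[%s]: mismatch of [%s] on MEGAmatched [%s]\n", FNR, $1 OFS $2, $3 OFS $4)
}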
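
For anyone comparing the two versions: the only change to the test itself is the extra pair of parentheses. As far as I can tell, '!' binds tighter than 'in', so '!($3 FS $4) in arr' is read as '(!($3 FS $4)) in arr'. A non-empty string negates to 0, and the index 0 is never in the array, so the first branch could not fire. A minimal sketch to see this, using plain awk in place of nawk and a made-up key that is not in the array:

```shell
out=$(awk 'BEGIN {
    arr["asa sa"] = "ABAAAAAB BC"
    key = "zzz zz"                      # hypothetical key, NOT present in arr
    # Read as (!(key)) in arr, i.e. "0 in arr": false, so nothing is printed.
    if ( !(key) in arr )   print "unparenthesized: missing"
    # Negates the membership test itself, which is what was intended.
    if ( !((key) in arr) ) print "parenthesized: missing"
}')
echo "$out"
```

Only the parenthesized test should report the key as missing.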

Thanks a lot!!!

Could you please explain what exactly this line of code is for and how it works? I could not understand it:

FNR==NR {arr[$3 FS $4] = $1 FS $2; next}

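'FNR==NR' is true only while the FIRST file on the command line is being processed: NR counts records across all input files, while FNR restarts at 1 for each file.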
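
A quick way to watch the two counters, as a sketch with two throwaway files (the names first and second and their contents are made up for the illustration, and plain awk stands in for nawk):

```shell
dir=$(mktemp -d)
printf 'a\nb\n'    > "$dir/first"     # 2 records
printf 'c\nd\ne\n' > "$dir/second"    # 3 records

# Print the file name, the running record count (NR) and the per-file count (FNR).
out=$(cd "$dir" && awk '{ print FILENAME, NR, FNR, (FNR == NR ? "first file" : "later file") }' first second)
echo "$out"
rm -rf "$dir"
```

The two counters agree only on the records of the first file. One caveat: if the first file is empty, FNR==NR is also true for the second file, so the idiom assumes a non-empty first file.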

'arr' is an associative array indexed by the values of fields '3' and '4' (joined with FS), with the concatenated value of fields '1' and '2' as its content.

When processing the FIRST file specified on the command line we're building the hash/associative array used later on in the script for doing the 'lookups'.
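
To see the build-then-lookup flow end to end, here is a small sketch that runs the corrected script inline with plain awk. The first two records of each file are copied from the question; the third record of file1 (XXXXXXXX) is made up so that both branches fire, and the temporary directory is only there to keep the example self-contained:

```shell
dir=$(mktemp -d)
# Lookup side: three records copied from File 2 in the question.
printf '%s\n' 'ABAAAAAB BC asa sa' 'ABAAABAA BC ban sm' 'ABBBBAAA BC bxz sa' > "$dir/file2"
# Checked side: two records from File 1 plus one hypothetical record.
printf '%s\n' 'ABAAAAAB BC asa sa' 'ABAAABAA BC bsa sm' 'XXXXXXXX BC bxz sa' > "$dir/file1"

out=$(awk '
FNR==NR { arr[$3 FS $4] = $1 FS $2; next }   # file2: build the lookup table
{
   if ( !( ($3 FS $4) in arr) )
      printf("[%s]: MEGAmismatch of [%s]\n", FNR, $3 OFS $4)
   else if ( ($1 FS $2) != (arr[$3 FS $4]) )
      printf("[%s]: mismatch of [%s] on MEGAmatched [%s]\n", FNR, $1 OFS $2, $3 OFS $4)
}' "$dir/file2" "$dir/file1")
echo "$out"
rm -rf "$dir"
```

Record 1 of file1 matches on both parts and prints nothing, record 2 has a key that file2 never defined, and record 3 has a known key but different leading fields.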

See 'man nawk' for the details on associative arrays.

Thank you so much.

Regards
sandeep