Find the differences between 2 flat files and process them

Hi
Hope you are having a great weekend! I have a question and need your expertise:

I have 2 files, File1 and File2 (of the same structure), which I need to compare on some columns. I need to find the rows that are in File2 but not in File1 and put the differences in another file, which I then need to process.

File 1

Field1 Field2 Field3 Field4 Field5 Field6
1      2      3      4      5      6
1      23     3      43     12     15

File 2

Field1 Field2 Field3 Field4  Field5 Field6
1      2      3      4       5      6
1      23     3      43      5      6 (not in diff: only fields 5 & 6 have changed, and they are not part of the compare condition)
1      23     3      12      5      6
31     43     54     5       7      8

Diff File

Field1 Field2 Field3  Field4  Field5 Field6
31     43     54      5       7      8 (in diff as it's in File2 and not in File1)
1      23     3       12      5      6 (in diff as it's in File2 (field4 has changed) and not in File1)

So I need to compare only the first 4 fields (i.e. if any of those 4 fields differ, the row needs to go into the diff file to be processed).

Any help will be greatly appreciated!

Thanks
J

Hi

awk 'NR==FNR{a[i++]=$1" "$2" "$3" "$4;next;}{x=$1" "$2" "$3" "$4; for (j in a){if (a[j] == x)next;}}1' i=1 j=1 file1 file2 > diff_file

Guru.

Hi Guru
Thanks for the quick answer. I'm not well versed in awk, so I wanted to confirm my understanding.

basically when you say:

'NR==FNR{a[i++]=$1" "$2" "$3" "$4;next;}

you are taking the first file and putting all its data into array a[i], so a[1] will have the first row of the first file; that way you insert the whole data of the first file into a[i].

Now when you say

{x=$1" "$2" "$3" "$4; for (j in a){if (a[j] == x)next;}}1'

this is the part where NR is not equal to FNR, i.e. you have started reading the second file. You assign x the first row of the second file, then use the for loop to compare each row of the first file with that row. If they are equal you continue to the next row; otherwise you put the entry into the diff file. You do this for each row in the second file.

i=1 j=1 file1 file2 > diff_file

in this part you have initialized i and j

It would be a great help if you could confirm.

Thanks

Hi
You got it right except for the first part. In the array we are not storing the entire row, only the first 4 fields of every row, since you are interested in comparing the first 4. The same holds for x as well.
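
For reference, here is a spelled-out sketch of the same one-liner with comments, run against the sample rows from the question (the header lines are left out, which is an assumption made to keep the example short). The `i=1 j=1` operands are awk command-line assignments that take effect before file1 is read; `j` is overwritten by the `for (j in a)` loop anyway, so `j=1` is not strictly needed.

```shell
# Sample rows from the question, without the header lines.
printf '%s\n' '1 2 3 4 5 6' '1 23 3 43 12 15' > file1
printf '%s\n' '1 2 3 4 5 6' '1 23 3 43 5 6' '1 23 3 12 5 6' '31 43 54 5 7 8' > file2

awk '
NR == FNR {                            # true only while reading the first file
    a[i++] = $1 " " $2 " " $3 " " $4   # remember the 4-field key of each row
    next                               # skip the rest of the program for file1
}
{
    x = $1 " " $2 " " $3 " " $4        # key of the current file2 row
    for (j in a)                       # search the file1 keys for the same key
        if (a[j] == x) next            # found: drop this file2 row
}
1                                      # not found: print the whole line ($0)
' i=1 j=1 file1 file2 > diff_file

cat diff_file
```

With this input, diff_file should contain the two file2 rows whose first four fields do not occur in file1, each with all six columns.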

Guru.

Hey Guru

Sorry for the confusion, but what I meant to ask was this:

'NR==FNR{a[i++]=$1" "$2" "$3" "$4;next;}

with this you store the first 4 columns of every row of the first file

and then you move on to

{x=$1" "$2" "$3" "$4; for (j in a){if (a[j] == x)next;}}1'

where you store the first 4 columns of each row of the second file in x and compare them against all rows of the first file (first 4 columns only), one by one.

What does the trailing 1 mean? And after running this awk script, will the final file have only 4 columns or the whole structure of the file? (It should compare only the first 4 columns, but the final file should have all 6 columns.)

Thanks
J

Something like this?

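awk 'NR==FNR{a[$1,$2,$3,$4]=$0; next}
!(($1,$2,$3,$4) in a)' file1 file2 > diff_file

(The commas in the subscript keep the fields apart, so that rows starting "1 23" and "12 3" do not end up with the same key.)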
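
On the earlier question about the trailing `1`: in awk, a pattern with no action prints the current record, and `1` is a pattern that is always true, so every line that reaches it is printed in full. Only the comparison key is limited to four fields; the output keeps all six columns. A minimal sketch using the sample rows from the question (header lines omitted, and the file name copy_of_file2 and the array name seen are made up for the illustration):

```shell
# Sample rows from the question, without the header lines.
printf '%s\n' '1 2 3 4 5 6' '1 23 3 43 12 15' > file1
printf '%s\n' '1 2 3 4 5 6' '1 23 3 43 5 6' '1 23 3 12 5 6' '31 43 54 5 7 8' > file2

# A bare "1" is an always-true pattern; with no action, awk prints $0,
# so this simply copies the file line by line.
awk '1' file2 > copy_of_file2

# Array lookup on the first four fields. Comma-separated subscripts are
# joined with SUBSEP, so adjacent fields cannot run together in the key.
awk 'NR == FNR { seen[$1, $2, $3, $4]; next }
     !(($1, $2, $3, $4) in seen)' file1 file2 > diff_file

cat diff_file
```

The array lookup also avoids looping over every file1 row for each file2 row, which may matter if the files are large.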