awk comparison using multiple files

Hi,

I have 2 files. I need to use column of file1 and do a comparison on file2 column 1 and print the mismatch in file3 as mentioned below.

Kindly note that file1 has a unique key (column), whereas file2 has multiple duplicates (like 44). These duplicates should not appear in the output in file3 but should be routed to a new file, file4.

file1:

1,apple
2,mango
3,banana
44,orange

file2:

1,apple
22,
31,xyz
2,man
3,banana
44,oran
44,orange

The expected output in file3:

2,mango,man

and file4 should capture the duplicates:

44,oran
44,orange

On a different forum I got this command for generating file3:

awk 'BEGIN{FS=OFS=","}($1 in a) && a[$1]!=$2{print $1,a[$1],$2}{a[$1]=$2}' file1 file2 >> file3

but it does not work correctly with duplicates.

Welcome to the forum.

Please get into the habit of phrasing your question / request carefully. Some guessing is necessary to understand it:

  • "use column of file1" means column 1, doesn't it?
  • And, "print the mismatch" means mismatch between fields 2 in the files?
  • Duplicates should be printed regardless of matches ("orange") or mismatches ("oran")?

Anyhow, see if my assumptions are correct and try

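awk -F, -v OFS="," '
# First file on the command line (file2): collect all lines per key.
NR == FNR       {T[$1] = T[$1] ORS $0
                 next
                }

# Second file (file1): handle only keys that also occur in file2.
$1 in T         {sub ("^" ORS, "", T[$1])       # drop the leading separator
                 n = split (T[$1], X)           # 2 fields: key occurred once in file2
                 if (n == 2)    {if ($2 != X[2]) print $0, X[2] > "file3"
                                }
                   else          print T[$1]  > "file4"   # key was duplicated
                }
' file2 file1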
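
If counting fields after split feels too implicit, the same idea can be sketched with an explicit per-key counter. This is only an alternative under the same assumptions as above (key in column 1, value in column 2, duplicates go to file4 whether they match or not). The array names count, lines and val are made up for this sketch, and the first rule, which strips trailing blanks, is an extra assumption in case the real data carries them. The sample files are recreated from the post so the snippet can be run in a scratch directory.

```shell
# Recreate the sample data from the question (run this in a scratch directory).
cat > file1 <<'EOF'
1,apple
2,mango
3,banana
44,orange
EOF

cat > file2 <<'EOF'
1,apple
22,
31,xyz
2,man
3,banana
44,oran
44,orange
EOF

awk -F, -v OFS="," '
# Assumption: strip trailing blanks so "man  " compares like "man".
{ sub(/[ \t]+$/, "") }

# First file (file2): count each key, keep its lines and its last value.
NR == FNR {
    count[$1]++
    lines[$1] = (count[$1] > 1 ? lines[$1] ORS : "") $0
    val[$1] = $2
    next
}

# Second file (file1): only keys that also occur in file2 are of interest.
$1 in count {
    if (count[$1] > 1)
        print lines[$1] > "file4"       # duplicated key: all its file2 lines
    else if ($2 != val[$1])
        print $0, val[$1] > "file3"     # unique key with a differing value
}
' file2 file1
```

With the sample data, file3 should end up with the single line 2,mango,man and file4 with the two lines for key 44. Keys that occur only in file2 (22 and 31) are ignored, as in the solution above.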

Thanks, it's working as expected.