awk comparision between 2 files and substitution in third

Hi All,
I have two files in the following format.

File 1 :

1044|1|20121031|2910039.4|MR|201210|G1044|E
1082|2|20121031|1664662.84|MR|201210|G1082|E
1696|3|20121031|190801.5|MR|201210|G1696|E
1824|4|20121031|196350|MR|201210|G1824|E
1900|5|20121031|221447.8|MR|201210|G1900|E

File 2 :

S00025106|1044|0|9.6|0|0| |0|0|0|0|0| |0|163.5|0|0|
S00026461|1082|0|-6.1|0|0| |0|0|0|0|0| |0|204|0|0|
S00026462|1696|0|-6.1|0|0| |0|0|0|0|0| |0|102|0|0|
S00029844|1824|0|2.2|0|0| |0|0|0|0|0| |0|123.1|0|0|
S00030343|1900|0|2.2|0|0| |0|0|0|0|0| |0|156.1|0|0|

Output desired :
File 3 :

S00025106|1044|0|9.6|0|1| |0|0|0|0|0| |20121231|163.5|0|0|
S00026461|1082|0|-6.1|0|2| |0|0|0|0|0| |20121231|204|0|0|
S00026462|1696|0|-6.1|0|3| |0|0|0|0|0| |20121231|102|0|0|
S00029844|1824|0|2.2|0|4| |0|0|0|0|0| |20121231|123.1|0|0|
S00030343|1900|0|2.2|0|5| |0|0|0|0|0| |20121231|156.1|0|0|

Basically I want to compare column 1 from File 1 and column2 from File 2. If they match , I want to replace column 6 and column 14 of File 2 with Coulmn 2 and Column3 of File1 giving File3,

Any help is really appreciated.

1 Like

Hey,

Try sth like this,

awk -F'\|' 'FNR==NR{a[$1]= $2"^"$3;next;}a[$2]{split(a[$2],b,"^");$6=b[1];$14=b[2];}1' OFS="|" file1 file2

Cheers,
Ranga:)

1 Like

Small variation with two single arrays:

awk 'NR==FNR{A[$1]=$3; B[$1]=$2; next} $2 in A{$6=B[$2]; $14=A[$2]}1' FS=\| OFS=\| file1 file2
2 Likes

An approach using join

join -t"|" -1 2 -2 1 -o 1.1 1.2 1.3 1.4 1.5 2.2 1.7 1.8 1.9 1.10 1.11 1.12 1.13 2.3 1.15 1.16 1.17 1.18 file2 file1
1 Like

@Scrutinizer :Could you please explain command used using the AWK associate arrays here . I am new bie to awk arrays.Plz help

Sure:

awk '
  NR==FNR{                       # When the first file is being read (only then are FNR and NR equal) 
    A[$1]=$3                     # create an (associative) element in array A with the first field as the index and the 3rd field as value
    B[$1]=$2                     # create an (associative) element in array B with the first field as the index and the 2rd field as value
    next                         # start reading the next record (line)
  } 
  $2 in A{                       # while reading the second file, if field 2 is present in array A (we could also have chosen B)
    $6=B[$2]                     # then set field 6 to the element in array B with the second field as the index
    $14=A[$2]                    # and set field 14 to the element in array A with the second field as the index
  }
  1                              # print the record (line)
' FS=\| OFS=\| file1 file2       # Set the input and output field separator to the pipe symbol and read file 1 and then file 2

Hope this helps...

1 Like

@Scrutinizer : Thank you so much .. I got clear idea why we use NR=FNR. Thanks once again .

If they match , I want to replace column 6 and column 14 of 
File 2 with Coulmn 2 and Column3 of File1 giving File3

What do you want to do if they do not match? I assume print line from File 2 unchanged? But you know what they say about assumptions! :smiley: