Print matching fields (if they exist) from two text files

Hi everyone,
Given two files (test1 and test2) with the following contents:
test1:

80263760,I71
80267369,M44
80274628,L77
80276793,I32
80277390,K05
80277391,I06
80279206,I43
80279859,K37
80279866,K35
80279867,J16
80280346,I14

and test2:

80263760,PT18
80279867,PT01

I need to do some kind of join to obtain this result:

80263760,PT18
80267369,M44
80274628,L77
80276793,I32
80277390,K05
80277391,I06
80279206,I43
80279859,K37
80279866,K35
80279867,PT01
80280346,I14

In short:

  1. Join on 1st field of both files.
  2. If a value in field #1 exists in test2, replace field #2 in test1 with field #2 in test2.
  3. Else, leave field #2 in test1 intact.

I tried with the following join command:

join -t, -1 1 -2 1 -o 1.1 2.2 -a 1 -a 2 test1 test2

but it only complies with requirement 2 above and leaves non-matching fields empty:

80263760,PT18
80267369,
80274628,
80276793,
80277390,
80277391,
80279206,
80279859,
80279866,
80279867,PT01
80280346,

Any ideas will be more than welcome.
Thanks in advance.

1 Like

A standard way to solve this (you will find many examples on these forums) is this awk construct, so give that a try:

awk 'NR==FNR{A[$1]=$2; next} $1 in A{$2=A[$1]}1' FS=, OFS=, test2 test1
1 Like

Yes: works as designed. :wink:

The join -command follows your orders to include non-matched lines (the "-a") and therefore you see these lines in the output. Depending on what exactly you want leave out "-a 1" or "-a 2" or both. Add a "1.2" to your output rules as fallback.

I hope this helps.

bakunin

2 Likes

Thank you guys for your answers! I was able to solve my problem using your suggestions.