Matching information from different files

I have a dataset with coordinates of certain points with their names.

file1:

AAA 5 7
BBB 7 2
CCC 4 4
DDD 4 4
EEE 8 9
FFF 9 9
GGG 4 9
HHH 32 1
III 7 6
JJJ 12 4
KKK 12 3
MMM 15 3

The letters represent names of points. In file1 they are unique.
I also have file2 with a dataset with some points that might be repeating unknown number of times, for which I need coordinates from file1 (coordinates are columns 2 and 3 in file1)
Not all points from file1 are in file2 and viceversa.

file 2:

AAA 1 -9.95 0.03 22.80 0.04 10.37 0.03
AAA 2 -9.95 0.03 22.80 0.04 10.37 0.03
AAA 3 -9.96 0.03 22.79 0.04 10.37 0.03
BBB 1 -40.34 0.04 2.67 0.04 48.91 0.04
BBB 1 -40.33 0.04 2.67 0.04 48.92 0.04
CCC 1 -19.37 0.04 16.77 0.04 18.46 0.03
DDD 1 -13.88 0.03 19.20 0.04 11.90 0.03
DDD 2 -13.88 0.03 19.20 0.04 11.89 0.03
DDD 3 -13.88 0.03 19.20 0.04 11.89 0.03
EEE 1 -15.97 0.03 -3.95 0.04 4.23 0.03
EEE 2 -15.97 0.03 -3.95 0.04 4.23 0.03
EEE 3 -15.97 0.03 -3.95 0.04 4.23 0.03
EEE 4 -15.97 0.03 -3.95 0.04 4.23 0.03
EEE 5 -15.97 0.03 -3.95 0.04 4.23 0.03
FFF 1 -39.11 0.04 -5.52 0.04 54.45 0.04
FFF 2 -39.11 0.04 -5.52 0.04 54.45 0.04
GGG 1 -21.55 0.03 -5.42 0.03 6.93 0.04
HHH 1 -14.83 0.03 0.52 0.04 -4.74 0.03
HHH 2 -14.83 0.03 0.52 0.04 -4.74 0.03
III 3 -14.84 0.03 0.52 0.04 -4.74 0.03
JJJ 1 -10.57 0.06 1.03 0.17 2.35 0.09
KKK 1 8.04 0.04 -11.54 0.04 -4.02 0.10
KKK 1 -17.40 0.04 -1.58 0.04 14.42 0.04
LLL 1-5.72 0.06 -1.54 0.06 13.93 0.04
LLL 2 -2.31 0.04 -3.73 0.05 12.45 0.04
LLL 3 -2.31 0.04 -3.73 0.05 12.45 0.04

wanted output:

AAA 1 -9.95 0.03 22.80 0.04 10.37 0.03 5 7
AAA 2 -9.95 0.03 22.80 0.04 10.37 0.03 5 7
AAA 3 -9.96 0.03 22.79 0.04 10.37 0.03 5 7
BBB 1 -40.34 0.04 2.67 0.04 48.91 0.04 7 2
BBB 1 -40.33 0.04 2.67 0.04 48.92 0.04 7 2
CCC 1 -19.37 0.04 16.77 0.04 18.46 0.03 4 4
DDD 1 -13.88 0.03 19.20 0.04 11.90 0.03 4 4
DDD 2 -13.88 0.03 19.20 0.04 11.89 0.03 4 4
DDD 3 -13.88 0.03 19.20 0.04 11.89 0.03 4 4
EEE 1 -15.97 0.03 -3.95 0.04 4.23 0.03 8 9
EEE 2 -15.97 0.03 -3.95 0.04 4.23 0.03 8 9
EEE 3 -15.97 0.03 -3.95 0.04 4.23 0.03 8 9
EEE 4 -15.97 0.03 -3.95 0.04 4.23 0.03 8 9
EEE 5 -15.97 0.03 -3.95 0.04 4.23 0.03 8 9
FFF 1 -39.11 0.04 -5.52 0.04 54.45 0.04 9 9
FFF 2 -39.11 0.04 -5.52 0.04 54.45 0.04 9 9
GGG 1 -21.55 0.03 -5.42 0.03 6.93 0.04 4 9
HHH 1 -14.83 0.03 0.52 0.04 -4.74 0.03 32 1
HHH 2 -14.83 0.03 0.52 0.04 -4.74 0.03 32 1
III 3 -14.84 0.03 0.52 0.04 -4.74 0.03 7 6
JJJ 1 -10.57 0.06 1.03 0.17 2.35 0.09 12 4
KKK 1 8.04 0.04 -11.54 0.04 -4.02 0.10 12 3
KKK 1 -17.40 0.04 -1.58 0.04 14.42 0.04 12 3

I want to add to points in file2 the coordinates from file1. Obviously for every instance of AAA in file2, the coordinates will be the same, however, the rest of the line for AAA is not necessarily the same.

Since the fields in file2 repeat I don't know how I would match and paste the values from columns 2 and 3 in file1 to appropriate rows of file2.
If they weren't repeating I'd try with awk something like

awk '{k=$1} NR==FNR{a[k]; next} (k in a)' file1 file2 

but that doesn't work here and I don't know how to start.

You may want to start reading the man page of the join command. As i see it it does exactly what you want. Note that join expects both input files to be sorted (in regards to the common key you want to use for the join-operation) prior to using them.

I hope this helps.

bakunin

1 Like

Or try this modification:

awk '{k=$1} NR==FNR{a[k]=$2 OFS $3; next} k in a {print $0, a[k]}' file1 file2