AWK Matching Fields and Combining Files

Hello!

I am writing a program to run through two large lists of data (~300,000 rows), find where rows in one file match rows in the other, and combine them based on matching fields. Because the files are large, I'm guessing AWK will be the most efficient way to do this. Overall, the input and output I'm looking for is similar to this:

File1: the first three columns are coordinates in (x, y, z)
123 456 678 A B C
234 345 567 D F B
234 456 324 H J K
765 432 987 M N K

File2: the last three columns are coordinates in (x, y, z)
45 234 345 567
46 765 432 987
47 111 222 333
48 234 345 567
49 987 765 432
50 444 555 666
51 765 432 987
... and so on

Output file:
45 234 345 567 D F B
46 765 432 987 M N K
48 234 345 567 D F B
51 765 432 987 M N K

File2 has many more entries than File1, and every coordinate in File1 appears somewhere in File2. What I can't work out is how to search all of File2 for each File1 coordinate and pick up the number in column 1 of File2 on every row where that coordinate appears.

In a nutshell:
Make new file3
Find where File2($2, $3, $4) is equal to File1($1, $2, $3)
print to file3 File2($1, $2, $3, $4), File1($4, $5, $6)

Thank you!

Use gawk, nawk or /usr/xpg4/bin/awk on Solaris.

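awk 'NR == FNR {                          # first file (file1): remember its rows
  _[$1, $2, $3] = $4 FS $5 FS $6         # key: x, y, z joined by SUBSEP; value: columns 4-6
  next                                   # skip the file2 rule for file1 lines
  }
($2, $3, $4) in _ {                      # file2 rows whose x, y, z appeared in file1
  print $0, _[$2, $3, $4]                # print the file2 row plus file1 columns 4-6
  }' file1 file2 > file3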

If you're really sure that every key in file2 has an entry in file1, you can remove the ($2, $3, $4) in _ conditional expression. In the sample data that is not the case (47 111 222 333 has no match in file1), so keep it there.
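To try the awk answer on the sample data from the question, here is a self-contained sketch. It writes that data to file1 and file2 in the current directory (overwriting any files with those names). The second run drops the "in" test to illustrate the note above: rows with no match in file1, such as 47 111 222 333, are printed too, with only a trailing space appended.

```shell
#!/bin/sh
# Sample data copied from the question.
cat > file1 <<'EOF'
123 456 678 A B C
234 345 567 D F B
234 456 324 H J K
765 432 987 M N K
EOF
cat > file2 <<'EOF'
45 234 345 567
46 765 432 987
47 111 222 333
48 234 345 567
49 987 765 432
50 444 555 666
51 765 432 987
EOF

# With the membership test: only file2 rows whose x, y, z occur in file1.
awk 'NR == FNR { _[$1, $2, $3] = $4 FS $5 FS $6; next }
($2, $3, $4) in _ { print $0, _[$2, $3, $4] }' file1 file2 > file3
cat file3

# Without it: every file2 row is printed; unmatched rows get an empty suffix.
awk 'NR == FNR { _[$1, $2, $3] = $4 FS $5 FS $6; next }
{ print $0, _[$2, $3, $4] }' file1 file2 > file3_all
wc -l < file3_all
```

The first run prints the four rows from the expected output in the question, in file2 order; the second produces seven lines.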

I was a bit later than radoulov :)

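# comma subscripts keep x, y, z apart; "in" tests membership without creating entries
awk 'FILENAME == "file1" { a[$1, $2, $3] = $4 " " $5 " " $6 }
FILENAME == "file2" { if (($2, $3, $4) in a) print $0 " " a[$2, $3, $4] }' file1 file2 > file3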
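Why the separator in the key matters in general: with plain concatenation, as in a[$1$2$3], different coordinates can produce the same key string. The sample data uses three-digit values throughout, so it happens not to collide there, but a small sketch with made-up values shows the difference between concatenated keys and comma (SUBSEP) subscripts:

```shell
#!/bin/sh
# Two different (x, y, z) triples, keyed two ways; the values are made up.
result=$(awk 'BEGIN {
  # Plain concatenation: "12" "345" "6" and "123" "45" "6" both become "123456".
  a["12" "345" "6"] = "first"
  a["123" "45" "6"] = "second"          # overwrites the first entry
  n = 0; for (k in a) n++
  print "concatenated keys:", n

  # Comma subscripts join the parts with SUBSEP, so the keys stay distinct.
  b["12", "345", "6"] = "first"
  b["123", "45", "6"] = "second"
  n = 0; for (k in b) n++
  print "SUBSEP keys:", n
}')
printf '%s\n' "$result"
```

This prints one key for the concatenated form and two for the comma form.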

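First create sorted copies of file1 and file2, each starting with a single key built from all three coordinates, then use join to create your output file3. A key on one column would match on x alone, so both 234 rows of file1 would pair with 45 234 345 567. Sort as plain text in the same locale join uses (not with -n); join writes the rows in key order, not in file2 order.

awk '{ print $1 "_" $2 "_" $3, $4, $5, $6 }' file1 | LC_ALL=C sort -k1,1 > file1_sorted
awk '{ print $2 "_" $3 "_" $4, $1, $2, $3, $4 }' file2 | LC_ALL=C sort -k1,1 > file2_sorted

LC_ALL=C join -o 2.2,2.3,2.4,2.5,1.2,1.3,1.4 file1_sorted file2_sorted > file3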
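A self-contained run of the sort/join approach on the question's sample data, keyed on all three coordinates. The underscore-joined key and the LC_ALL=C setting are choices made for this sketch; it writes file1, file2 and the intermediate files in the current directory.

```shell
#!/bin/sh
# Sample data copied from the question.
cat > file1 <<'EOF'
123 456 678 A B C
234 345 567 D F B
234 456 324 H J K
765 432 987 M N K
EOF
cat > file2 <<'EOF'
45 234 345 567
46 765 432 987
47 111 222 333
48 234 345 567
49 987 765 432
50 444 555 666
51 765 432 987
EOF

# Build one key from x, y and z so that join matches all three coordinates.
# join needs its inputs sorted as text in the same locale, hence LC_ALL=C.
awk '{ print $1 "_" $2 "_" $3, $4, $5, $6 }' file1 | LC_ALL=C sort -k1,1 > file1_sorted
awk '{ print $2 "_" $3 "_" $4, $1, $2, $3, $4 }' file2 | LC_ALL=C sort -k1,1 > file2_sorted

# Output: file2 columns 1-4 (fields 2-5 after the key), then file1 columns 4-6.
LC_ALL=C join -o 2.2,2.3,2.4,2.5,1.2,1.3,1.4 file1_sorted file2_sorted > file3

# Rows come out in key order (45, 48, 46, 51); sorting on column 1 restores
# file2 order here only because column 1 is ascending in the sample file2.
sort -n -k1,1 file3
```

The H J K row never appears, because no file2 row has 234 456 324.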

Thank you very much for all the help. I'm a first-timer on the UNIX/LINUX forums, and definitely plan to come back when/if (okay, let's be honest - when) I need help again.

What fast replies! I was sure I'd have to wait until Monday for a response. I've tried all three of your suggestions and they all work beautifully!

Thank you again and enjoy your Sunday (or what's left of it)!

Just a suggestion: if you have to do this often, run each solution under 'time' to see which one is most efficient.

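# bash/ksh: time the key building and sorting too, not just the join
time {
awk '{ print $1 "_" $2 "_" $3, $4, $5, $6 }' file1 | LC_ALL=C sort -k1,1 > file1_sorted
awk '{ print $2 "_" $3 "_" $4, $1, $2, $3, $4 }' file2 | LC_ALL=C sort -k1,1 > file2_sorted
LC_ALL=C join -o 2.2,2.3,2.4,2.5,1.2,1.3,1.4 file1_sorted file2_sorted > file3
}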
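Before comparing timings, it may be worth checking that the solutions produce the same rows on a larger input. The sketch below generates hypothetical data (the sizes, the seed and the coordinate formulas are made up for illustration, and it overwrites file1 and file2), runs the awk associative-array lookup and the sort/join pipeline as shell functions, and compares their sorted output.

```shell
#!/bin/sh
# Hypothetical test data: sizes and formulas are illustrative only.
N1=10000     # rows in file1, each with a unique (x, y, z)
N2=300000    # rows in file2

# file1: x = int(i/100) repeats, so a join on x alone would give wrong rows.
awk -v n="$N1" 'BEGIN {
  for (i = 1; i <= n; i++)
    print int(i / 100), i % 100, (i * 7) % 1000, "id" i, "B", "C"
}' > file1

# file2: every 10th row has coordinates that are absent from file1.
awk -v n1="$N1" -v n2="$N2" 'BEGIN {
  srand(42)
  for (j = 1; j <= n2; j++) {
    if (j % 10 == 0) { print j, 999999, 0, 0; continue }
    i = 1 + int(rand() * n1)
    print j, int(i / 100), i % 100, (i * 7) % 1000
  }
}' > file2

run_awk() {
  awk 'NR == FNR { _[$1, $2, $3] = $4 FS $5 FS $6; next }
  ($2, $3, $4) in _ { print $0, _[$2, $3, $4] }' file1 file2 > out_awk
}

run_join() {
  awk '{ print $1 "_" $2 "_" $3, $4, $5, $6 }' file1 | LC_ALL=C sort -k1,1 > file1_sorted
  awk '{ print $2 "_" $3 "_" $4, $1, $2, $3, $4 }' file2 | LC_ALL=C sort -k1,1 > file2_sorted
  LC_ALL=C join -o 2.2,2.3,2.4,2.5,1.2,1.3,1.4 file1_sorted file2_sorted > out_join
}

run_awk
run_join

# The two approaches emit rows in different orders, so compare sorted copies.
LC_ALL=C sort out_awk > out_awk.sorted
LC_ALL=C sort out_join > out_join.sorted
if cmp -s out_awk.sorted out_join.sorted; then echo "outputs match"; else echo "outputs differ"; fi
```

In bash or ksh, putting time in front of each function call (time run_awk, time run_join) gives the comparison suggested above, including the sorting cost of the join approach.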