join files based on a common field

GoldenFire · May 24, 2011, 7:17am

Hi experts,

Would you please help me with this?
I have several files and I need to join the fourth field of each, based on the common first field.
Here's an example:

first file:

280346 39.88 -75.08 547.8
280690 39.23 -74.83 538.7
280729 40.83 -75.08 499.2
280907 40.9 -74.4 507.8
281335 40.73 -74.35 504.8
281351 38.95 -74.93 543.8
281582 41.03 -74.42 465.2
282640 40.82 -74.28 502.2
283029 40.55 -74.87 515.2

second file:

280346 39.88 -75.08 556.6
280734 40.82 -75.08 503.1
280907 40.9 -74.4 516.8
281335 40.73 -74.35 518.0
281351 38.95 -74.93 552.8
281582 41.03 -74.42 489.3
282023 40.65 -74.3 536.5
282768 40.82 -74.28 501.0
283291 39.73 -75.08 547.7

third file:

280346 39.88 -75.08 549.3
280690 39.23 -74.83 533.8
280734 40.82 -75.08 494.7
280907 40.9 -74.4 505.2
281335 40.73 -74.35 509.8
281351 38.95 -74.93 537.1
281582 41.03 -74.42 480.8
282023 40.65 -74.3 530.1
282768 40.82 -74.28 503.3

And I'd like to have this:

280346 39.88 -75.08 547.8 556.6 549.3
280907 40.9 -74.4 507.8 516.8 505.2
281335 40.73 -74.35 504.8 518.0 509.8
281351 38.95 -74.93 543.8 552.8 537.1
281582 41.03 -74.42 465.2 489.3 480.8

Many thanks for your help in advance!

Chirel · May 24, 2011, 7:50am

Do you mean that you only want to see the lines whose first field appears in all three files?

If so, here is an ugly solution that should do the job.
In my example the file names are file01, file02 and file03.

cat file?? | awk '{print $1}' | sort -u | while read -r i; do
   # Anchor the match to the first field so the ID is not matched elsewhere on a line
   RF1="$(grep "^$i " file01)"
   RF2="$(grep "^$i " file02)"
   RF3="$(grep "^$i " file03)"
   # Print only when the ID was found in all three files
   if [ -n "$RF1" ] && [ -n "$RF2" ] && [ -n "$RF3" ]; then
      echo "$RF1 $(echo "$RF2" | awk '{print $4}') $(echo "$RF3" | awk '{print $4}')"
   fi
done
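
For more than a handful of files, the same check can probably be done in a single awk pass instead of one grep per file. What follows is only a sketch, not a tested solution for the real data. It assumes whitespace-separated fields, no empty files, that each ID appears at most once per file, and that the second and third fields for an ID can be taken from the first file in which it shows up. The function name join_fourth is made up for illustration.

```shell
# join_fourth FILE... : for every ID (field 1) present in all given files,
# print the ID, fields 2 and 3, then the 4th field from each file in order.
join_fourth() {
    awk '
    FNR == 1 { nfiles++ }                 # first line of a new input file
    {
        if (!($1 in row)) {
            row[$1] = $1 " " $2 " " $3    # ID and coordinates from first sighting
            order[++n] = $1               # remember first-seen order for output
        }
        row[$1] = row[$1] " " $4          # append the 4th field of this file
        count[$1]++                       # number of files containing this ID
    }
    END {
        for (k = 1; k <= n; k++)
            if (count[order[k]] == nfiles)    # keep IDs found in every file
                print row[order[k]]
    }' "$@"
}
```

With the example files this would be called as join_fourth file01 file02 file03, and for a larger set something like join_fourth *.txt. Output follows the order in which IDs are first seen, so it stays sorted only if the first file is sorted.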

GoldenFire · May 24, 2011, 9:37am

Thank you Chirel.
Yes, I want only the lines that match in all the files.

The problem is that I have many files (184 files in TXT format). Is it possible to join them without having to grep each file one by one?

vgersh99 · May 24, 2011, 9:56am

If you look at the bottom of this thread, you'll find helpful hints from the related threads.

GoldenFire · May 24, 2011, 10:14am

Thank you vgersh99.

I looked into the related threads and I found this code helpful:

awk  '{i=$1;sub(i,x);A=A$0} FILENAME==ARGV[ARGC-1]{print i A}' file*

I am a newbie.
Could you please tell me what ARGV and ARGC are in the above code?

vgersh99 · May 24, 2011, 10:19am

From 'man awk':

       ARGC        The number of  command  line  arguments  (does  not
                   include options to gawk, or the program source).

       ARGV        Array of command  line  arguments.   The  array  is
                   indexed  from  0 to ARGC - 1.  Dynamically changing
                   the contents of ARGV can control the files used for
                   data.
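
To see what these hold in practice, here is a tiny sketch that prints them from a BEGIN block. The names a.txt and b.txt are placeholders and need not exist, because a program made only of BEGIN actions never opens its input files; show_args is a made-up helper name. ARGV[0] is the command name, so the files occupy ARGV[1] through ARGV[ARGC-1], which is why the test FILENAME==ARGV[ARGC-1] in the one-liner above is true only while awk is reading the last file named on the command line.

```shell
# show_args ARG... : print ARGC and the file arguments as awk sees them.
show_args() {
    awk 'BEGIN {
        print "ARGC = " ARGC                  # counts ARGV[0] plus the operands
        for (i = 1; i < ARGC; i++)            # skip ARGV[0], the command name
            print "ARGV[" i "] = " ARGV[i]
        print "last file = " ARGV[ARGC - 1]   # what FILENAME is compared against
    }' "$@"
}

show_args a.txt b.txt
```

With two file arguments ARGC is 3, and the last line reports b.txt as the final file.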