Comparing two files and creating a new file

Hi,

I want to compare two files based on the data in their first column. The two files differ in both the number of columns and the number of rows.

Each entry in the first column of file1 should be compared with the entries in the first column of file2. If the value matches, the corresponding row of file2 should be printed in full to a new file (file3). This should continue until every entry in file1 has been compared against file2.

file1

1    0.25    0.56    0.56   0   55 
5    0.99    0.44    0.89   0   89
7    0.77    0.45    0.75   0   100

file2

1    0.25    0.56    0.56   0   6.565    6.555    1.589    7.892   70
2    0.88    0.25    0.77   0   6.458    4.215    1.588    7.222   80  
5    0.99    0.44    0.89   0   7.444    5.444    7.444    9.221   90
7    0.77    0.45    0.75   0   4.225    4.256    7.555    2.222   10
8    0.14    0.44    0.78   0   2.457    4.222    8.777    1.454   20

The required output file (file3) should look like:

1    0.25    0.56    0.56   0   6.565    6.555    1.589    7.892   70
5    0.99    0.44    0.89   0   7.444    5.444    7.444    9.221   90
7    0.77    0.45    0.75   0   4.225    4.256    7.555    2.222   10

Try:

awk 'NR==FNR{A[$1]; next}$1 in A' file1 file2
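
Since the question asks for the result in a new file, the output of that command can be redirected to file3. The following is only a minimal sketch using the sample data from the question; the scratch directory and the `dir` variable are assumptions made for illustration so that no existing files get overwritten.

```shell
# Scratch directory for the sample files (illustrative only)
dir=$(mktemp -d) || exit 1

# Sample data copied from the question
cat > "$dir/file1" <<'EOF'
1    0.25    0.56    0.56   0   55
5    0.99    0.44    0.89   0   89
7    0.77    0.45    0.75   0   100
EOF

cat > "$dir/file2" <<'EOF'
1    0.25    0.56    0.56   0   6.565    6.555    1.589    7.892   70
2    0.88    0.25    0.77   0   6.458    4.215    1.588    7.222   80
5    0.99    0.44    0.89   0   7.444    5.444    7.444    9.221   90
7    0.77    0.45    0.75   0   4.225    4.256    7.555    2.222   10
8    0.14    0.44    0.78   0   2.457    4.222    8.777    1.454   20
EOF

# Same one-liner as above, with the matching rows of file2 redirected into file3
awk 'NR==FNR{A[$1]; next}$1 in A' "$dir/file1" "$dir/file2" > "$dir/file3"

# Show the result: the rows of file2 whose first field is 1, 5 or 7
cat "$dir/file3"
```

With real data, running the command in the directory that holds file1 and file2 and appending `> file3` should be enough.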

Hi Scrutinizer,

Thanks for your reply, it worked. But I would like to know what the "A" signifies in that awk statement (just before [$1], column 1). Could you explain what it is used for?

Sure:

awk '
  NR==FNR{         # If the first file is being read (only then are FNR and NR equal)
    A[$1]          # then create an (associative) array element with index "$1" (the first field) and no value assigned
    next           # and proceed to the next record (line)
  }
  $1 in A          # (while reading the second file) if the first field ($1) of a record is present as an index in array A, then print that record (printing the record is the default action, so {print $0} can be left out)
' file1 file2      # first read file1 and then file2
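
To make the role of "A" concrete: it is just a name chosen for the array, and any valid awk variable name should behave the same way. The sketch below uses the hypothetical name `seen` and two tiny made-up files (`keys.txt` and `data.txt`, created in a scratch directory purely for illustration).

```shell
# Scratch directory with two tiny sample files (made-up data, illustrative only)
dir=$(mktemp -d) || exit 1
printf '1 a\n5 b\n7 c\n' > "$dir/keys.txt"
printf '1 x y\n2 x y\n5 x y\n8 x y\n' > "$dir/data.txt"

# Same logic as above, but the array is called "seen" instead of "A"
result=$(awk 'NR==FNR { seen[$1]; next } $1 in seen' "$dir/keys.txt" "$dir/data.txt")

# Prints the rows of data.txt whose first field is 1 or 5
printf '%s\n' "$result"
```

Only the indexes of the array matter here; no values are ever stored in it, so the array acts as a set of the first-column values from the first file.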