Print lines of one file whose 1st column exactly matches strings from another file

AshwaniSharma09 · August 19, 2010, 2:39am

Hi,
I have two files. 1st file has 1 column (huge file containing ~19200000 lines) and 2nd file has 2 columns (small file containing ~6000 lines).
#################################
huge_file.txt

a
a
ab
b

##################################
small_file.txt

a       1.5
b       2.5
ab      7.5

###################################
Script I am using :

BEGIN { cnt=0;
  while ( getline line < "small_file.txt" > 0 )
      n[++cnt]=line
}
{
for(i=1; i<=cnt; i++)
   if(match(n[i],$0) > 0)
      print n[i]
}

####################################
and the output I am getting :

a       1.5
ab      7.5
a       1.5
ab      7.5
ab      7.5
b       2.5
ab      7.5

####################################
But the desired output is :

a     1.5
a     1.5
ab     7.5
b     2.5

#####################################

I understand that I am getting this output because match() does a regular-expression (substring) match, so "a" also matches the line starting with "ab". But I want exact matches only. Can I put some regular expression around $0, or is there a better way?

Any help would be highly appreciated.

ranjithpr · August 19, 2010, 2:54am

$ cat huge_file.txt
a
a
ab
b
hh
$ cat small_file.txt
a 1.5
b 2.5
ab 7.5
cd 1.1
$ awk 'NR==FNR{buff[$1]=$0;next} {if(buff[$1]!="") print buff[$1]}' small_file.txt huge_file.txt
a 1.5
a 1.5
ab 7.5
b 2.5
$
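
A note on why this gives exact matches: awk array subscripts are compared as whole strings, so the key "a" never hits the entry stored under "ab", unlike match(). Below is a minimal sketch of the same lookup, assuming the sample data above. The scratch directory and the variable name exact_out are only for illustration. It tests membership with the `in` operator instead of comparing buff[$1] to "", because merely referencing buff[$1] creates an empty element for every key that is not in small_file.txt.

```shell
# Work in a scratch directory so the sample files do not clobber real data.
dir=$(mktemp -d)
printf 'a\na\nab\nb\nhh\n' > "$dir/huge_file.txt"
printf 'a 1.5\nb 2.5\nab 7.5\ncd 1.1\n' > "$dir/small_file.txt"

# First file: remember each whole line under its first column.
# Second file: print the remembered line only when the key exists in buff.
exact_out=$(awk 'NR==FNR { buff[$1] = $0; next }
                 $1 in buff { print buff[$1] }' \
            "$dir/small_file.txt" "$dir/huge_file.txt")

printf '%s\n' "$exact_out"
rm -rf "$dir"
```

With the sample data this should print the same four lines as the command above; "hh" and "cd" have no partner in the other file and are dropped.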

rdcwayx · August 19, 2010, 3:03am

awk 'NR==FNR{a[$1]=$2;next}{print $1,a[$1]}' small_file.txt huge_file.txt
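
One thing to watch with this version: it prints every line of huge_file.txt, so a key that is missing from small_file.txt comes out with an empty value. The sketch below, on made-up sample data (the file contents and the variable names all_out and matched_out are assumptions for illustration), shows the difference and one possible guard using `$1 in a`.

```shell
dir=$(mktemp -d)
printf 'a\nhh\nb\n' > "$dir/huge_file.txt"
printf 'a 1.5\nb 2.5\n' > "$dir/small_file.txt"

# Unguarded: every line of huge_file.txt is printed, matched or not.
all_out=$(awk 'NR==FNR { a[$1] = $2; next } { print $1, a[$1] }' \
          "$dir/small_file.txt" "$dir/huge_file.txt")

# Guarded: print only keys that were actually loaded from small_file.txt.
matched_out=$(awk 'NR==FNR { a[$1] = $2; next } $1 in a { print $1, a[$1] }' \
              "$dir/small_file.txt" "$dir/huge_file.txt")

printf 'unguarded:\n%s\n' "$all_out"
printf 'guarded:\n%s\n' "$matched_out"
rm -rf "$dir"
```

In the unguarded output the "hh" line appears with nothing after it (just a trailing space), while the guarded output contains only the "a" and "b" lines.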

AshwaniSharma09 · August 19, 2010, 4:40am

ranjithpr and rdcwayx, thanks to both of you. It's running very fast and working fine. I am wondering if you could spare a few minutes and explain the logic to me. Once again, many thanks for saving my day.

rdcwayx · August 19, 2010, 8:16pm

awk 'NR==FNR{a[$1]=$2;next}         # first file (small_file.txt): store column 2 in array a, indexed by column 1, then skip to the next line
{print $1,a[$1]}' small_file.txt huge_file.txt    # second file (huge_file.txt): print each line followed by its value looked up in array a
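
To see why NR==FNR selects only the first file: NR counts records across all input files, while FNR restarts at 1 for each file, so the two are equal only while the first file is being read. A small sketch on made-up sample data (the file contents and the variable name nr_out are assumptions for illustration) that prints both counters per line:

```shell
dir=$(mktemp -d)
printf 'a 1.5\nb 2.5\n' > "$dir/small_file.txt"
printf 'a\nb\nb\n' > "$dir/huge_file.txt"

# Run in a subshell so the caller's working directory is left alone
# and FILENAME shows the bare file names.
nr_out=$(cd "$dir" && awk '{ print FILENAME, NR, FNR }' small_file.txt huge_file.txt)

printf '%s\n' "$nr_out"
rm -rf "$dir"
```

The two counters agree on the small_file.txt lines and diverge from the first line of huge_file.txt onward. Note that if the first file is empty, NR==FNR is also true while reading the second file, so the idiom quietly misbehaves in that case.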