awk or grep to search one column and output the other

Manyaka · July 20, 2013, 3:46am

Hello,

it would be great if someone can help me with the following:
I want to search for the rows from fileA in column 1 of fileB and, where a match is found, write column 2 of fileB to fileC. At the moment I search the complete file. How can I change the code so that only column 1 is searched?

cat fileA | while read row ; do
grep "$row" fileB | awk -F"\t" '{print $2}' > fileC
done

Scott · July 20, 2013, 3:50am

This seems to have been asked (in one guise or another) a million times.

$ awk 'NR==FNR{A[$0]=1; next} A[$1] {print $2}' fileA fileB > fileC

The horizontal tab (\t) is already one of awk's default field separators (along with the space), so there's no need to specify it with -F unless the fields themselves can contain spaces and you need to split on tabs only.
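To make the difference concrete, here is a minimal sketch with a made-up input line whose first field contains a space. The variable names (line, default_out, tab_out) are only for illustration.

```shell
# Hypothetical input: "key one" <TAB> "value"
line=$(printf 'key one\tvalue')

# Default splitting: spaces and tabs both separate fields.
default_out=$(printf '%s\n' "$line" | awk '{ print $2 }')

# Tab-only splitting: the space stays inside field 1.
tab_out=$(printf '%s\n' "$line" | awk -F'\t' '{ print $2 }')

echo "default: $default_out"
echo "tab:     $tab_out"
```

With default splitting the second field is "one"; with -F'\t' it is "value". If your keys never contain spaces, both behave the same.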

Manyaka · July 20, 2013, 4:34am

Thanks for the fast reply. I now have:

cat fileA | while read row ; do
$ awk 'NR==FNR{A[$1]=1; next} A[$1] {print $2}' fileA fileB > fileC

The "$" seems to produce an error. What does the "A" stand for? And why "=1"? Shouldn't it be "=row"?

Sorry, I am very new to Shell Programming..

---------- Post updated at 03:34 AM ---------- Previous update was at 03:28 AM ----------

I need the "do" loop because each search produces a separate output file, which is processed afterwards.

Scott · July 20, 2013, 4:38am

$ signifies the command prompt; it's not part of the code.

You don't need cat ... | while.

Just

awk 'NR==FNR{A[$0]=1; next} A[$1] {print $2}' fileA fileB > fileC

A is just a variable, call it what you want:

awk 'NR==FNR{row[$0]=1; next} row[$1] {print $2}' fileA fileB > fileC

=1 just records that the record ($0) from the first file (NR==FNR) has been seen. This is used later (A[$1]) to test the first field of each line of the second file; when it matches, $2 from that line is printed.
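A small end-to-end run may help. This is only a sketch: the sample keys and values are invented, and the files are written to a scratch directory rather than your real fileA and fileB.

```shell
# Hypothetical sample data in a temporary directory.
tmp=$(mktemp -d)
printf 'k1\nk3\n' > "$tmp/fileA"
printf 'k1\tapple\nk2\tbanana\nk3\tcherry\nk10\tdate\n' > "$tmp/fileB"

# While reading the first file (NR==FNR), remember each line as a key.
# While reading the second file, print column 2 if column 1 is a remembered key.
awk 'NR==FNR { seen[$0] = 1; next } seen[$1] { print $2 }' \
    "$tmp/fileA" "$tmp/fileB" > "$tmp/fileC"

cat "$tmp/fileC"
```

This prints apple and cherry. The k10 line is skipped even though it starts with "k1", because the array lookup is an exact match on the whole first field.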

Manyaka · July 20, 2013, 5:47am

Hi Scott, can I send you an email? I tried, but "To be able to send PMs your post count must be 10 or greater."

---------- Post updated at 04:47 AM ---------- Previous update was at 04:24 AM ----------

I would like to send you the data file and the script to figure out what's wrong.

Scott · July 20, 2013, 4:19pm

It would be better if you posted the script, and a representative sample of the data file here, obfuscating any sensitive data before doing so.

Manyaka · July 20, 2013, 4:29pm

Hi Scott, thanks again for your help! I finally figured it out - found a very simple solution: I just had to add an ^ before "$row" to only search within the first column..

cat fileB | while read row ; do grep ^"$row" fileA | awk -F"\t" '{print $2}' > fileC ; done

Scott · July 20, 2013, 4:34pm

If you insist on using this approach, a couple of points.

cat is not necessary:

while ...; do
  ...
done < file
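Filled in, that loop might look like the sketch below. The sample data and the out.KEY naming for the per-key output files are assumptions for illustration; substitute your own files and naming.

```shell
# Hypothetical sample data in a temporary directory.
tmp=$(mktemp -d)
printf 'k1\nk3\n' > "$tmp/fileA"
printf 'k1\tapple\nk2\tbanana\nk3\tcherry\n' > "$tmp/fileB"

# Read the keys without cat; IFS= and -r keep each line intact.
while IFS= read -r row; do
    # One output file per key; compare column 1 to the key directly
    # instead of treating the key as a regular expression.
    awk -v key="$row" '$1 == key { print $2 }' "$tmp/fileB" > "$tmp/out.$row"
done < "$tmp/fileA"

cat "$tmp/out.k1" "$tmp/out.k3"
```

Each pass writes its own file, so a later step can process them one at a time.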

^$row will match any line starting with "$row", so qualify it further, e.g.:

$ cat file
A
AA
AAA
AAAA
$ row=A
$ grep "^$row" file
A
AA
AAA
AAAA
$ grep "^$row\b" file
A
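
Note that \b is not part of POSIX regular expressions, and it treats any non-word character as a boundary, so "^A\b" would still match a key such as A-1. Since the columns here are tab-separated, one alternative is to anchor on the tab that ends column 1. A rough sketch with made-up data (the names tab, loose and match are only for illustration):

```shell
# Hypothetical tab-separated sample data.
tmp=$(mktemp -d)
printf 'A\tone\nAA\ttwo\nA-1\tthree\n' > "$tmp/file"
row=A
tab=$(printf '\t')

# "^$row" alone matches every line that merely starts with the key.
loose=$(grep -c "^$row" "$tmp/file")

# Requiring the tab right after the key pins the match to all of column 1.
match=$(grep "^$row$tab" "$tmp/file" | cut -f2)

echo "lines matched by ^\$row: $loose"
echo "column 2 for exact key: $match"
```

The loose pattern matches all three lines; the tab-anchored one matches only the line whose first column is exactly A. The key is still interpreted as a regular expression, so keys containing characters like . or * would need escaping.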