Matching two file contents and extracting associated information

Hi,
I am new to shell programming and need help. I have File1, which contains some ID numbers, and File2, which contains ID numbers along with some associated information.

I want to match the ID numbers from File1 against the contents of File2 and write a third file containing the matching ID numbers together with their associated information.

For example:

cat File1

pc00123
pc345
pc1255

cat File2

>sequence 1a, prod, (pc00123)
GEAVWFKAGSQIFSEGGLDYLGNPSLVHAQSILAIWACQVILMGAVEGYRIAG
GPLGEVVDPLYPGGSFDPLGLADDPEAFAELKVKEIKNGRLAMFEEGTRFSSMFGFFVQAIVTGKGP
>sequence 45e, padam, (pc00123;pc345;pc3213)
GEAVWFKAGSQIFSEGGLDYLGNPSLVHAQSILAIWACQVILMGAVEGYRIAG
GPLGEVVDPLYPGGSFDPLGLADDPEAFAELKVKEIKNGRLAMFSMFGFFVQAIVTGKGPABBBGAAAFF
AKGMLMOIHRGNBGBSSSVFGHDSF
>sequence 332, paadat, (pc555;pc10623)
GEAVWFKAGSQIFSEGGLDYLGNPSLVHAQSILAIWACQVILMGAVEGYRIAG

I want to match the ID numbers from File1 against File2 and output not only the header lines that match but also the associated information that follows each one, up to the next ">sequence" line.

The required output is:

>sequence 1a, prod, (pc00123)
GEAVWFKAGSQIFSEGGLDYLGNPSLVHAQSILAIWACQVILMGAVEGYRIAG
GPLGEVVDPLYPGGSFDPLGLADDPEAFAELKVKEIKNGRLAMFEEGTRFSSMFGFFVQAIVTGKGP
>sequence 45e, padam, (pc00123;pc345;pc3213)
GEAVWFKAGSQIFSEGGLDYLGNPSLVHAQSILAIWACQVILMGAVEGYRIAG
GPLGEVVDPLYPGGSFDPLGLADDPEAFAELKVKEIKNGRLAMFSMFGFFVQAIVTGKGPABBBGAAAFF
AKGMLMOIHRGNBGBSSSVFGHDSF

It would be very helpful if you could suggest how to do this. Thanks.

nawk -f new.awk file1.txt file2.txt

new.awk:

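BEGIN {
  # Split header lines on "(", ")" and ";" so each ID is its own field
  FS="[();]"
}
# First file: store every ID as an array key
FNR==NR {f1[$0];next}
# Header line: print it and set the flag if any of its IDs is in file1
/^>sequence/{
  for(i=2; i<NF;i++)
    if ($i in f1) {f++;print; next}
  f=0
  next
}
# Sequence line: print it only while the flag is set
f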
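
To try this without creating the files by hand, here is a minimal sketch that recreates the sample data from the question and runs an equivalent filter inline. It assumes a POSIX awk is available as `awk` rather than `nawk`. The file names `ids.txt`, `seqs.txt`, and `matched.txt` are placeholders, and the whitespace trimming on the ID file is an added precaution that is not part of the script above.

```shell
#!/bin/sh
# Sketch only: ids.txt, seqs.txt and matched.txt are hypothetical names.
# The sample data is copied from the question.

cat > ids.txt <<'EOF'
pc00123
pc345
pc1255
EOF

cat > seqs.txt <<'EOF'
>sequence 1a, prod, (pc00123)
GEAVWFKAGSQIFSEGGLDYLGNPSLVHAQSILAIWACQVILMGAVEGYRIAG
GPLGEVVDPLYPGGSFDPLGLADDPEAFAELKVKEIKNGRLAMFEEGTRFSSMFGFFVQAIVTGKGP
>sequence 45e, padam, (pc00123;pc345;pc3213)
GEAVWFKAGSQIFSEGGLDYLGNPSLVHAQSILAIWACQVILMGAVEGYRIAG
GPLGEVVDPLYPGGSFDPLGLADDPEAFAELKVKEIKNGRLAMFSMFGFFVQAIVTGKGPABBBGAAAFF
AKGMLMOIHRGNBGBSSSVFGHDSF
>sequence 332, paadat, (pc555;pc10623)
GEAVWFKAGSQIFSEGGLDYLGNPSLVHAQSILAIWACQVILMGAVEGYRIAG
EOF

awk -F'[();]' '
  # First file: remember each ID, dropping stray blanks and carriage returns
  # (an assumption about how the ID file might be damaged, e.g. DOS line ends).
  FNR == NR {
    gsub(/[ \t\r]/, "")
    if ($0 != "") ids[$0]
    next
  }
  # Header line: decide whether this record is wanted.
  /^>sequence/ {
    keep = 0
    for (i = 2; i <= NF; i++)
      if ($i in ids) { keep = 1; break }
  }
  # Print the header and the sequence lines after it while keep is set.
  keep
' ids.txt seqs.txt > matched.txt

cat matched.txt
```

With the sample data, `matched.txt` should contain the first two records and omit the `pc555;pc10623` record. Writing the result through a redirect gives the third file the question asks for.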

Thanks a lot. It works nicely.