Print the whole row in awk based on matched pattern

redse171 · August 9, 2012, 12:14pm

Hi,

I need some help printing the complete data for unmatched patterns. I have 2 different files that need to be compared, and the unmatched records should be written to a new file. My sample data is as follows:

File1.txt

Id    Num    Activity            Class                  Type 
309   1.1   Vit B6 metabolism    Met of Cofac & Vit    METABOLIC
10559 1.3   Vit B5 metabolism    Met of Sub            METABOLIC

File2.txt

ID            hit                      hit_annot
10559    Q12618|AC_AJA    Acyl-CoA  Ajello cap GN=OLE1 PE=3 SV=1
12509    Q5ZJF4|PR_CH     Perox-6 OS Gal GN=PRDX6 PE=2 SV=3

The output should contain the 1st and 3rd columns of File2.txt for the IDs that do not appear in File1.txt:

File3.txt

12509   Perox-6 OS Gal GN=PRDX6 PE=2 SV=3

When I use this script:

nawk 'FNR==NR{f2[$1];next} !($1 in f2){print $1, $3}' File1.txt File2.txt> File3.txt

I managed to print the unmatched records and the desired columns, but it only prints the first word of the third column, like this:

12509 Perox-6

It ignores the rest of it (OS Gal GN=PRDX6 PE=2 SV=3). I need the script to print the whole content of the column, as displayed above.

Can somebody here kindly help me with this? Thanks.

Corona688 · August 9, 2012, 12:36pm

Are these files tab-separated?

RudiC · August 9, 2012, 12:41pm

Did you try to print $4,...$NF?
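
For reference, awk has no range syntax for fields, so printing everything from one field through $NF takes a loop. Below is a minimal sketch, assuming default whitespace splitting and the second sample row of File2.txt. With that splitting the annotation starts at $3 rather than $4, and the variable names `out` and `result` are only illustrative:

```shell
# One sample row from File2.txt, fed through awk with default (whitespace) splitting.
result=$(printf '12509\tQ5ZJF4|PR_CH\tPerox-6 OS Gal GN=PRDX6 PE=2 SV=3\n' |
  awk '{
    out = $1                        # start with the ID
    for (i = 3; i <= NF; i++)       # append every field from $3 to the last one
      out = out " " $i
    print out
  }')
echo "$result"
```

Note that this rebuilds the text with single spaces, so any run of several spaces inside the annotation is collapsed.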

redse171 · August 9, 2012, 1:02pm

Hi Corona688,

Yes, it is a tab-separated file, but within $3 the words are separated by one or more spaces.

Hi RudiC,

Yes, I did, but the problem is that I have thousands of records with different numbers of words in them. If I use $4, it only prints the 1st and 2nd words, which is "Perox-6 OS", and if I use $NF, it only prints the 1st and last words, which is "Perox-6 SV=3".

Corona688 · August 9, 2012, 1:06pm

You can easily tell awk to use only tabs for separators.

awk -F"\t" ...
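
Put together with the original script, the full command might look like the sketch below. It assumes the sample files are tab-separated exactly as recreated here, uses `awk` in place of `nawk`, renames the array to `seen`, and adds two things that are not in the original script: `-v OFS='\t'` to keep a tab between the output columns, and `FNR>1` to skip the header line of File2.txt (its "ID" would otherwise count as unmatched, since File1.txt spells it "Id").

```shell
# Recreate the sample files with tab-separated columns (assumed layout).
printf 'Id\tNum\tActivity\tClass\tType\n' > File1.txt
printf '309\t1.1\tVit B6 metabolism\tMet of Cofac & Vit\tMETABOLIC\n' >> File1.txt
printf '10559\t1.3\tVit B5 metabolism\tMet of Sub\tMETABOLIC\n' >> File1.txt

printf 'ID\thit\thit_annot\n' > File2.txt
printf '10559\tQ12618|AC_AJA\tAcyl-CoA  Ajello cap GN=OLE1 PE=3 SV=1\n' >> File2.txt
printf '12509\tQ5ZJF4|PR_CH\tPerox-6 OS Gal GN=PRDX6 PE=2 SV=3\n' >> File2.txt

# First pass (File1.txt): remember every ID.
# Second pass (File2.txt): skip the header, print ID and annotation for unseen IDs.
awk -F'\t' -v OFS='\t' '
  FNR == NR { seen[$1]; next }
  FNR > 1 && !($1 in seen) { print $1, $3 }
' File1.txt File2.txt > File3.txt

cat File3.txt
```

Because only tabs split fields now, $3 holds the whole annotation, spaces included.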

redse171 · August 9, 2012, 1:17pm

Hi Corona688,

Thanks so much!! It worked. I feel stupid, as it is just so simple.