Printing only a certain pattern from a column

Hi,

From my input files, I want to print $1, $2 and only a certain pattern (EC) from $4. I use this code, but it prints all the words in $4:

awk -F"\t" '$4 {print $1,$2,$4}' input-file

I just want "EC" followed by the number in $4.

The input file is as follows:

Entry     Entry name    Status     Names
Q01284   2NPD_NEUCR     R        Nitronate monooxygenase (EC 1.13.12.16) (2-nitropropane dioxygenase) (2-NPD) (Nitroalkane oxidase)
Q99VF6   2NPD_STAAN     U        Probable nitronate monooxygenase (EC 1.13.12.16) (Nitroalkane oxidase)
Q9F131   3HBH1_PSEAC    R        3-hydroxybenzoate 6-hydroxylase 1 (EC 1.14.13.24) (Constitutive 3-hydroxybenzoate 6-hydroxylase)
Q5EXK1   3HBH_KLEOX     R        3-hydroxybenzoate 6-hydroxylase (EC 1.14.13.24)
P07046   3SHD_NEUCR     R        3-dehydroshikimate dehydratase (DHS dehydratase) (DHSase) (EC 4.2.1.-)

The output should be:

Entry     Entry name         Names
Q01284   2NPD_NEUCR        EC 1.13.12.16
Q99VF6   2NPD_STAAN        EC 1.13.12.16
Q9F131   3HBH1_PSEAC       EC 1.14.13.24
Q5EXK1   3HBH_KLEOX        EC 1.14.13.24
P07046   3SHD_NEUCR        EC 4.2.1.-

I would appreciate your kind help on this. Thanks

Have a go with this:

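awk -F "\t" '
    # Header line: print columns 1, 2 and 4 unchanged.
    NR == 1 { printf( "%s\t%s\t%s\n", $1, $2, $4 ); next; }
    NR > 1 {
        # Replace everything up to and including the last "EC" with just "EC".
        gsub( ".*EC", "EC", $4 );
        # Delete the first closing parenthesis and everything after it.
        gsub( "\\).*", "", $4 );
        printf( "%s\t%s\t%s\n", $1, $2, $4 );
    }
' input-file >output-file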
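To try the program above without the original file, the sketch below rebuilds the header and five sample rows from the question as genuinely tab-separated text (the helper name sample is purely illustrative) and pipes them through the same awk program. If the columns really are tab separated, the result should match the expected output in the question, with tabs between the columns.

```shell
# Hypothetical helper: prints the header and the five sample rows from the
# question as real tab-separated text. printf reuses its format string for
# each group of four arguments, so each group becomes one line.
sample() {
    printf '%s\t%s\t%s\t%s\n' \
        'Entry' 'Entry name' 'Status' 'Names' \
        'Q01284' '2NPD_NEUCR' 'R' 'Nitronate monooxygenase (EC 1.13.12.16) (2-nitropropane dioxygenase) (2-NPD) (Nitroalkane oxidase)' \
        'Q99VF6' '2NPD_STAAN' 'U' 'Probable nitronate monooxygenase (EC 1.13.12.16) (Nitroalkane oxidase)' \
        'Q9F131' '3HBH1_PSEAC' 'R' '3-hydroxybenzoate 6-hydroxylase 1 (EC 1.14.13.24) (Constitutive 3-hydroxybenzoate 6-hydroxylase)' \
        'Q5EXK1' '3HBH_KLEOX' 'R' '3-hydroxybenzoate 6-hydroxylase (EC 1.14.13.24)' \
        'P07046' '3SHD_NEUCR' 'R' '3-dehydroshikimate dehydratase (DHS dehydratase) (DHSase) (EC 4.2.1.-)'
}

# Same awk program as in the answer, reading from the pipe instead of input-file.
result=$(sample | awk -F "\t" '
    NR == 1 { printf( "%s\t%s\t%s\n", $1, $2, $4 ); next; }
    NR > 1 {
        gsub( ".*EC", "EC", $4 );
        gsub( "\\).*", "", $4 );
        printf( "%s\t%s\t%s\n", $1, $2, $4 );
    }
')
printf '%s\n' "$result"
```

Note that ".*EC" is greedy, so the program relies on "EC" appearing only once (in upper case) in each Names field, which holds for the sample rows.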
1 Like

Hi agama,

I tried it, but it did not change anything. It still prints all the words in $4.

What operating system and version of awk are you using? If you are on Sun/Solaris, try nawk instead of awk.

If you have gawk, please try this:

awk -F"\t" 'match($0, /(EC [0-9.-]+)/, p) { print $1, $2, p[1] }' input-file
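The three-argument form of match() used above is a gawk extension; awks without it, such as mawk (commonly the default awk on Ubuntu) or nawk, reject it. A minimal sketch of a portable alternative, assuming the file is tab separated: standard match() sets RSTART and RLENGTH, and substr() cuts the "EC ..." text out of column 4. The helper name sample is hypothetical and only recreates the question's rows; unlike the gawk one-liner, this version also prints the header and separates the output columns with tabs.

```shell
# Hypothetical helper: the question's header and rows as tab-separated text.
sample() {
    printf '%s\t%s\t%s\t%s\n' \
        'Entry' 'Entry name' 'Status' 'Names' \
        'Q01284' '2NPD_NEUCR' 'R' 'Nitronate monooxygenase (EC 1.13.12.16) (2-nitropropane dioxygenase) (2-NPD) (Nitroalkane oxidase)' \
        'Q99VF6' '2NPD_STAAN' 'U' 'Probable nitronate monooxygenase (EC 1.13.12.16) (Nitroalkane oxidase)' \
        'Q9F131' '3HBH1_PSEAC' 'R' '3-hydroxybenzoate 6-hydroxylase 1 (EC 1.14.13.24) (Constitutive 3-hydroxybenzoate 6-hydroxylase)' \
        'Q5EXK1' '3HBH_KLEOX' 'R' '3-hydroxybenzoate 6-hydroxylase (EC 1.14.13.24)' \
        'P07046' '3SHD_NEUCR' 'R' '3-dehydroshikimate dehydratase (DHS dehydratase) (DHSase) (EC 4.2.1.-)'
}

# POSIX match() sets RSTART (start position) and RLENGTH (length) of the
# first match; lines whose column 4 has no EC number are skipped.
result=$(sample | awk -F '\t' '
    NR == 1 { print $1 "\t" $2 "\t" $4; next }
    match($4, /EC [0-9.-]+/) { print $1 "\t" $2 "\t" substr($4, RSTART, RLENGTH) }
')
printf '%s\n' "$result"
```

With a real file, replace the pipe with the file name, for example awk -F '\t' '...' input-file.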
1 Like

Hi,

I am using Ubuntu 10.04.

What is the output of awk --version?

Both the program that I posted and the one leafei posted produce the expected results with awk version 4.0.0.

Are you sure that your columns are tab separated? If they are not, my solution will fail. leafei's match() searches the whole record, so it would still find the EC number in a file that is not tab separated, although $1 and $2 would still depend on the tab separator.
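One way to check: count how many fields each line has when it is split on tabs only. In a correctly formatted four-column file every line gives 4, while a line aligned with spaces comes out as a single field. The sketch below assumes the four columns from the question; the function name report_bad_lines is hypothetical. (On GNU systems, cat -A input-file also shows each tab as ^I.)

```shell
# Hypothetical helper: print the number of every line that does not split
# into exactly 4 tab-separated fields (4 = columns in the question's file).
report_bad_lines() {
    awk -F '\t' 'NF != 4 { printf "line %d: %d field(s)\n", NR, NF }' "$@"
}

# Line 1 below is tab-separated; line 2 is aligned with spaces only.
report=$(printf '%s\t%s\t%s\t%s\n%s\n' \
    'Q5EXK1' '3HBH_KLEOX' 'R' '3-hydroxybenzoate 6-hydroxylase (EC 1.14.13.24)' \
    'Q5EXK1   3HBH_KLEOX     R        3-hydroxybenzoate 6-hydroxylase (EC 1.14.13.24)' |
    report_bad_lines)
printf '%s\n' "$report"
```

On a real file this would be run as report_bad_lines input-file; no output means every line has exactly four tab-separated fields.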

1 Like

Hi both,

I am so sorry. I just checked and found out that the file is not tab separated. I generated the input file from another source file, and it was supposed to be in tab-delimited format. Anyway, I will fix my file first and keep you both updated with the results. Thanks
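If regenerating the file is not an option, a space-aligned copy can sometimes be converted to tabs in awk. This is only a sketch under stated assumptions: the first three columns never contain blanks (true for the sample rows), everything after them belongs to the Names column, and the header, which has a blank inside "Entry name", is simply written out again. The function name to_tsv and the file names are hypothetical.

```shell
# Hypothetical repair helper: turn a space-aligned copy of the file into a
# tab-separated one. Assumes columns 1-3 never contain blanks and that
# everything after them belongs to the Names column.
to_tsv() {
    awk '
        # The header has a blank inside "Entry name", so write it out directly.
        NR == 1 { print "Entry\tEntry name\tStatus\tNames"; next }
        {
            names = $0
            # Drop leading blanks plus the first three blank-separated columns,
            # leaving the Names text with its inner spacing intact.
            sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+/, "", names)
            print $1 "\t" $2 "\t" $3 "\t" names
        }
    ' "$@"
}

converted=$(printf '%s\n' \
    'Entry     Entry name    Status     Names' \
    'Q5EXK1   3HBH_KLEOX     R        3-hydroxybenzoate 6-hydroxylase (EC 1.14.13.24)' \
    'P07046   3SHD_NEUCR     R        3-dehydroshikimate dehydratase (DHS dehydratase) (DHSase) (EC 4.2.1.-)' |
    to_tsv)
printf '%s\n' "$converted"
```

On a real file this would be used as to_tsv input-file >fixed-file, after which the awk programs in this thread can read fixed-file with -F "\t".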

---------- Post updated at 10:45 PM ---------- Previous update was at 10:41 PM ----------

Hi both,

Thanks for your kind help on this. It worked after I fixed the file to be tab separated.