Text columns processing using awk

dovah · July 17, 2014, 6:12am

I'm trying to build an awk statement that reads a file (file1) like this:

A    1,2,3    *    
 A    4,5,6     **
 B    1    
 B    4,5    *

and prints another file like this:

 A    1,2,3    *    3    1    0.333
 A    4,5,6     **    3    2    0.666
 B    1        1    0    0
 B    4,5    *    2    1    0.5

In this new file, the first three columns are the same as in the original file. The fourth column must contain the number of comma-separated elements in column 2. The fifth column must contain the number of characters in column 3. The last column contains the ratio of column 5 to column 4.

I'm trying the following code:

awk  -F ',' '{print $1"\t"$2"\t"$3"\t"NF-1"\t"length($3)"\t"(length($3)/ NF-1)}' file1 > file2

But the output is unexpected (it seems that the second column is split, and thus all the calculations are wrong).

  
 ~$ cat file2 
 A    1    2    3    *        2    4    0.333333 
 A    4    5    6     **    2    5    0.666667 
 B    1                0    0    -1 
 B    4    5    *        1    0    -1

Thank you for your help.

EDIT
I actually fixed one of my errors, but I still have no idea why the fourth column doesn't print correctly. Here's my new code and output:

awk '{print $1"\t"$2"\t"$3"\t"(NF","$2 -1)"\t"length($3)"\t"(length($3)/(NF","$2-1))}' file1 > file2

:~$ cat file2 
 A	1,2,3	*	3,0	1	0.333333 
 A	4,5,6	**	3,3	2	0.666667 
 B	1		2,0	0	0 
 B	4,5	*	3,3	1	0.333333

Scrutinizer · July 17, 2014, 8:04am

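In the second example, where FS is left at its default, NF has no relation to the number of comma-separated elements in $2, and $2-1 is unlikely to do what you want: awk converts $2 to a number using only its leading numeric part (so "1,2,3" becomes 1) before subtracting 1.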

Compare:

$ awk '{print $2, $2-1}' file
1,2,3 0
4,5,6 3
1 0
4,5 3

It is probably best to use split() with a comma as the separator on $2 to get the fields that you want.
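
As a rough illustration of that suggestion: split() returns the number of pieces it produced, so its return value can serve as the element count. This is only a sketch; the function name count_items and the array name parts are made up for the example, and the sample rows are modeled on file1.

```shell
# Hypothetical helper: print column 2 and the number of comma-separated items in it.
count_items() {
    awk '{
        n = split($2, parts, ",")   # split() returns how many pieces it created
        print $2, n
    }' "$@"
}

# Two sample rows modeled on file1 (tab-separated here).
printf 'A\t1,2,3\t*\nB\t4,5\t*\n' | count_items
```

With the default FS, tabs and spaces both separate fields, so $2 is the whole comma-separated list.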

dovah · July 17, 2014, 8:26am


awk '{split($2,a,",")}' file1; awk '{OFS="\t"; print $1, $2, $3, length(a), length($3), length($3)/length($2)}' file1 > file2

I tried this, but I get this output:

:~$ cat file2
A    1,2,3    *    0    1    0.2
A    4,5,6    **    0    2    0.4
B    1        0    0    0
B    4,5    *    0    1    0.333333

Close, but not correct. Column 4 has zeros.

Scrutinizer · July 17, 2014, 8:30am

Why are you using two separate awk statements? The first awk produces no output, and its array is gone once it exits, so it has no effect on the second. You should integrate the two.
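
A small sketch of why the zeros appear, assuming the same kind of input as file1: each awk invocation is its own process, so an array filled in one run is not visible in the next. The helper name count_in_new_process is hypothetical, and the element count is done with a loop rather than length(a) only to stay portable across awk implementations.

```shell
# Hypothetical helper: count the elements of array a in a brand-new awk process.
count_in_new_process() {
    awk '{ n = 0; for (k in a) n++; print n }' "$@"
}

# This awk fills a, prints nothing, and exits; a disappears with the process.
printf 'A\t1,2,3\t*\n' | awk '{ split($2, a, ",") }'

# This awk starts with an empty a, so the count is 0 for every line.
printf 'A\t1,2,3\t*\n' | count_in_new_process
```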

dovah · July 17, 2014, 9:05am


awk '{l2=split($2,a,","); OFS="\t"; print $1, $2, $3, l2, length($3), length($3)/l2}' file1 > file2

This works. Thanks for the tips!
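
For anyone reusing this, here is a slightly more defensive variant, offered as a sketch rather than a replacement. It sets OFS once in a BEGIN block and guards the division in case a line has no second column, which is an assumption about possible input and does not occur in file1. The names add_counts, parts, stars and ratio are made up for the example.

```shell
# Hypothetical wrapper around the working one-liner above.
add_counts() {
    awk 'BEGIN { OFS = "\t" }
    {
        n = split($2, parts, ",")         # number of comma-separated elements in column 2
        stars = length($3)                # number of characters in column 3 (0 if missing)
        ratio = (n > 0) ? stars / n : 0   # guard against dividing by zero
        print $1, $2, $3, n, stars, ratio
    }' "$@"
}

# Rows modeled on file1; the function also accepts a file name, e.g. add_counts file1 > file2
printf 'A 1,2,3 *\nA 4,5,6 **\nB 1\nB 4,5 *\n' | add_counts
```

The ratios are printed with awk's default number format, so 1/3 shows up as 0.333333 rather than 0.333.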