Text columns processing using awk

I'm trying to build an awk statement to print from a file (file1):

A    1,2,3    *    
 A    4,5,6     **
 B    1    
 B    4,5    *

The result should be another file (file2) like this:

 A    1,2,3    *    3    1    0.333
 A    4,5,6     **    3    2    0.666
 B    1        1    0    0
 B    4,5    *    2    1    0.5
 

In this new file, the first three columns are the same as in the original file. The fourth column must contain the number of comma-separated elements in column 2. The fifth column must contain the number of characters in column 3. The last column contains the ratio of column 5 to column 4.

I'm trying the following code:

awk  -F ',' '{print $1"\t"$2"\t"$3"\t"NF-1"\t"length($3)"\t"(length($3)/ NF-1)}' file1 > file2

But the output is unexpected (it seems that the second column is split, so all the calculations are wrong).

  
 ~$ cat file2 
 A    1    2    3    *        2    4    0.333333 
 A    4    5    6     **    2    5    0.666667 
 B    1                0    0    -1 
 B    4    5    *        1    0    -1 

Thank you for your help.

EDIT
I actually fixed one of my errors, but I still have no idea why the fourth column doesn't print correctly. Here's my new code and output:

awk '{print $1"\t"$2"\t"$3"\t"(NF","$2 -1)"\t"length($3)"\t"(length($3)/(NF","$2-1))}' file1 > file2
 
:~$ cat file2 
 A	1,2,3	*	3,0	1	0.333333 
 A	4,5,6	**	3,3	2	0.666667 
 B	1		2,0	0	0 
 B	4,5	*	3,3	1	0.333333 
  

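In the second example, where FS is left at its default, NF is the number of whitespace-separated fields and has no relation to the number of comma-separated elements in $2. Also, $2-1 is unlikely to do what you want: awk converts $2 to a number from its leading numeric prefix, so "1,2,3" becomes 1 and "4,5" becomes 4 before the subtraction.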

Compare:

$ awk '{print $2, $2-1}' file
1,2,3 0
4,5,6 3
1 0
4,5 3
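
To see where the 3,0 in the fourth column comes from, it may help to spell out how awk reads NF","$2-1. The following is a minimal sketch, assuming a POSIX awk and the first sample line; the variable names out, n and s are only illustrative, and LC_ALL=C is set so that the comma is never treated as a decimal separator. Subtraction binds tighter than concatenation, so the expression is evaluated as NF "," ($2-1).

```shell
# Feed one sample line to awk and show how NF","$2-1 is evaluated.
out=$(printf 'A 1,2,3 *\n' | LC_ALL=C awk '{
    n = $2 + 0           # "1,2,3" converts to its leading numeric prefix: 1
    s = NF "," $2 - 1    # read as NF "," ($2 - 1), giving the string "3,0"
    print n, s, s + 0    # s + 0 converts "3,0" to the number 3
}')
echo "$out"
```

When "3,0" is later used as a divisor it counts as 3, which is why the last column of the first two rows happened to look right.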

It is probably best to use split() on $2, with a comma as the separator, to get the fields that you want.
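
As a minimal sketch of that suggestion (assuming a POSIX awk; the array name parts and the variables n and counts are illustrative): split() returns the number of pieces it produced, so its return value is already the element count for column 2.

```shell
# Count the comma-separated elements in column 2 of two sample lines.
counts=$(printf 'A 1,2,3 *\nB 1\n' | awk '{
    n = split($2, parts, ",")   # n is the number of elements stored in parts
    print $2, n
}')
echo "$counts"
```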

awk '{split($2,a,",")}' file1; awk '{OFS="\t"; print $1, $2, $3, length(a), length($3), length($3)/length($2)}' file1 > file2

I tried this but I get as output:

:~$ cat file2
A    1,2,3    *    0    1    0.2
A    4,5,6    **    0    2    0.4
B    1        0    0    0
B    4,5    *    0    1    0.333333

Close, but not correct. Column 4 has zeros.

Why are you using two separate awk statements? The first awk produces no output, and its array a is gone when it exits, so it has no effect on the second one. You should combine the two into a single statement.
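
A small sketch of why the zeros appear, assuming a POSIX awk (the shell variables first and second are illustrative): each awk invocation is its own process, so a value set in one is unset in the next and evaluates to 0 there.

```shell
# The first awk counts the elements and prints the count at the end.
first=$(printf 'A 1,2,3 *\n' | awk '{ n = split($2, a, ",") } END { print n }')

# The second awk is a new process: n was never set here, so n + 0 is 0.
second=$(printf 'A 1,2,3 *\n' | awk '{ print n + 0 }')

echo "first=$first second=$second"
```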

awk '{l2=split($2,a,","); OFS="\t"; print $1, $2, $3, l2, length($3), length($3)/l2}' file1 > file2

This works. Thanks for the tips!
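
For anyone reusing this, here is a self-contained sketch that rebuilds the sample file1 and runs the combined command in one pass. It assumes a POSIX awk; the variable names n, stars and ratio are illustrative, and the guard for an empty column 2 is an addition that does not appear in the thread.

```shell
# Recreate the sample input (whitespace-separated columns).
printf 'A 1,2,3 *\nA 4,5,6 **\nB 1\nB 4,5 *\n' > file1

# One awk pass: count elements in column 2, measure column 3, print the ratio.
awk -v OFS='\t' '{
    n = split($2, a, ",")              # number of comma-separated elements
    stars = length($3)                 # 0 when column 3 is missing
    ratio = (n > 0) ? stars / n : 0    # guard (an addition): avoid dividing by zero
    print $1, $2, $3, n, stars, ratio
}' file1 > file2

cat file2
```

Setting OFS with -v applies it before the first record is read, instead of reassigning it on every line.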