awk count characters, sum, and divide by another column

peromhc · June 14, 2010, 6:40pm

Hi All,

I am another biologist attempting to parse a large txt file containing several million lines like:

tucosnp 56762 T Y 228 228 60 23 .CcCcc,,..c.c,cc,,.C...

What I need to do is get the frequency of periods (.) plus commas (,) in column 9, and put this number into a new column. Incidentally, column 8 is the count of all characters in column 9.

So in the example, the correct number to populate the new column is .57 (13/23=.57). There are 13 periods and commas among the 23 characters.

I know that I must be able to do this somehow with awk, and have the simplest part of the solution, but I can't figure out how to code the numerator "count of . + , in column 9".

awk '{print $1,$2,$3,$4,$8,"count of . + , in column 9"/$8}' in > out

My outfile will look something like:

tucosnp 56762 T Y 23 .57

Thanks for any help!

bartus11 · June 14, 2010, 6:48pm

awk '{n=gsub(/[.,]/,"",$9); printf "%s %s %s %s %s %.2f\n",$1,$2,$3,$4,$8,n/$8}' in > out
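
In case it helps, here is a slightly more defensive sketch of the same idea. It relies on gsub returning the number of substitutions it made, but runs it on a copy of column 9 so the original field is left alone, and it takes the total from length($9) instead of trusting column 8. The function name dot_comma_freq is made up for illustration, and skipping lines with an empty column 9 is an assumption about what you would want there.

```shell
# Hypothetical helper: prints columns 1-4, the length of column 9,
# and the fraction of '.' and ',' characters in column 9.
dot_comma_freq() {
  awk '{
    s = $9                      # work on a copy so $9 and $0 stay untouched
    n = gsub(/[.,]/, "", s)     # gsub returns how many matches it replaced
    total = length($9)          # count the characters directly
    if (total > 0)              # assumption: skip lines with an empty column 9
      printf "%s %s %s %s %d %.2f\n", $1, $2, $3, $4, total, n / total
  }' "$@"
}

# Try it on the sample line from the question
printf 'tucosnp 56762 T Y 228 228 60 23 .CcCcc,,..c.c,cc,,.C...\n' | dot_comma_freq
# prints: tucosnp 56762 T Y 23 0.57
```

For the real file it would be called as dot_comma_freq in > out. Note that %.2f prints a leading zero (0.57 rather than .57).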