Hi All,
I am another biologist attempting to parse a large txt file containing several million lines like:
tucosnp 56762 T Y 228 228 60 23 .CcCcc,,..c.c,cc,,.C...
What I need to do is get the frequency of periods (.) plus commas (,) in column 9, and populate this number into another column. Incidentally, column 8 is the count of all characters in column 9.
So in the example, the correct number to populate the new column is .57 (13/23=.57). There are 13 periods and commas (8 periods and 5 commas) among the 23 characters.
I know this must be possible with awk, and I have the simplest part of the solution, but I can't figure out how to code the numerator, "count of . + , in column 9":
awk '{print $1,$2,$3,$4,$8,"count of . + , in column 9"/$8}' in > out
My outfile will look something like:
tucosnp 56762 T Y 23 .57
Thanks for any help!
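
For reference, here is a minimal sketch of one way the count might be obtained. It relies on awk's gsub(), which returns the number of substitutions it makes, so deleting every period and comma from a copy of column 9 gives the count directly. It assumes the fields are whitespace-separated and that column 9 contains no spaces. The file names sample.txt and out.txt are placeholders for illustration only.

```shell
# Write the example line from the post to a placeholder input file.
printf '%s\n' 'tucosnp 56762 T Y 228 228 60 23 .CcCcc,,..c.c,cc,,.C...' > sample.txt

awk '{
    s = $9                       # work on a copy so column 9 itself is untouched
    n = gsub(/[.,]/, "", s)      # gsub returns how many . and , were removed
    freq = ($8 > 0) ? n / $8 : 0 # guard against a zero count in column 8
    printf "%s %s %s %s %s %.2f\n", $1, $2, $3, $4, $8, freq
}' sample.txt > out.txt

cat out.txt
```

Note that printf with %.2f writes the value with a leading zero (0.57 rather than .57). If column 8 cannot be trusted, length($9) could be used as the denominator instead.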