I usually need to count the frequency of each character across the whole data set, which I do with:
awk -F "" '{ for ( i=1; i<=NF; i++) freq[$i]++} END {for (a in freq) print a, freq[a]}'
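The one-liner above relies on `-F ""` splitting each record into single characters. gawk and mawk do this, but POSIX leaves the behaviour of an empty FS unspecified. Below is a minimal sketch of a more portable equivalent that walks each line with length() and substr() instead. It assumes the data arrives on standard input, and the function name char_freq is only illustrative:

```sh
#!/bin/sh
# Count how often each character occurs in the whole input.
# Portable variant: avoids relying on an empty field separator.
char_freq() {
  awk '
    {
      # Visit every character of the current line.
      for (i = 1; i <= length($0); i++)
        freq[substr($0, i, 1)]++
    }
    END {
      # The order of "for (a in freq)" is unspecified, so sort afterwards if needed.
      for (a in freq) print a, freq[a]
    }'
}

# Usage: prints "A 3", "C 1", "G 2", "N 1", "T 3", one pair per line.
printf '%s\n' GAATC GTNAT | char_freq | sort
```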
Now I am at a loss when I need to count the frequency of characters at each position. Here is an example using a subset of the data:
GAATCCGGAAACAGCAACTTCAAANCA
GTNATTCGGGCCAAACTGTCGAA
TTNGGCAACTGTTAGAGCTCATGCGACA
CCTGCTAAACGAGTTCGAGTTGAANGA
TTNCGGAAGTGGTCGCTGGCACGG
Expected counts for this subset:
1st position
G = 2
T = 2
C = 1
2nd position
T = 3
A = 1
C = 1
and so on.
Any ideas? Help is much appreciated. Please let me know if the problem is not clearly stated.
Actually, it's not a trick so much as a detour born of sheer despair. While awk (at least the one I use, mawk) accepts if ((i,j) in freq), it does not allow for ((i,j) in freq). That's why I introduced the second array: just to keep track of the base characters.
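Here is a minimal sketch of that two-array detour. It assumes the input comes on standard input. The names pos_freq, freq, chars and maxlen are illustrative and are not necessarily those of the original answer. freq is keyed by (position, character) through SUBSEP. chars only remembers which characters were seen, so the END block can loop over positions and characters and use the `(i, c) in freq` test:

```sh
#!/bin/sh
# Per-position character counts, grouped by position.
#   freq[i, c] : count of character c at position i (key joined with SUBSEP)
#   chars[c]   : set of characters seen anywhere, used to iterate in END
pos_freq() {
  awk '
    {
      n = length($0)
      if (n > maxlen) maxlen = n
      for (i = 1; i <= n; i++) {
        c = substr($0, i, 1)
        freq[i, c]++
        chars[c] = 1
      }
    }
    END {
      # "(i, c) in freq" is a valid test, unlike "for ((i, c) in freq)".
      # Positions come out in order; characters within a position do not.
      for (i = 1; i <= maxlen; i++)
        for (c in chars)
          if ((i, c) in freq)
            print i, c "=" freq[i, c]
    }'
}

# Usage with the sample data: prints lines such as "1 G=2" and "3 N=3".
printf '%s\n' GAATCCGGAAACAGCAACTTCAAANCA GTNATTCGGGCCAAACTGTCGAA \
  TTNGGCAACTGTTAGAGCTCATGCGACA CCTGCTAAACGAGTTCGAGTTGAANGA \
  TTNCGGAAGTGGTCGCTGGCACGG | pos_freq
```

The `for ((i, j) in freq)` form is not part of the POSIX awk grammar, which only allows `for (name in array)`, so this limitation is not specific to mawk.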
You're welcome. There is only a single array. This adds 1 to the array element whose single index consists of the position number and the character, separated by OFS (the output field separator), which defaults to a single space. For example, A["3 N"]++ and A["28 A"]++.
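Here is a sketch of that single-array version. It assumes an awk that splits a record into characters when FS is empty (gawk and mawk do), as the question's own one-liner already does. The array name A follows the comment, and the function name pos_freq_single is only illustrative. `for (k in A)` visits keys in no particular order, so the output is piped through sort:

```sh
#!/bin/sh
# Per-position character counts with one array.
# The key "i OFS $i" is one string such as "3 N" (OFS defaults to " ").
pos_freq_single() {
  awk -F '' '
    {
      for (i = 1; i <= NF; i++)
        A[i OFS $i]++
    }
    END {
      # Each key already holds "position char", so printing k, A[k]
      # gives lines like "3 N 3".
      for (k in A) print k, A[k]
    }' | sort -k1,1n -k2,2
}

# Usage with the sample data.
printf '%s\n' GAATCCGGAAACAGCAACTTCAAANCA GTNATTCGGGCCAAACTGTCGAA \
  TTNGGCAACTGTTAGAGCTCATGCGACA CCTGCTAAACGAGTTCGAGTTGAANGA \
  TTNCGGAAGTGGTCGCTGGCACGG | pos_freq_single
```

Because the index is a plain string, no SUBSEP handling or second array is needed. The cost is that position and character have to be split again (by sort here) if you need to group by position.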