Converting a text file into a matrix

Hi All,

I have a space-delimited file with many lines (rows). For example, I have a file named SR345_pl.txt. If I open it in an editor, it looks like this:

adfr A2 0.9345
dtgr/2 A2 0.876
fgh/3 A2 023.76
fghe/4 A2 2345
bnhy/1 A3 3456
bhy A3 0.9876
phy A5 0.987
kdrt A5 0.985
kdfg A7 0.345
klp A9 0.4567

The output I want is a single line with the file name (or part of it) as the first column, followed by the number of A2's, A3's, A5's and so on. The values in column 2 of the files range from A1 to A10, so basically I need the counts of A1's, A2's, ... up to A10.

This is what I want as output:

file name     A1  A2  A3  A4  A5  A6  A7  A8  A9  A10
SR345_pl.txt  0   4   2   0   2   0   1   0   1   0

Please let me know the best way to do this using awk. A tab-delimited output file would be good.

try:

awk '
{a[$2]=$2; c[$2]++}
END {
  printf "file name\t";
  for (i=0; i<=11; i++) printf i==11?"\n":"A" i "\t";
  printf FILENAME"\t";
  for (i=0; i<=11; i++) printf i==11?"\n":(c["A" i]?c["A" i]:0) "\t";
}' SR345_pl.txt

I am getting a syntax error in line 5

awk: syntax error at source line 5
 context is
	  for (i=0; i<=11; i++) printf >>>  i== <<< 
awk: illegal statement at source line 5
awk: illegal statement at source line 5

Use nawk on Solaris.

I don't have access to Solaris.

Not what I mean.

Some systems ship a limited version of awk by default and need nawk for certain features. Since the program didn't work, I guessed you were on Solaris.

What's your system?

A Mac and a Linux machine.

I made some changes; try this:

awk ' { c[$2]++ } END {
  printf "file name\t";
  for (i=1; i<=10; i++) { if(i==10) printf "A%d\n", i; else printf "A%d\t", i; }
  printf "%s\t", FILENAME;
  for (i=1; i<=10; i++) { if(i==10) printf "%d\n", c["A" i]+0; else printf "%d\t", c["A" i]+0; }
}' SR345_pl.txt

That worked. Thanks bipinajith
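
A note on the syntax error reported earlier in the thread: it appears to come from the unparenthesized comparison (i==11?...) used directly as a printf argument, which the awk shipped on a Mac does not seem to accept. A minimal sketch of the same idea with the conditional wrapped in parentheses follows. The sample data is the file from the question; the output file name SR345_counts.tsv is made up for illustration.

```sh
# Recreate the sample input from the question.
cat > SR345_pl.txt <<'EOF'
adfr A2 0.9345
dtgr/2 A2 0.876
fgh/3 A2 023.76
fghe/4 A2 2345
bnhy/1 A3 3456
bhy A3 0.9876
phy A5 0.987
kdrt A5 0.985
kdfg A7 0.345
klp A9 0.4567
EOF

# Count the values in column 2, then print a header row and a count row.
# The conditional is parenthesized so it is parsed as one printf argument.
# SR345_counts.tsv is a hypothetical output name.
awk '
{ c[$2]++ }
END {
    printf "file name\t"
    for (i = 1; i <= 10; i++) printf "A%d%s", i, (i == 10 ? "\n" : "\t")
    printf "%s\t", FILENAME
    for (i = 1; i <= 10; i++) printf "%d%s", c["A" i] + 0, (i == 10 ? "\n" : "\t")
}' SR345_pl.txt > SR345_counts.tsv
```

Adding 0 to c["A" i] turns a missing count into 0, so values that never occur still get a column.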

Try this, based on rdrtx1's proposal:

$ awk   '{ar[$2]++}
         END {  printf " file name\t";
                for (i=1; i<=10; i++) printf "A"i"\t"; printf "\n"
                printf "%10s\t", FILENAME;
                for (i=1; i<=10; i++) printf "%2d\t", ar["A"i]+0; printf "\n"
             }
        ' file
 file name    A1    A2    A3    A4    A5    A6    A7    A8    A9    A10    
      file     0     4     2     0     2     0     1     0     1     0
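
If the goal is a matrix with one row per input file, as the thread title suggests, the same counting can be reset at each file boundary. This is only a sketch under that assumption. The input files SR1_pl.txt and SR2_pl.txt, their contents, the output name counts.tsv, and the helper function flush_row are all made up for illustration.

```sh
# Hypothetical sample inputs, made up for illustration.
printf '%s\n' 'adfr A2 0.9345' 'bhy A3 0.9876' 'phy A2 0.987' > SR1_pl.txt
printf '%s\n' 'kdfg A7 0.345' 'klp A10 0.4567' > SR2_pl.txt

awk '
# Print one tab-separated row for the file just finished, then reset the counts.
function flush_row(name,    i) {
    printf "%s", name
    for (i = 1; i <= 10; i++) printf "\t%d", cnt["A" i] + 0
    printf "\n"
    split("", cnt)    # portable way to empty an array
}
BEGIN {
    printf "file name"
    for (i = 1; i <= 10; i++) printf "\tA%d", i
    printf "\n"
}
FNR == 1 && NR > 1 { flush_row(prev) }   # a new file started: emit the previous row
FNR == 1           { prev = FILENAME }
{ cnt[$2]++ }
END { if (NR > 0) flush_row(prev) }      # emit the row for the last file
' SR1_pl.txt SR2_pl.txt > counts.tsv
```

Empty input files never trigger FNR == 1, so they would be skipped rather than shown as a row of zeros.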