processing matrix column wise

Abhishek_Ghose · August 20, 2007, 1:01pm

I have an m X n matrix written out to a file, say like this:

1,2,3,4,5,6
2,6,3,10,34,67
1,45,6,7,8,8

I want to calculate the column averages in the MINIMUM amount of code or processing possible. I would have liked to use my favorite tool, "AWK", but since it processes row-wise, getting the average of the first column's values would mean one call, getting the average of the second column's values would mean another call, and so on up to 'n' calls.

Is there a better way to do this? You may also suggest a different way to represent the m X n matrix, as long as I can conceptually map an m X n matrix to the suggested file format. Please help.

vgersh99 · August 20, 2007, 1:23pm

nawk -f rowAVG.awk myFile

rowAVG.awk:

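BEGIN {
  FS=OFS=","
}
{
  # add each field to the running sum for its column
  for(i=1; i<=NF; i++)
    arr[i]+=$i
}
END {
  # FNR is the number of rows read; print one average per column
  for(i=1; i<=NF; i++)
    printf("%.2f%s", arr[i]/FNR, (i==NF) ? "\n" : OFS)
}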

Some awks are peskier than others and do not keep track of FNR and NF in the 'END' block. For those cases, save both in ordinary variables:

BEGIN {
  FS=OFS=","
}
{
  for(i=1; i<=NF; i++)
    arr[i]+=$i
  # remember the field and row counts for use in END
  nf=NF; fnr=FNR
}
END {
  for(i=1; i<=nf; i++)
    printf("%.2f%s", arr[i]/fnr, (i==nf) ? "\n" : OFS)
}

summer_cherry · August 20, 2007, 10:08pm

Hope this one can help you:

awk 'BEGIN{
FS=","
}
{
 for (i=1;i<=NF;i++)
 arr[i]=arr[i]+$i
 }
 END{
 # note: "for (j in arr)" visits the columns in an unspecified order
 for (j in arr)
 {
 temp=arr[j]/NR
 printf("The average of column %s is %s\n",j,temp)
 }
}' filename

Abhishek_Ghose · August 21, 2007, 12:20am

First of all, thanks for the response.

The order of my matrix is currently 2000 X 1000, but it might increase over time. Is there a possibility of memory overflow? This is one reason why I didn't want to store many values in the code; preferably I would process them and output them as they come, keeping the number of variables in the code to a minimum.

vgersh99 · August 21, 2007, 10:01am

The memory allocation is limited only by the physical/virtual memory configured on your box.
If you're on Solaris and using 'nawk', you would hit the limit on the number of fields in a record much sooner than you'd notice any memory allocation issues.
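To put a number on the memory question: the column-sum scripts above keep one accumulator per column, so the array grows with n (the number of columns) and not with the number of rows. The sketch below is only an illustration; `count_accumulators` is a made-up helper name, and it assumes a comma-separated file like the sample in the first post.

```shell
# count_accumulators FILE: hypothetical helper, not part of the scripts above.
# Prints how many array elements the column-sum approach is holding
# once the whole file has been read.
count_accumulators() {
  awk -F, '
    { for (i = 1; i <= NF; i++) arr[i] += $i }   # same accumulation as above
    END { n = 0; for (k in arr) n++; print n }   # portable element count
  ' "$1"
}
```

For a 2000 X 1000 matrix this should report 1000 elements, however many rows are added later.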

If you can think of an algorithm to achieve what you're trying to do without using a hash/array, REGARDLESS of the implementation language, please do share.

You might want to post to USENET's comp.lang.awk - they're a resourceful bunch!

vgersh99 · August 21, 2007, 12:55pm

I guess one way would be to invert (i.e. transpose) the matrix and process a row (what used to be a 'column') in one shot, producing the avg.
Now... whether transposing the matrix and processing the transposed matrix would be either quicker and/or less 'memory-consuming'.... that's a different issue.
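A rough sketch of the second half of that idea, assuming the file is already written out transposed (one original column per line, which the first post allows as an alternative representation). Doing the transposition itself in awk would mean holding the whole matrix in memory, so that step is deliberately left out. `row_avg` is a made-up helper name.

```shell
# row_avg FILE: hypothetical helper; prints one average per input row.
# Only a single scalar is kept, and each result is printed as soon as
# its row has been read.
row_avg() {
  awk -F, '
    NF > 0 {
      sum = 0                               # reset the accumulator per row
      for (i = 1; i <= NF; i++) sum += $i   # add up the fields of this row
      printf("%.2f\n", sum / NF)            # average of this row
    }
  ' "$1"
}
```

With the sample matrix stored transposed, the first two lines would be 1,2,1 and 2,6,45, and `row_avg` prints 1.33 and 17.67 for them.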