Calculating the average of scores

anurupa777 · April 20, 2013, 9:36am

Hi,
I have 2 files.
file1:

aac 23 25
aac 87 90
aac 33 67

file2:

23 0.9
24 0.8
25 0.4
........
67 0.55
........

I want to get output like this:

aac 23 25 0.7   i.e. (0.9+0.8+0.4)/3

that is, the mean of the file2 scores for positions 23 through 25, and the same for every line in file1.

How can I do that? Please help.

Scrutinizer · April 20, 2013, 10:39am

If there are no missing values in file2, you could try:

awk 'NR==FNR{A[$1]=$2; next}{t=0; for(i=$2; i<=$3; i++)t+=A[i]; print $0, t/($3-$2+1)}' file2 file1
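Scrutinizer's caveat matters because `A[i]` is simply empty (treated as 0) for a position missing from file2, yet that position still counts in the divisor. If file2 can have gaps, a variant that averages only the positions actually present might look like the sketch below. The function name `avg_present` and the choice to print `NA` for a range with no scores at all are illustrative assumptions.

```shell
#!/bin/sh
# avg_present FILE2 FILE1
# Hypothetical helper: like the one-liner above, but positions missing from
# FILE2 are skipped instead of being counted as 0. Prints NA if none exist.
avg_present() {
  awk '
    NR == FNR { A[$1] = $2; next }        # first file: position -> score
    {
      t = 0; n = 0
      for (i = $2; i <= $3; i++)
        if (i in A) { t += A[i]; n++ }    # "in" test does not create entries
      print $0, (n ? t / n : "NA")
    }
  ' "$1" "$2"
}
```

Call it as `avg_present file2 file1`. On the question's sample, where positions 23 to 25 are all present, it should give the same `aac 23 25 0.7` as the one-liner.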

anurupa777 · April 22, 2013, 1:01am

Unfortunately, when I run this, the process gets killed. When I tried it with a small dataset, it gave the desired output.

hanson44 · April 22, 2013, 1:13am

Are you saying it works correctly for a small file, but does not finish (crashes) with a large file?

anurupa777 · April 22, 2013, 1:18am

Yes, exactly. Dividing the file would be a hassle because my dataset is so large.

hanson44 · April 22, 2013, 1:25am

I'm not suggesting you divide the file. I'm asking if it crashes on the large file. If yes, how long does it run before crashing? And is there any error message? Copy and paste what's going on so we can see.

anurupa777 · April 22, 2013, 1:29am

Yes, it crashes on large files. It runs for around 15 to 20 minutes, then just says "Killed", nothing more.
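A bare "Killed" with no other message is what the shell reports when a process receives SIGKILL, which is how the Linux out-of-memory killer stops a process, so memory exhaustion is a plausible (but unconfirmed) cause: the one-liner keeps every line of file2 in an awk array. If file1 is much smaller than file2 and its ranges cover only part of file2's positions (an assumption; the thread gives no file sizes), one option is to read file1 first and stream file2 without storing it. A sketch; `avg_low_mem` is a hypothetical name, and it keeps the original behavior of counting a missing position as 0.

```shell
#!/bin/sh
# avg_low_mem FILE1 FILE2   (note: FILE1 is read first here)
# Hypothetical helper. Memory grows with FILE1 and the positions its ranges
# cover; FILE2 is streamed and never stored. Missing positions count as 0.
# Assumes each position appears at most once in FILE2.
avg_low_mem() {
  awk '
    NR == FNR {                           # pass over file1
      line[FNR] = $0; s[FNR] = $2; e[FNR] = $3; n = FNR
      for (i = $2; i <= $3; i++)          # record which lines need position i
        need[i] = (i in need) ? need[i] SUBSEP FNR : FNR
      next
    }
    ($1 in need) {                        # stream file2, skip unneeded positions
      k = split(need[$1], ids, SUBSEP)
      for (j = 1; j <= k; j++) sum[ids[j]] += $2
    }
    END {                                 # print results in file1 order
      for (r = 1; r <= n; r++) print line[r], sum[r] / (e[r] - s[r] + 1)
    }
  ' "$1" "$2"
}
```

Note the argument order is `avg_low_mem file1 file2`. If file1 turns out to be the bigger file, this only moves the memory problem, so it is worth checking the file sizes first.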

hanson44 · April 22, 2013, 1:48am

One thing to suggest is adding fflush() as follows: ...print $0, t/($3-$2+1); fflush()}' file2 file1. That way, when it crashes (and you are saving the output), you can see where it crashed, or even whether it got as far as file1. There is a good chance the output will provide a clue. If your awk does not support fflush, it will quickly let you know.
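Going one step beyond fflush(), the script can report on standard error when it finishes loading file2, and every million lines along the way, so a crash can be pinned to one phase or the other. A minimal sketch: the function name `avg_verbose`, the message wording, and the one-million-line interval are arbitrary choices. Each message goes through a `cat 1>&2` pipe that is closed right away, which flushes it without relying on any awk extension.

```shell
#!/bin/sh
# avg_verbose FILE2 FILE1
# Hypothetical variant of the one-liner that reports progress on stderr,
# so a crash can be placed: while loading FILE2, or while processing FILE1.
avg_verbose() {
  awk '
    # Send a message to stderr; closing the pipe flushes it at once.
    function note(msg) { print msg | "cat 1>&2"; close("cat 1>&2") }

    NR == FNR {                           # loading the score file
      A[$1] = $2; n2++
      if (FNR % 1000000 == 0) note("loading: " FNR " lines read")
      next
    }
    FNR == 1 { note("loaded " n2 " positions; now reading " FILENAME) }
    {
      t = 0
      for (i = $2; i <= $3; i++) t += A[i]
      print $0, t / ($3 - $2 + 1)
      if (FNR % 1000000 == 0) note("processed " FNR " lines of " FILENAME)
    }
  ' "$1" "$2"
}
```

Run it as `avg_verbose file2 file1 > out.txt 2> progress.log`; if progress.log never shows the "loaded" line, the crash happened while reading file2.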

anurupa777 · April 22, 2013, 3:11am

Since it's taking a huge amount of time and memory, I tried wrapping it in Perl to run it on my server. The script is as follows:

use strict;
use Data::Dumper;
use Carp;
use File::Basename;

my $path = "/home/jpsl/";
my $file1 = "2";
my $file2 = "1";

    open PIPE, "| qsub" or die $!;
    print PIPE <<EOF;
#!/bin/sh
#PBS -N Perl
#PBS -l select=4:ncpus=4
#PBS -k oe

awk 'NR==FNR{A[$1]=$2; next}{t=0; for(i=$2; i<=$3; i++)t+=A[i]; print$0, t/($3-$2+1)}' $file1 $file2


EOF

It's returning an error like this:
awk: NR==FNR{A[]=; next}{t=0; for(i=; i<=; i++)t+=A; printtrail.pl, t/(-+1)}
awk:           ^ syntax error
awk: fatal: invalid subscript expression

hanson44 · April 22, 2013, 3:28am

Running the awk command within perl is not going to speed anything up or use less memory. Is there some reason not to put it in a shell script? Sorry, I don't know why you are getting the error messages; I don't normally use perl. Are you sure you are invoking the awk external command correctly from within perl? Is your perl script called trail.pl by any chance?
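As suggested above, the Perl layer is not needed: qsub accepts a shell script directly. Below is a sketch that writes a standalone job script, assuming PBS Pro (the `select` syntax in the Perl version suggests it) and the same input names "2" and "1"; the job name, the output file averages.txt, and the `mem=16gb` request are placeholders to adjust. Because awk uses a single CPU, it asks for one chunk with one CPU instead of 4 × 4.

```shell
#!/bin/sh
# Write a PBS job script that runs the awk one-liner directly (no Perl).
# The quoted 'EOF' keeps $1, $2, $0 literal, so awk sees them, not the shell.
cat > avg_job.sh <<'EOF'
#!/bin/sh
#PBS -N avg_scores
#PBS -l select=1:ncpus=1:mem=16gb
#PBS -k oe

# PBS starts jobs in the home directory by default;
# PBS_O_WORKDIR holds the directory qsub was run from.
cd "${PBS_O_WORKDIR:-.}" || exit 1

# "2" is the position/score file, "1" the range file (as in the Perl script).
awk 'NR==FNR{A[$1]=$2; next}{t=0; for(i=$2; i<=$3; i++) t+=A[i]; print $0, t/($3-$2+1)}' 2 1 > averages.txt
EOF
```

Submit it with `qsub avg_job.sh`; the averages should end up in averages.txt in the directory where qsub was run.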

anurupa777 · April 22, 2013, 3:34am

I wanted to run it on the server through PBS, so I wrapped it in Perl. Yes, my Perl script is trail.pl.

hanson44 · April 22, 2013, 3:37am

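Aha!! Look at your error message: perl is translating print$0 into printtrail.pl. All those $0, $1, etc. are being interpreted by perl, not by awk, because an unquoted <<EOF here-document interpolates variables just like a double-quoted string. Escaping each of awk's dollar signs as \$ (or quoting the delimiter as <<'EOF' and writing the file names in literally) should leave them for awk.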
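To see the mechanism in isolation, here is a small shell illustration (the function names are made up). Shell here-documents follow the same rule as Perl's: with an unquoted delimiter, `$` expressions in the body are expanded even between single quotes, and quoting the delimiter turns that off.

```shell
#!/bin/sh
# Unquoted delimiter: the shell expands $1 inside the body, even between
# single quotes, before awk would ever see the text.
unquoted_heredoc() {
  cat <<EOF
awk '{ print $1 }'
EOF
}

# Quoted delimiter: the body is copied literally, so $1 reaches awk intact.
quoted_heredoc() {
  cat <<'EOF'
awk '{ print $1 }'
EOF
}

unquoted_heredoc "EXPANDED"   # prints: awk '{ print EXPANDED }'
quoted_heredoc "EXPANDED"     # prints: awk '{ print $1 }'
```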