Print average of values

Is it possible to print the average of the 2nd column, grouped by the key in the 1st column?

input

a1X 4
a1X 6
a2_1 10
a2_1 20
a2_1 30
a2_1 30
a2_1 10

output

a1X  5
a2_1  20

Try this:

 
awk '
    {
        sum[$1] += $2;
        count[$1]++;
    }

    END {
        for( x in sum )
            printf( "%s %.2f\n", x, sum[x]/count[x] );
    }
' input-file
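Here is a self-contained sketch of the same approach, assuming a POSIX awk and sh. group_avg is a hypothetical helper name. It prints with %g instead of %.2f, so whole-number averages come out as 5 and 20 as in the requested output. It also pipes through LC_ALL=C sort because for (x in sum) visits keys in an unspecified order.

```shell
#!/bin/sh
# Hypothetical helper: reads "key value" lines on stdin and prints
# one "key average" line per key, sorted by key.
group_avg() {
    awk '
        {
            sum[$1] += $2      # running total per key
            count[$1]++        # number of rows per key
        }
        END {
            for (x in sum)     # iteration order is unspecified in awk
                printf("%s %g\n", x, sum[x] / count[x])
        }
    ' | LC_ALL=C sort
}

# Demo with the sample input from the question.
group_avg <<'EOF'
a1X 4
a1X 6
a2_1 10
a2_1 20
a2_1 30
a2_1 30
a2_1 10
EOF
```

With %g, non-integer averages are printed with up to six significant digits (for example 1.15549). Switch back to %.2f if you prefer a fixed number of decimals.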

Thanks, agama. Is it possible to print the max value?

a1X  6
a2_1  30

Anything is possible, well, almost :)

awk '
    {
        sum[$1] += $2;
        count[$1]++;
        if( max[$1] < $2 )
            max[$1] = $2;
    }

    END {
        for( x in sum )
            printf( "%s ave=%.2f  max=%d\n", x, sum[x]/count[x], max[x] );
    }
' input-file

Prints both average and max.

Replace the printf() with this if you only want max:

printf( "%s %d\n", x,  max[x] );

I am getting 0s as the output. I forgot to mention that my data may contain negative numbers. Sorry.

Example:

G1    -1.093384748
G1    -0.737460373
TB1    1.130494838
TB1    1.180494838
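The following minimal sketch assumes a POSIX awk. It re-runs the earlier average-and-max logic unchanged on this data to show where the 0s most likely come from. max[$1] starts uninitialised, and an uninitialised value compares as 0, so a key whose values are all negative never updates it. %d then truncates the decimals of whatever is stored. max_original is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: the earlier average+max script, logic unchanged,
# with its output sorted by key so the result is deterministic.
max_original() {
    awk '
        {
            sum[$1] += $2
            count[$1]++
            # max[$1] starts uninitialised, which compares as 0,
            # so an all-negative key never passes this test.
            if (max[$1] < $2)
                max[$1] = $2
        }
        END {
            for (x in sum)
                # %d also truncates decimals, e.g. 1.18 prints as 1.
                printf("%s ave=%.2f  max=%d\n", x, sum[x] / count[x], max[x])
        }
    ' | LC_ALL=C sort
}

max_original <<'EOF'
G1    -1.093384748
G1    -0.737460373
TB1    1.130494838
TB1    1.180494838
EOF
```

On this input it should print G1 ave=-0.92  max=0 and TB1 ave=1.16  max=1. The averages are fine. Only the max column is affected.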

Which OS are you using?

--ahamed

Mac OS X.

Can you paste the input file and the exact output you are getting?

--ahamed

Try this...

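awk ' {
        val = $2 + 0                 # force numeric: keeps the sign and decimals
        sum[$1] += val
        count[$1]++
        # The first value seen for a key seeds max; later values replace it only if larger.
        if( !($1 in max) || val > max[$1] )
            max[$1] = val
    }
    END {
        for( x in sum )
            printf( "%s ave=%.2f  max=%.2f\n", x, sum[x]/count[x], max[x] );
    }
' input_file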
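This is the same one-pass idea as a self-contained sketch, assuming a POSIX awk. ave_max is a hypothetical helper name. It seeds max with the first value seen for each key by testing membership with in rather than !max[$1]. That choice also handles a key whose first value is 0. With the !max[$1] test, a later negative value would overwrite that 0.

```shell
#!/bin/sh
# Hypothetical helper: per-key average and max, sorted by key.
ave_max() {
    awk '
        {
            val = $2 + 0                        # force a numeric value
            sum[$1] += val
            count[$1]++
            # Seed max with the first value for the key, then keep the larger.
            # Testing membership with "in" also copes with a stored 0.
            if (!($1 in max) || val > max[$1])
                max[$1] = val
        }
        END {
            for (x in sum)
                printf("%s ave=%.2f  max=%.2f\n", x, sum[x] / count[x], max[x])
        }
    ' | LC_ALL=C sort
}

ave_max <<'EOF'
G1    -1.093384748
G1    -0.737460373
TB1    1.130494838
TB1    1.180494838
EOF
```

On the G1/TB1 sample it should report max=-0.74 for G1 and max=1.18 for TB1.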

--ahamed

Yes, it is working great. Thanks! One more thing: is it possible to modify the script to select the highest value if it is positive and the lowest if it is negative?

Do you mean the highest and lowest for each group?

--ahamed

input

a  1
a  2
b -1
b -2

output

a  2
b  -2

Try this...

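awk ' {
        val = $2 + 0
        sum[$1] += val
        count[$1]++
        if( !($1 in max) )
            max[$1] = val                # first value for this key
        else if( val > 0 && val > max[$1] )
            max[$1] = val                # positive: keep the highest
        else if( val <= 0 && val < max[$1] )
            max[$1] = val                # zero or negative: keep the lowest
    }
    END {
        for( x in sum )
            printf( "%s ave=%.2f  max_min=%.2f\n", x, sum[x]/count[x], max[x] );
    }
' input_file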
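This alternative sketch rests on a different assumption: that "highest if positive, lowest if negative" means the value farthest from zero in each group. extreme_per_key is a hypothetical helper name, and a POSIX awk is assumed. On the a/b sample it gives the requested result (a 2, b -2). It also behaves predictably for groups that mix signs.

```shell
#!/bin/sh
# Hypothetical helper. Assumption: keep the value farthest from zero per key.
extreme_per_key() {
    awk '
        {
            val = $2 + 0
            a = (val < 0) ? -val : val          # absolute value (awk has no abs())
            # Keep the first value for a key, then any value with a larger magnitude.
            if (!($1 in best) || a > bestabs[$1]) {
                best[$1] = val
                bestabs[$1] = a
            }
        }
        END {
            for (x in best)
                printf("%s %g\n", x, best[x])
        }
    ' | LC_ALL=C sort
}

extreme_per_key <<'EOF'
a  1
a  2
b -1
b -2
EOF
```

For a group containing 5 and -3, this sketch keeps 5. The script above keeps -3, because a negative value that is lower than the stored 5 replaces it. When two values tie in magnitude (5 and -5), the sketch keeps whichever came first.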

--ahamed