How to find the average,min,max ,total count?

Hi ,
Below is my sample data,I have this 8 column(A,B,C,D,E,F,G,H) in csv file.

A     , B      ,C  ,D ,E  ,F,G  ,H
 
4141,127337,24,15,20,69,72.0,-3
4141,128864,24,15,20,65,66.0,-1
4141,910053,24,15,4,4,5.0,-1
4141,910383,24,15,22,3,4.0,-1
4141,496969,24,15,14,6,-24.0,-18
4141,497237,24,15,14,4,-9.0,-5
4141,497287,24,15,14,0,-4.0,-4
 
4141,635643,27,1,7,10,16.0,-6
4141,635645,27,1,7,12,24.0,-12

4149,798509,24,15,14,4,-17.0,-13
4149,127333,24,15,20,100,108.0,-8
4149,127337,24,15,20,95,98.0,-3
4149,128864,24,15,20,76,80.0,-4
4149,910350,24,15,13,7,8.0,-1

I want the output like below in another CSV file

A , C, D,Diff_count,min(H column),max(H column),AVG(H column)
4141,24,25,7,-18,-1,-4.714
4141,27,1,2,-12,-6,-9
4149,24,15,5,-13,-1,-5.8

Can anybody help me to script in korn shell to get the above result!!
Thanks in advance.

Try this:

#! /bin/sh

awk '
BEGIN {
        RS=""
        FS="\n"
        OFS=","
}
{
        max=-9999999
        min=9999999
        sum=0
        for(i=1;i<=NF;++i) {
                split($i,r,",")
                if (r[8]>max) { max=r[8] }
                if (r[8]<min) { min=r[8] }
                sum+=r[8]
        }
        print r[1],r[3],r[4],NF,min,max,sum/NF
}' input_file
#!/bin/ksh
#cat inpfile | cut -d"," -f1,3,4 -s | uniq
INPFIL=./inpfile
print "A , C, D,Diff_count,min(H column),max(H column),AVG(H column)"
for ind in `cat ${INPFIL} | cut -d"," -f1,3,4 -s | uniq`
do
  A=`echo ${ind} | cut -d"," -f1`
  C=`echo ${ind} | cut -d"," -f2`
  D=`echo ${ind} | cut -d"," -f3`
  awk -F"," -v a=${A} -v c=${C} -v d=${D} 'BEGIN { cnt = 0; minh = 99999; maxh = -99999; avgh = 0; sum = 0; } { if($1 == a &&
$3 == c && $4 == d) { cnt++; if ($8 < minh) minh = $8; if ($8 > maxh) maxh = $8; sum += $8; } } END { avgh = sum/cnt; print a
"," c "," d "," cnt "," minh "," maxh "," avgh; }' ${INPFIL}
done
exit 0
1 Like

hi arun,

thanks for your response.Your script printing only

A , C, D,Diff_count,min(H column),max(H column),AVG(H column)
4141
4141
4149

my expected output is like below

A , C, D,Diff_count,min(H column),max(H column),AVG(H column)
4141,24,25,7,-18,-1,-4.714
4141,27,1,2,-12,-6,-9
4149,24,15,5,-13,-1,-5.8

I think your script reaches my expectation but it is printing columnA alone.
Please advice me.

Replace

print a "," c "," d "," cnt "," minh "," maxh "," avgh;

with

printf("%d, %d, %d, %d, %d, %d, %f\n",  a, c, d, cnt, minh, maxh, avgh);

Arun,

Thanks for your prompt response..Now it works thanks a lot.

Your script is good and nice logic too.:slight_smile:

hi arun ,

when iam getting the average from the above code,It gives 5 or 6 decimal places like "-4.714" and some avg values like "-4.16666".I want it like "-4.71" and "4.16".

4141,24,25,7,-18,-1,-4.714
4141,27,1,2,-12,-6,-9
4149,24,15,5,-13,-1,-5.8

Can you please advice me how to achieve this.
Thanks in advance.

Change

printf("%d, %d, %d, %d, %d, %d, %f\n",  a, c, d, cnt, minh, maxh, avgh);

to

printf("%d, %d, %d, %d, %d, %d, %5.2f\n",  a, c, d, cnt, minh, maxh, avgh);
1 Like