Min/max/total for selected columns based on first column as ID

For every ID in column one, I want to get the min/max/total for each ID.

The values in different columns are not sorted, the actual attempt, works only if the columns are sorted.

input file

2010  1  44413               41105.0 21.75 146  
2010  1  44415               41105.0 21.75 146  
2010  1  44417               41105.0 21.75 100  
2010  1  44419               41105.0 28.00 146 
2010  1  50000               41105.0 21.75 200  
2010  1  44423               41105.0 21.75 146  
2011  1  44425               41105.0 21.75 146  
2011  1  44427               41105.0 20.00 146  
2011  1  70000               41105.0 21.75 146  
2011  1  44433               41105.0 29.00 700  
2011  1  44435               41105.0 21.00 146  
2012  1  44437               41105.0 21.75 146  
2012  1  20000               41105.0 21.75 150  
2012  1  44441               41105.0 21.75 146  
2012  1  90000               41105.0 21.75 146  
2012  1  44445               41105.0 21.75 600  
2012  1  44447               41105.0 21.75 146  
2012  1  44447               41105.0 21.75 146  

attempt

 awk '{ currKey = $1 }
    currKey != prevKey { prt(); min=$3;min1=$5;min2=$6;cnt=0}
    { prevRec=$0; prevKey=currKey; max=$3;max1=$10;max2=$6; cnt++ }
      { prevKey=currKey; TOTAL+=$6}    
    END { prt() }
    function prt(   f) {
        if ( cnt ) {
        split(prevRec,f)
        print f[1],min,max,min1,max1,min2,max2,TOTAL, cnt
        }
    }' file | column -t
 

output from attempt

2010  44413  44423  21.75  21.75  146  146  884   6
2011  44425  44435  21.75  21.00  146  146  2668  5
2012  44437  44447  21.75  21.75  146  146  4002  7

output desired

2010  44413  50000  21.75  28.00  100  200  884   6
2011  44425  70000  20.00  29.00  146  700  1284  5
2012  20000  90000  21.75  21.75  146  600  1280  7

Appreciate your support

I get the results using datamash. But i will like to get the same with awk

datamash -W -g1 min 3 max 3 min 5 max 5 min 6 max 6 sum 6 count 1 < file
2010    44413    50000    21.75   28      100    200    884    6
2011    44425    70000    20      29      146    700    1284    5
2012    20000    90000    21.75   21.75   146    600    1480    7

Hi
Maybe this will be a little clearer

awk '
!A9[$1]         {A2[$1]=$3;A4[$1]=$5;A6[$1]=$6}
A2[$1]>$3       {A2[$1]=$3}
A3[$1]<$3       {A3[$1]=$3}
A4[$1]>$5       {A4[$1]=$5}
A5[$1]<$5       {A5[$1]=$5}
A6[$1]>$6       {A6[$1]=$6}
A7[$1]<$6       {A7[$1]=$6}
                {A8[$1]+=$6; A9[$1]++;}
END             {for(i in A9) print i,A2,A3,A4,A5,A6,A7,A8,A9}
' OFS='\t' file

Slight optimization

awk '
!A9[$1]++       {A2[$1]=$3;A4[$1]=$5;A6[$1]=$6}
A2[$1]>$3       {A2[$1]=$3}
A3[$1]<$3       {A3[$1]=$3}
A4[$1]>$5       {A4[$1]=$5}
A5[$1]<$5       {A5[$1]=$5}
A6[$1]>$6       {A6[$1]=$6}
A7[$1]<$6       {A7[$1]=$6}
                {A8[$1]+=$6}
END             {for(i in A9) print i,A2,A3,A4,A5,A6,A7,A8,A9}
' OFS='\t' file
1 Like

I understand your idea. You have a list sorted by ID and you process one ID at a time. You assume that the file being processed is too large for the memory allocated for this. I want to note the script I proposed earlier allows you to process an unsorted list. And the limits of RAM are limited only by the system-wide but I would continue only for a sorted list...

awk '
a != $1         {if(NR!=1)print a,a2,a3,a4,a5,a6,a7,a8,a9
                 a=$1; a2=$3; a4=$5; a6=$6; a9=a8=a7=a5=a3=0
                }
a2>$3           {a2=$3}
a3<$3           {a3=$3}
a4>$5           {a4=$5}
a5<$5           {a5=$5}
a6>$6           {a6=$6}
a7<$6           {a7=$6}
                {a8+=$6; a9++}
END             {print a,a2,a3,a4,a5,a6,a7,a8,a9}
' OFS='\t' file
1 Like

Hi nezabudka.

The codes works perfect, thanks a lot for your help. Yes, As you mentioned the list in the file are sorted by ID (column1) and we process one ID at a time