awk question in relation to finding "top 3" by group.

timberley · August 1, 2019, 5:48am

Hi,
I have a file that contains 3 key element fields with the fourth being a total value, something along the lines of this "Site,Region,Town,Total" . What I need to be able to do is find out the top 3 totals by Site using an AWK program. In my efforts so far I have managed to create either a max value by site (using max) or a top three irrespective of site (using a for loop). I realise this needs to be done with arrays but I am having difficulty getting that correct for this particular problem. Is there anybody who could help shed some light on to this please? I should add that I need to be able to see all the key element fields in the output.

Many thanks

rdrtx1 · August 1, 2019, 10:47am

# input file is csv?

# just awk?, try:

awk -F, '
{
   sites[$1]=$1;
   for (i=3; i>=1; i--) {
      if ($4 >= max3[i,$1]) {
         for (j=1; j<i; j++) {
            max3[j,$1]=max3[j+1,$1]; top3[j,$1]=top3[j+1,$1];
         }
         max3[i,$1]=$4; top3[i,$1]=$0;
         break;
      }
   }
}
END {
   for (i in sites) {
      for (j=3; j>=1; j--) print top3[j,sites];
   }
}
' input_file

# using other tools

sort -t, -nrk1,4 input_file | awk -F, 'site[$1]++ < 3'

timberley · August 1, 2019, 10:52am

Thank you so much for that AWK program that does exactly what I needed. My first efforts resulted in top three but not specifically for each of the sites and my second attempt only produced a result for each site but only a single maximum value. I realised I needed to somehow marry the two together but I couldn't get my head around the syntax for the arrays - I really appreciate your help - many thanks