HELP! Group by in shell script (awk/sed?)

Hello,

Could some expert soul please help me with this? I have a file in the following format:

task             time
abc                5
xyz                4
abc                5
xyz                3
ddd                10
ddd                2

I need to generate this output:

task           min              max           avg
abc             5                  5              5
xyz              3                  4             3.5
ddd             2                  10             6

I am new to awk/sed and need help. I wrote the following, but it totals the times instead of breaking them down into min, max and average.

awk -F"|" '{arr[$1]+=$2} END {for (i in arr) {print i, arr[i]}}' datafile

Please help!!

Big Thanks in advance..

nawk -f snc.awk myFile

snc.awk:

FNR>1 {
  min[$1]=(!($1 in min) || min[$1]> $2 )? $2 : min[$1]
  max[$1]=(max[$1]> $2)? max[$1] : $2
  cnt[$1]++
  sum[$1]+=$2
}
END {
  print "task\tmin\tmax\tavg"
  for (i in cnt)
    printf("%s\t%d\t%d\t%.1f\n", i, min[i], max[i], sum[i]/cnt[i])

}
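
One thing to keep in mind: `for (i in cnt)` visits the keys in an unspecified order, so the rows may not come out as abc, xyz, ddd. If the order matters, here is a minimal sketch that remembers the order in which each task first appears. The `order` array and the counter `n` are names I made up for illustration, and the temporary file just holds the sample data from the question.

```shell
# Sample data from the question, written to a scratch file.
datafile=$(mktemp)
cat > "$datafile" <<'EOF'
task             time
abc                5
xyz                4
abc                5
xyz                3
ddd                10
ddd                2
EOF

out=$(awk '
FNR > 1 {
  v = $2 + 0                   # force a numeric comparison
  if (!($1 in cnt)) {          # first time this task is seen
    order[++n] = $1            # remember arrival order
    min[$1] = v
    max[$1] = v
  }
  if (v < min[$1]) min[$1] = v
  if (v > max[$1]) max[$1] = v
  cnt[$1]++
  sum[$1] += v
}
END {
  print "task\tmin\tmax\tavg"
  for (k = 1; k <= n; k++) {
    t = order[k]
    printf("%s\t%d\t%d\t%.1f\n", t, min[t], max[t], sum[t] / cnt[t])
  }
}' "$datafile")

printf '%s\n' "$out"
rm -f "$datafile"
```

Seeding both min and max from the first value also avoids relying on an unset array element comparing as 0, which would give a wrong max if the times could be negative.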

Thanks a lot. For my own learning, suppose I add one more column, as follows:

id    task      time
1     abc       11
2     abc       14
3     xyz        10

What would the script look like then?

Depends on your desired output.
What is the reporting 'key'?

If it's still 'task' (regardless of the id), just change '$1' to '$2' and '$2' to '$3', since both columns shift right by one.

If it's still 'task' (regardless of the id):

FNR>1 {
  min[$2]=(!($2 in min) || min[$2]> $3 )? $3 : min[$2]
  max[$2]=(max[$2]> $3)? max[$2] : $3
  cnt[$2]++
  sum[$2]+=$3
}
END {
  print "task\tmin\tmax\tavg"
  for (i in cnt)
    printf("%s\t%d\t%d\t%.1f\n", i, min[i], max[i], sum[i]/cnt[i])

}
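
If the columns are likely to move again, the field numbers do not have to be hard-coded. Here is a rough sketch, assuming you pass them in with `-v`; the variable names `key` and `val` are my own choice, not anything awk defines. The output is piped through `sort` only to make the row order predictable.

```shell
# Three-column sample from the post above.
datafile=$(mktemp)
cat > "$datafile" <<'EOF'
id    task      time
1     abc       11
2     abc       14
3     xyz        10
EOF

out=$(
  printf 'task\tmin\tmax\tavg\n'
  awk -v key=2 -v val=3 '
  FNR > 1 {
    k = $key                   # grouping column, chosen on the command line
    v = $val + 0               # value column, forced to a number
    if (!(k in cnt) || v < min[k]) min[k] = v
    if (!(k in cnt) || v > max[k]) max[k] = v
    cnt[k]++
    sum[k] += v
  }
  END {
    for (k in cnt)
      printf("%s\t%d\t%d\t%.1f\n", k, min[k], max[k], sum[k] / cnt[k])
  }' "$datafile" | sort
)

printf '%s\n' "$out"
rm -f "$datafile"
```

With `-v key=1 -v val=2` the same script should handle the original two-column file.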

I need to generate the same output as described earlier:

Task        Min           Max           Avg

However, the input file is going to have a task id as well, with the fields separated by "|", like this:

1111     |     abc      |   10
1111     |     xyz      |   7
1112     |     abc      |   5
1112     |     xyz      |   9

Thank you for your help!


I did not see the update at the end of your post. It works now. Thanks! Awarded you bits!

BEGIN {
  FS=" *\\| *"
}
FNR>1 {
  min[$2]=(!($2 in min) || min[$2]> $3 )? $3 : min[$2]
  max[$2]=(max[$2]> $3)? max[$2] : $3
  cnt[$2]++
  sum[$2]+=$3
}
END {
  print "task\tmin\tmax\tavg"
  for (i in cnt)
    printf("%s\t%d\t%d\t%.1f\n", i, min[i], max[i], sum[i]/cnt[i])

}
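
Note that `FNR>1` drops the first line as a header. The "|" sample shown in the thread has no header row, so if the real file looks like that, the first record would be skipped. Below is a hedged sketch that skips a line only when its third field is not a number, so it should work with or without a header. The bracket form of the separator (`[|]`) is my variation on the `FS` above, and the final `sort` is only there to give a stable row order.

```shell
# Pipe-separated sample from the post above (no header row).
datafile=$(mktemp)
cat > "$datafile" <<'EOF'
1111     |     abc      |   10
1111     |     xyz      |   7
1112     |     abc      |   5
1112     |     xyz      |   9
EOF

out=$(
  printf 'task\tmin\tmax\tavg\n'
  awk '
  BEGIN { FS = "[ \t]*[|][ \t]*" }           # a pipe plus any blanks around it
  $3 !~ /^[0-9]+(\.[0-9]+)?[ \t]*$/ { next } # skip a header or malformed row
  {
    k = $2
    v = $3 + 0
    if (!(k in cnt) || v < min[k]) min[k] = v
    if (!(k in cnt) || v > max[k]) max[k] = v
    cnt[k]++
    sum[k] += v
  }
  END {
    for (k in cnt)
      printf("%s\t%d\t%d\t%.1f\n", k, min[k], max[k], sum[k] / cnt[k])
  }' "$datafile" | sort
)

printf '%s\n' "$out"
rm -f "$datafile"
```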