Find avg using awk

Steve_09 · December 19, 2009, 6:22am

Hi, I need some help, please.
The file data.txt contains: student code, surname and name, then pairs of lesson code and lesson grade. The number of lessons is not the same for every student.

25,Jackson Steve,12,4,34,2,65,2
29,Jordan Mary,13,6,23,8,56,4,34,2
04,Leven Kate,14,6,15,6,26,4
34,Owen Chris,85,6,39,4,42,6,65,8,12,6

I want to find the average grade over the lessons whose grade is >= 5.
This command:

cat data.txt | awk -F, '{for (i=4;i<=NF;i+=2) if ($i>=5) printf "%s ", $i; printf "\n"}'

prints each student's grades that are >= 5.

How can I find the average? I want to get this result:

Jordan Mary avg 7
Leven Kate avg 6
Owen Chris avg 6.5

gh0std0g74 · December 19, 2009, 7:07am

How does Jordan Mary's average of 7 come about? Show the calculation.

jp2542a · December 19, 2009, 7:16am

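Contents of avg.awk:

BEGIN {
        FS = ","                # fields are comma-separated
}

{
        s = 0                   # sum of this student's grades >= 5
        j = 0                   # number of such grades
        for (i = 4; i <= NF; i += 2) {  # grades are fields 4, 6, 8, ...
                if ($i >= 5) {
                        s += $i
                        j++
                }
        }
        if (j == 0)             # no grade >= 5: skip this student
                next
        a = s / j
        if (a >= 5)
                print $2 " avg " a
}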

results of execution:

$ awk -f avg.awk data.txt
Jordan Mary avg 7
Leven Kate avg 6
Owen Chris avg 6.5

Steve_09 · December 19, 2009, 7:26am

Thanks a lot for the answer!

I want only the grades >= 5, so for Jordan Mary it is (6+8)/2 = 7.
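Spelled out for every sample row: Jordan Mary keeps 6 and 8, so (6+8)/2 = 7; Leven Kate keeps 6 and 6, so 6; Owen Chris keeps 6, 6, 8 and 6, so 26/4 = 6.5; Jackson Steve has no grade >= 5 and is left out. The Python sketch below applies the same rule, assuming the four sample rows are embedded as a string instead of being read from data.txt; `passing_averages` and its `threshold` parameter are hypothetical names used only for illustration.

```python
# Sample rows from data.txt, embedded so the sketch is self-contained.
DATA = """\
25,Jackson Steve,12,4,34,2,65,2
29,Jordan Mary,13,6,23,8,56,4,34,2
04,Leven Kate,14,6,15,6,26,4
34,Owen Chris,85,6,39,4,42,6,65,8,12,6
"""


def passing_averages(text, threshold=5):
    """Return {name: average of grades >= threshold}, one entry per row.

    Rows with no grade >= threshold are skipped, as in the awk answers.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split(",")
        # fields[0] = student code, fields[1] = name, then
        # (lesson code, grade) pairs, so grades sit at indices 3, 5, 7, ...
        grades = [int(g) for g in fields[3::2]]
        kept = [g for g in grades if g >= threshold]
        if kept:
            result[fields[1]] = sum(kept) / len(kept)
    return result


for name, avg in passing_averages(DATA).items():
    # :g drops a trailing ".0", so 7.0 prints as 7 and 6.5 as 6.5
    print(f"{name} avg {avg:g}")
```

Formatting with `:g` is only a convenience so the printed lines look like the desired output above.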

gh0std0g74 · December 19, 2009, 7:39am

If you can use Python, here's an alternative:

#!/usr/bin/env python3
for line in open("data.txt"):
    sl = line.rstrip().split(",")
    # grades are every second field, starting at index 3
    t = [int(g) for g in sl[3::2] if int(g) >= 5]
    if t:  # skip students with no grade >= 5
        print("Avg: %s %.2f" % (sl[1], sum(t) / len(t)))

Output:

$ ./python.py
Avg: Jordan Mary 7.00
Avg: Leven Kate 6.00
Avg: Owen Chris 6.50

Steve_09 · December 19, 2009, 7:53am

Thanks. I don't use Python, but I hope to someday.

Scrutinizer · December 19, 2009, 8:24pm

Another approach:

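tr ',' '\n' < infile |
awk 'NR%2        {next}                 # odd records: student or lesson codes
     /^[A-Za-z]/ {n=$0;next}            # even record starting with a letter: name
     !/^[0-4]$/  {A[n]+=$1;B[n]++}      # remaining even records: grades >= 5
     END         {for (i in A) print i" avg "A[i]/B[i]}'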

Output:

Jordan Mary avg 7
Leven Kate avg 6
Owen Chris avg 6.5

Or, if your awk treats a multi-character RS as a regular expression (GNU awk does):

awk 'BEGIN       {RS="[,\n]"}
     NR%2        {next}
     /^[A-Za-z]/ {n=$0;next}
     !/^[0-4]$/  {A[n]+=$1;B[n]++}
     END         {for (i in A) print i" avg "A[i]/B[i]}' infile
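Both versions rely on a parity trick: once every comma and newline acts as a record separator, each row contributes an even number of records (code, name, then lesson/grade pairs), so names and grades always land on even NR while codes land on odd NR. The Python sketch below replays that token stream to make the bookkeeping visible; it assumes the sample rows are embedded as a string, and it is a loose re-creation of the idea rather than a port of the awk script.

```python
import re

# Sample rows from data.txt, embedded so the sketch is self-contained.
DATA = """\
25,Jackson Steve,12,4,34,2,65,2
29,Jordan Mary,13,6,23,8,56,4,34,2
04,Leven Kate,14,6,15,6,26,4
34,Owen Chris,85,6,39,4,42,6,65,8,12,6
"""

# Split on commas and newlines, like tr ',' '\n' or RS="[,\n]";
# drop the empty token left after the final newline.
tokens = [t for t in re.split(r"[,\n]", DATA) if t]

sums, counts = {}, {}
name = None
for nr, tok in enumerate(tokens, start=1):  # nr plays the role of awk's NR
    if nr % 2:                              # odd NR: student or lesson code
        continue
    if re.match(r"[A-Za-z]", tok):          # even NR starting with a letter: name
        name = tok
        continue
    if not re.fullmatch(r"[0-4]", tok):     # any other even NR is a grade
        sums[name] = sums.get(name, 0) + int(tok)
        counts[name] = counts.get(name, 0) + 1

for n in sums:
    print(f"{n} avg {sums[n] / counts[n]:g}")
```

If a row ever had an odd number of fields, every later name and grade would shift onto odd positions and be skipped, so the trick depends on the data keeping the code, name, pairs layout.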

jp2542a · December 19, 2009, 9:01pm

You made my brain hurt, Scrutinizer :). What happens if a grade is exactly 5? (The rule said >= 5 :D) ... Nice code!

Scrutinizer · December 19, 2009, 9:15pm

Sorry about that, jp, and thanks. I have corrected the samples.
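The corrected pattern `!/^[0-4]$/` accepts a grade unless it is exactly one digit from 0 to 4, so a 5 now counts. The Python sketch below compares that text test with the numeric `>= 5` check used in the other answers; the two agree for the unpadded grades in data.txt. The zero-padded "04" is a hypothetical input, included only to show where a text pattern and a numeric comparison can diverge, and the helper names are illustrative.

```python
import re


def regex_counts(grade: str) -> bool:
    # Mirrors awk's !/^[0-4]$/ : true unless the record is exactly one digit 0-4.
    return re.fullmatch(r"[0-4]", grade) is None


def numeric_counts(grade: str) -> bool:
    # Mirrors awk's $i >= 5 on a numeric field.
    return int(grade) >= 5


for g in ["4", "5", "6", "10", "04"]:
    print(g, regex_counts(g), numeric_counts(g))
```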

Steve_09 · December 20, 2009, 5:28am

I found a better way, look! :o

cat data.txt | awk -F, '{for (i=4;i<=NF;i+=2)if($i>=5){deg[$2]+=$i;count[$2]++}}END{for(i in deg){print i " has avg "deg[i]/count[i]}}'

Jordan Mary has avg 7
Leven Kate has avg 6
Owen Chris has avg 6.5

Scrutinizer · December 20, 2009, 7:29am

I agree, that is of course more straightforward and, more importantly, more legible. Mine is obviously more artificial, more of a gimmick...
You can leave out the cat BTW:

awk -F, '{for (i=4;i<=NF;i+=2) if($i>=5){deg[$2]+=$i;count[$2]++}}
          END{for(i in deg){print i " has avg "deg[i]/count[i]}}' infile