awk to output the percentage of a field compared to length

The awk below, run on the sample input, produces the output shown further down. Basically, it averages $7 over the lines whose text in $5 matches, using a value only if $7 < 30.

awk '{if(len==0){last=$5;total=$7;len=1;getline}if($5!=last){printf("%s\t%f\n", last, total/len);last=$5;total=$7;len=1}else{total+=$7;len+=1}}END{printf("%s\t%f\n", last, total/len)}' Input.txt > output.txt

Sample Input

chr 1   955542  955763  +   AGRN:exon.1 1   0 
chr1   955542  955763  +   AGRN:exon.1 2   0 
chr 1   955542  955763  +   AGRN:exon.1 3   0 
chr 1   955542  955763  +   AGRN:exon.1 4   1 
chr 1   955542  955763  +   AGRN:exon.1 5   1 
chr 1   955542  955763  +   AGRN:exon.1 6   1 
.... 
.... 
chr 1   955542  955763  +   AGRN:exon.1 218 32 
chr 1   955542  955763  +   AGRN:exon.1 219 32 
chr 1   955542  955763  +   AGRN:exon.1 220 32 
chr 1   955542  955763  +   AGRN:exon.1 221 29 

Output

 AGRN:exon.1 4.5714285 

My question: I cannot find the correct syntax to also output the total # of lines (numbered in $6) that belong to each $5, and the % of those lines where $7 < 30. I know my words may not be all that helpful, so hopefully the desired output will help. Thank you :).

Desired output

  
ID             Average Reads      % of Baits 
AGRN:exon.1    4.5714285          3.16742     (221 (# of lines in $6) / the # of lines < 30 in $7)

The bold part is only to show the math and does not need to be included.

Having everything on one line confuses me.
Here it is on multiple lines, with the repeated parts put into functions:

awk '
function prt(){
  printf("%s\t%f\t%f\n", last, total/len, t6/t7)
}
function resetvars(){
  last=$5; total=$7; len=1; t6=0; t7=0
}
{
  if (len==0) { resetvars(); getline }
  if ($5!=last) { prt(); resetvars() } else { total+=$7; len+=1; t6+=1; if ($7<30) { t7+=1 } }
}
END { prt() }
' input.txt

Now it is easy (at least for me) to add new code.
You might also want to initialize the new variables.

The input data is very large (~720MB)

Using that code I get the below:

+    1.000000    -nan
AGRN:exon.1    0.000000    -nan
+    112.000000    2.333333 

I thought I followed your code and it is much easier to read, but the calculations don't look right. Thank you very much :).
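
A note on the -nan column: prt() prints t6/t7, and resetvars() sets both to 0, so any group that is printed while t7 is still 0 divides by zero. Some awk implementations print nan or inf in that case, while others stop with a division-by-zero error. A minimal sketch of a guard follows; the helper ratio, the wrapper safe_ratio, and the three-column toy input are made up for illustration and are not part of the script posted above.

```shell
# Hypothetical wrapper: prints column 1 and column2/column3,
# falling back to 0 when the denominator is 0.
safe_ratio() {
  awk '
  # Divide only when the denominator is non-zero.
  function ratio(num, den) { return (den != 0) ? num / den : 0 }
  { printf("%s\t%f\n", $1, ratio($2, $3)) }
  ' "$@"
}

# Toy input (not from the thread): the second row would otherwise be 0/0.
printf 'a 6 3\nb 0 0\n' | safe_ratio
```

The same guard could wrap the t6/t7 division inside prt(). It only hides the symptom, though: the "+" rows in the output suggest that $5 is not always the exon ID.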

Looking at the fields in your file,

awk '{for (i=1; i<=NF; i++) print i, $i}' file
1 chr
2 1
3 955542
4 955763
5 +
6 AGRN:exon.1
7 1
8 0

1 chr1
2 955542
3 955763
4 +
5 AGRN:exon.1
6 2
7 0

I'm a bit lost about what you want to average, as $5 is either a + sign or the text "AGRN:exon.1". The same is true for $7. And the condition $7 < 30 is never tested in your code.
Where does the line AGRN:exon.1 4.5714285 come from? I can't seem to see the arithmetic...

You may want to revise your spec so that others can jump in and help.

@RudiC, surely it's always chr1; some Excel/Outlook/IE has added the occasional spaces.
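
If the stray space after chr really is the only damage, the fields could be shifted back into place before any grouping happens. A minimal sketch, assuming the space only ever appears directly after a leading chr; the wrapper name fix_chr is hypothetical:

```shell
# Hypothetical wrapper: rejoin "chr 1" into "chr1", then show fields 5-7.
fix_chr() {
  awk '
  # sub() without a target edits $0, so awk re-splits the fields afterwards.
  { sub(/^chr[ \t]+/, "chr"); print $5, $6, $7 }
  ' "$@"
}

# Two lines in the style of the sample input, one damaged and one intact.
printf 'chr 1   955542  955763  +   AGRN:exon.1 1   0\nchr1   955542  955763  +   AGRN:exon.1 2   0\n' | fix_chr
```

With that in place, both kinds of line have seven fields and $5 is the exon ID on every line.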

Still $5 would be a string, no?

Yes, $5 is a string and the format of the data will always be:

1 chr1
2 955542
3 955763
4 +
5 AGRN:exon.1
6 2
7 0 

The awk (maybe not the best) calculates the average over all lines that have the same $5, and uses the value in $7 only if it is < 30. In the desired output that is the 4.5 #. What I would also like to include is the % of lines in $6 that make up that number. I am not sure of the best way and included the math in post 1 to try and help. Did this help any? Thank you :).
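
One reading reproduces both numbers in post 1. Among the visible sample lines, seven have $7 < 30 and their values sum to 32, so 32/7 gives 4.5714, and 7 out of 221 lines is 3.16742 %. Under that assumption, and assuming lines with the same $5 are adjacent in the file, a sketch could look like the one below. The names summarize and emit and the extra line-count column are my own choices. The demo uses only the ten visible sample lines, so its percentage is for those ten lines, not for all 221.

```shell
# Hypothetical wrapper: one row per ID with
# ID, average of $7 where $7 < 30, number of lines, % of lines with $7 < 30.
summarize() {
  awk '
  # Print the summary for the group that just ended (nothing if it is empty).
  function emit() {
    if (n == 0) return
    avg = low ? sum / low : 0            # avoid dividing by zero
    printf("%s\t%f\t%d\t%f\n", id, avg, n, 100 * low / n)
  }
  { sub(/^chr[ \t]+/, "chr") }           # rejoin "chr 1" so $5 is always the ID
  NF < 7 { next }                        # skip short or placeholder lines
  $5 != id { emit(); id = $5; n = sum = low = 0 }
  { n++; if ($7 < 30) { low++; sum += $7 } }
  END { emit() }
  ' "$@"
}

# Demo on the ten sample lines that are visible in post 1.
summarize <<'EOF'
chr 1   955542  955763  +   AGRN:exon.1 1   0
chr1   955542  955763  +   AGRN:exon.1 2   0
chr 1   955542  955763  +   AGRN:exon.1 3   0
chr 1   955542  955763  +   AGRN:exon.1 4   1
chr 1   955542  955763  +   AGRN:exon.1 5   1
chr 1   955542  955763  +   AGRN:exon.1 6   1
chr 1   955542  955763  +   AGRN:exon.1 218 32
chr 1   955542  955763  +   AGRN:exon.1 219 32
chr 1   955542  955763  +   AGRN:exon.1 220 32
chr 1   955542  955763  +   AGRN:exon.1 221 29
EOF
```

On the real data the call would be summarize Input.txt > output.txt. Because only the running totals of the current group are kept, memory use should stay small even for the ~720MB file.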