I am trying to modify the awk below to include the gene name ($5) for each target, but I cannot get it to work. I'm also not sure the calculation is right: it should be the average of the values in $7 over all lines that share the same target in $4. Thank you :).
awk '{if((NR>1)&&($4!=last)){printf("%s\t%f\t%s\n", last, total/len,pg);total=$7;len=1;}else{total+=$7;len+=1};pg=$5;last=$4;}END{printf("%s\t%f\t%s\n", last, total/len,pg)}'
output.bam.hist.txt
chr1 40539722 40539865 chr1:40539722-40539865 PPT1 1 159
chr1 40539722 40539865 chr1:40539722-40539865 PPT1 2 161
chr1 40539722 40539865 chr1:40539722-40539865 PPT1 3 161
epilepsy70_average.txt
chr1:40539722-40539865 72.000000
chr1:40542503-40542595 46.500000
chr1:40544221-40544340 60.000000
Desired epilepsy70_average.txt
chr1:40539722-40539865 72.000000 PPT1
chr1:40542503-40542595 46.500000 PPT1
chr1:40544221-40544340 60.000000 PPT1
EDIT: I have modified the awk to calculate the average using $7 and to include $5; the new output is below.
epilepsy70_average.txt
chr1:40539722-40539865 227.776224 PPT1
chr1:40542503-40542595 109.706522 PPT1
chr1:40544221-40544340 61.596639 PPT1
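As a sanity check on the calculation, here is a minimal sketch of the same per-target average, run on only the three sample rows of output.bam.hist.txt shown above. It assumes the input is already ordered so that rows with the same $4 are adjacent, as in the sample. The function name average_by_target is hypothetical and only there to make the snippet reusable. Because it sees three rows rather than the whole target, its result differs from the full-file values listed above.

```shell
#!/bin/sh
# Sketch: mean of column 7 per target (column 4), plus the gene name (column 5).
# average_by_target is a hypothetical helper name, not part of the original command.
# Assumption: rows that share the same $4 are adjacent in the input.
average_by_target() {
    awk '
        # Target changed: report the group that just ended, then reset.
        NR > 1 && $4 != last {
            printf("%s\t%f\t%s\n", last, total / len, gene)
            total = 0; len = 0
        }
        # Add the current row to the running group.
        { total += $7; len++; gene = $5; last = $4 }
        # Report the final group, if any input was read.
        END { if (len > 0) printf("%s\t%f\t%s\n", last, total / len, gene) }
    ' "$@"
}

# The three sample rows from output.bam.hist.txt.
average_by_target <<'EOF'
chr1 40539722 40539865 chr1:40539722-40539865 PPT1 1 159
chr1 40539722 40539865 chr1:40539722-40539865 PPT1 2 161
chr1 40539722 40539865 chr1:40539722-40539865 PPT1 3 161
EOF
# (159 + 161 + 161) / 3 = 160.333333
```

With no arguments the function reads standard input; given a file name, for example average_by_target output.bam.hist.txt, it reads that file instead.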
What I cannot figure out is the next step: if the calculated value in $2 is less than or equal to 100, the font of that line should change to red, and then the entire file should be sorted in ascending order by $2. Is this possible?
I think the command below will print the matching lines, and maybe that's a start:
awk '{if((NR>1)&&($4!=last)){printf("%s\t%f\t%s\n", last, total/len,pg);total=$7;len=1;}else{total+=$7;len+=1};pg=$5;last=$4;}END{printf("%s\t%f\t%s\n", last, total/len,pg)}' output.bam.hist.txt > epilepsy70_average.txt && awk '{if($2<=100.00)print;}' epilepsy70_average.txt > sort.txt
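One possible direction for the sort-and-highlight step, offered as a sketch rather than a definitive answer: sort numerically on $2, then wrap every line whose $2 is at most 100 in ANSI red escape codes. A plain text file has no font color of its own, so this assumes the result is viewed in a terminal that interprets those codes (for example with cat or less -R). The function name sort_and_flag and the awk variable limit are hypothetical.

```shell
#!/bin/sh
# Sketch: sort by column 2 ascending, then mark lines with $2 <= 100 in red.
# sort_and_flag and limit are hypothetical names used for illustration.
# Assumption: the output is viewed in a terminal that understands ANSI colors.
sort_and_flag() {
    # LC_ALL=C keeps "." as the decimal point for the numeric sort.
    LC_ALL=C sort -k2,2n "$@" |
    awk -v limit=100 '
        # \033[31m switches to red, \033[0m resets the color.
        $2 <= limit { printf("\033[31m%s\033[0m\n", $0); next }
        { print }
    '
}

# The three averaged rows from epilepsy70_average.txt, tab-separated.
printf '%s\t%s\t%s\n' \
    chr1:40539722-40539865 227.776224 PPT1 \
    chr1:40542503-40542595 109.706522 PPT1 \
    chr1:40544221-40544340 61.596639 PPT1 |
sort_and_flag
```

On these three rows only the 61.596639 line falls at or below the limit, so it is printed first and in red. Running sort_and_flag epilepsy70_average.txt > sort.txt would store the same output, escape codes included, in sort.txt.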