Attached are the original output (zipped file) and a custom file, average.txt, produced by the awk code below, which calculates the average reads per bait:
awk '{if(len==0){last=$4;total=$6;len=1;getline}if($4!=last){printf("%s\t%f\n", last, total/len);last=$4;total=$6;len=1}else{total+=$6;len+=1}}END{printf("%s\t%f\n", last, total/len)}' output.bam.hist.txt > average.txt
Is it possible to output the length of the bait, average # of reads, and the # calculated ...x coverage (3 of reads * 150/length)? Thank you :).
The input file is attached. If I run the code below:
awk '{if(len==0){last=$4;total=$6;len=1;getline}if($4!=last){printf("%s\t%f\n", last, total/len);last=$4;total=$6;len=1}else{total+=$6;len+=1}}END{printf("%s\t%f\n", last, total/len)}' output.bam.hist.txt > average.txt
a file called average.txt results that combines all the baits that match (chr....) and calculates the average # of reads.
Each line of that file is the bait followed by the average # of reads. In the input file each bait is output over and over again at different positions, and the script combines the baits that match and calculates the #'s of each.
Is it possible to output the length of the bait, average # of reads, and the # calculated ...x coverage (3 of reads * 150/length)? Thank you :).
Your input sample has non-printable CR (carriage return) characters, which obscure the output.
Here is a more straightforward awk script that also eliminates the CRs:
awk '
function pr() {if (len>0) printf "%s\t%d\t%f\n", last, len, total/len}
{gsub("\r","")} # eliminate CRs
($4!=last) {
pr()
last=$4
total=len=0
}
{total+=$6; len+=1}
END {pr()}
' output.bam.hist.txt > average.txt
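If you want to check whether a file really has CRs before and after the gsub, something like the following should work. It is only a sketch: the two sample lines are made up for illustration (only the bait in column 4 and the read count in column 6 follow the layout used in this thread), and sample is a hypothetical helper that stands in for your real file.

```shell
# Hypothetical sample with Windows (CRLF) line endings.
# Columns 1-3 and 5 are invented; $4 is the bait, $6 the read count.
sample() {
  printf 'chr1\t100\t220\tbait1\t1\t10\r\n'
  printf 'chr1\t100\t220\tbait1\t2\t20\r\n'
}

# Count the lines that end in a carriage return.
cr_lines=$(sample | awk '/\r$/ {n++} END {print n+0}')
echo "lines ending in CR: $cr_lines"

# Same count after stripping the CRs the way the script above does.
clean_lines=$(sample | awk '{gsub("\r","")} /\r$/ {n++} END {print n+0}')
echo "lines ending in CR after gsub: $clean_lines"
```

To run the first count on the real data, replace "sample |" with the file name as the last argument to awk.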
---------- Post updated at 04:17 PM ---------- Previous update was at 03:51 PM ----------
You said "3 of reads". Looking at my US keyboard, you probably want "# of reads".
Then append another \t%f (tab character and a floating point field) to the first argument of the printf, and add another argument with the formula:
function pr() {if (len>0) printf "%s\t%d\t%f\t%f\n", last, len, total/len, total/len*150/len}
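Put together, the whole script with the extra coverage column might look like the sketch below. The input lines are invented for illustration (two baits, with the bait in $4 and the reads in $6 as in this thread), and the coverage is taken as average reads * 150 / length, which is my reading of the formula you posted.

```shell
# Hypothetical input: bait1 covers 3 positions, bait2 covers 2.
result=$(
  printf 'chr1\t100\t103\tbait1\t1\t10\r\n'
  printf 'chr1\t100\t103\tbait1\t2\t20\r\n'
  printf 'chr1\t100\t103\tbait1\t3\t30\r\n'
  printf 'chr2\t500\t502\tbait2\t1\t5\r\n'
  printf 'chr2\t500\t502\tbait2\t2\t15\r\n'
)

result=$(printf '%s\n' "$result" | awk '
# Print bait, length, average reads, and coverage (average * 150 / length).
function pr() {
  if (len>0) printf "%s\t%d\t%f\t%f\n", last, len, total/len, total/len*150/len
}
{gsub("\r","")}        # eliminate CRs
($4!=last) {           # a new bait starts: report the previous one
  pr()
  last=$4
  total=len=0
}
{total+=$6; len+=1}    # accumulate reads and count positions
END {pr()}             # report the final bait
')

printf '%s\n' "$result"
# bait1	3	20.000000	1000.000000
# bait2	2	10.000000	750.000000
```

For bait1 the reads sum to 60 over 3 lines, so the average is 20 and the coverage is 20 * 150 / 3 = 1000. On the real data, drop the printf lines and give awk output.bam.hist.txt as its argument, as in the script above.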
Your function ar() {function code} is defined but never called.
Either rename it to pr() {function code},
or change the rest of the code to call ar() instead of pr().