Output calculations

Attached are the original output (zipped file) and a custom file, average.txt, produced with the awk code below, in which the average reads per bait are calculated.

  awk '{if(len==0){last=$4;total=$6;len=1;getline}if($4!=last){printf("%s\t%f\n", last, total/len);last=$4;total=$6;len=1}else{total+=$6;len+=1}}END{printf("%s\t%f\n", last, total/len)}' output.bam.hist.txt > average.txt 

Is it possible to output the length of the bait, average # of reads, and the # calculated ...x coverage (3 of reads * 150/length)? Thank you :).

Bait Length Reads Coverage
chr12:112884064-112884217 153 158.20915 155x (158*150)/153

Unless you give us a small sample of input and output and describe in words what's going on, you are less likely to find help.

The input file is attached. If I run the code below:

 awk '{if(len==0){last=$4;total=$6;len=1;getline}if($4!=last){printf("%s\t%f\n", last, total/len);last=$4;total=$6;len=1}else{total+=$6;len+=1}}END{printf("%s\t%f\n", last, total/len)}' output.bam.hist.txt > average.txt 

a file called average.txt results that combines all the baits that match (chr....) and calculates the average # of reads.

The first three lines of average.txt:

chr12:112884064-112884217 158.2092
chr12:112888106-112888331 220.5333
chr12:112890983-112891206 228.287

Each line is the bait followed by the average # of reads. In the input file each bait is output over and over again at different positions, and the script combines the baits that match and calculates the average for each.

Is it possible to output the length of the bait, average # of reads, and the # calculated ...x coverage (3 of reads * 150/length)? Thank you :).

Bait                                Length Reads     Coverage
chr12:112884064-112884217 153 158.20915 155x (158*150)/153

The formula is not needed in the output; I was just trying to show how the coverage is calculated.

Your input sample has non-printable CR characters, which obscure the output.
Here is a more straightforward awk script that also eliminates the CRs:

awk '
function pr() {if (len>0) printf "%s\t%d\t%f\n", last, len, total/len}
{gsub("\r","")} # eliminate CRs
($4!=last) {
  pr()
  last=$4
  total=len=0
}
{total+=$6; len+=1}
END {pr()}
' output.bam.hist.txt > average.txt
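
To see what the CRs do, here is a minimal sketch with a made-up one-line sample (not taken from the attached file). With a CRLF line ending the CR stays on the record, and the gsub removes it:

```shell
# Hypothetical sample line ending in a CR, as a Windows-style file would have.
line=$(printf 'baitA 100\r')

# Record length with the CR still attached.
before=$(printf '%s\n' "$line" | awk '{ print length($0) }')

# Record length after stripping the CR, as the script above does.
after=$(printf '%s\n' "$line" | awk '{ gsub("\r", "") } { print length($0) }')

echo "$before $after"   # prints "10 9"
```

Because the CR sits at the end of the line, it would otherwise end up inside the last field and, when printed, move the cursor back to the start of the line.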


You said "3 of reads". Looking at my US keyboard, you probably want "# of reads".
Then append another \t%f (a tab character and a floating-point field) to the format string, which is the first argument of the printf, and add another argument with the formula:

function pr() {if (len>0) printf "%s\t%d\t%f\t%f\n", last, len, total/len, total/len*150/len}

Not sure what is wrong. Thank you very much :).

$ awk '
>  function pr() {if (len>0) printf "%s\t%d\t%f\t%f\n", last, len, total/len}
>  function ar() {if (len>0) printf "%s\t%d\t%f\t%f\n", last, len, total/len, total/len*150/len}
>  {gsub("\r","")} # eliminate CRs
>  ($4!=last) {
>    pr()
>    last=$4
>    total=len=0
>  }
>  {total+=$6; len+=1}
>  END {pr()}
>  ' output.bam.hist.txt > average.txt
awk: cmd. line:2: (FILENAME=output.bam.hist.txt FNR=154) fatal: not enough arguments to satisfy format string
        `%s     %d      %f      %f
'
                  ^ ran out for this one
printf "%s\t%d\t%f\t%f\n", last, len, total/len}

1 -> %s <- last
2 -> %d <- len
3 -> %f <- total/len
4 -> %f <- undefined
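
In other words, every conversion in the format string needs its own argument. A minimal sketch of the matched version, using made-up values for last, len and total (assumptions for illustration only):

```shell
out=$(awk 'BEGIN {
  last = "baitA"; len = 2; total = 300   # hypothetical values, not from the real file
  # Four conversions (%s %d %f %f) are paired with four arguments.
  printf "%s\t%d\t%f\t%f\n", last, len, total/len, total/len*150/len
}')
echo "$out"
```

With these values the fourth field is 150 * 150 / 2 = 11250.000000.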

This command runs but does not produce the desired output. Did I do something wrong? Thank you :).

awk '
 function pr() {if (len>0) printf "%s\t%d\t%f\n", last, len, total/len}
 function ar() {if (len>0) printf "%s\t%d\t%f\t%f\n", last, len, total/len, total/len*150/len}
 {gsub("\r","")} # eliminate CRs
 ($4!=last) {
   pr()
   last=$4
   total=len=0
 }
 {total+=$6; len+=1}
 END {pr()}
 ' output.bam.hist.txt > average2.txt

Desired output
(in line 1 the equation for $4 would be (158*150) / 153 = 155x;
the 150 is a static # that will never change)

chr12:112884064-112884217 153 158.20915 155x
chr12:112888106-112888331 225 220.533333 147x
chr12:112890983-112891206 223 228.286996 153x

Thank you :).
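
As a quick check of the arithmetic in the desired output above, this sketch recomputes the three coverage values from the lengths and averages shown there. It assumes the coverage should be truncated to a whole number, which is what %d does and what the third value (about 153.56, shown as 153x) suggests:

```shell
cov=$(awk 'BEGIN {
  # average reads * 150 / length, using the numbers from the desired output
  printf "%dx %dx %dx\n", 158.20915*150/153, 220.533333*150/225, 228.286996*150/223
}')
echo "$cov"   # prints "155x 147x 153x"
```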

Your function ar() {function code} is never called.
Either change the function pr() {function code} so that it prints the fourth field,
or change the other code to call ar() instead of pr().
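
Putting the first option together, a sketch of the whole script might look like the following. The sample file name and its contents are made up for illustration (only fields 4 and 6 are used, as in the thread), and printing the coverage as a truncated whole number with a trailing x via %dx is an assumption based on the desired output:

```shell
# Hypothetical sample with CRLF line endings: field 4 is the bait, field 6 the reads.
printf 'chr1\t0\t2\tbaitA\t1\t100\r\nchr1\t0\t2\tbaitA\t2\t200\r\nchr1\t2\t5\tbaitB\t1\t30\r\n' > sample.hist.txt

result=$(awk '
# Print bait, length (number of lines seen), average reads, and coverage.
function pr() {
  if (len > 0) printf "%s\t%d\t%f\t%dx\n", last, len, total/len, total/len*150/len
}
{ gsub("\r", "") }                                  # eliminate CRs
($4 != last) { pr(); last = $4; total = len = 0 }   # new bait: flush the previous one
{ total += $6; len += 1 }
END { pr() }                                        # flush the final bait
' sample.hist.txt)

echo "$result"
```

For the real data, replace sample.hist.txt with output.bam.hist.txt and redirect the output to average.txt as before.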