In the awk below, I am trying to calculate a percentage for a given id. It is very close; the problem is when the number being used in the calculation is zero. I am not sure how to code this condition into the awk, as it happens frequently. The portion in italics was an attempt, but that led to an error. Thank you :).
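One common way to handle a zero divisor is to test it before dividing. The following is only a minimal sketch, not the script from this thread: it assumes a hypothetical three-column input (id, count, total), the helper name percent_covered is made up, and the value 667 in the usage line is an illustrative number rather than data from the attached files.

```shell
# percent_covered: hypothetical helper, assumes input lines of "id count total"
percent_covered() {
    awk '{
        # $3 + 0 forces a numeric comparison, so an empty or
        # non-numeric third field is treated like zero as well
        if ($3 + 0 == 0)
            print $1, "NA"          # no division attempted
        else
            printf "%s %.3f\n", $1, 1 - $2 / $3
    }' "$@"
}

# Usage with made-up numbers:
printf 'ABHD12 10 667\nRYK 5 0\n' | percent_covered
```

Printing a placeholder such as NA keeps one output line per id; skipping the line with next instead would be an equally reasonable choice.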
Note, however, that $1/$2 is always going to be zero when $1 is a string that starts with non-numeric characters (other than a few magic strings like Infinity and NaN).
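A quick way to see this coercion is to divide a field that holds a gene name by a number. This is just an illustration with made-up input; the function name show_coercion is hypothetical.

```shell
# show_coercion: hypothetical helper that divides field 1 by field 2
show_coercion() {
    awk '{ print $1 / $2 }'
}

# "ABHD12" does not look like a number, so awk uses 0 as its numeric value
echo 'ABHD12 667' | show_coercion
```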
But isn't he or she calculating (1-f1[$1]/$3)? With $1 == ABHD12, f1["ABHD12"] == 10 (from file1), so the result should be around 1 - 0.015 = 0.985, shouldn't it?
I think part of my problem is that in the attached files, using the awk below, I am getting the correct counts for most of the ids. However, in cases like RYK I get an output of 250 in $2, but if I manually look at each of the files I count 259 in $2.
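To track down a discrepancy like this, it can help to print the sum for a single id separately for each input file and compare that with the manual count. This is a sketch only: it assumes the id is in $3 and the count in $2, as in the script below, and per_file_sum is a hypothetical helper name.

```shell
# per_file_sum: hypothetical helper, sums column 2 for one id, per input file
per_file_sum() {
    id=$1; shift
    awk -v id="$id" '
        $3 == id { sum[FILENAME] += $2; total += $2 }
        END {
            for (f in sum) print f, sum[f]   # one line per file containing the id
            print "total", total + 0         # + 0 prints 0 if the id never matched
        }' "$@"
}

# Usage (file names are placeholders):
#   per_file_sum RYK file1.txt file2.txt
```

If the per-file sums match the manual counts but the combined output does not, the difference is more likely in how the files are passed to awk than in the arithmetic.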
awk '{A[$3] += $2} END{for (i in A) print i, A[i]}' NA12878_newheader_base_counts_lessthan_30reads_perbase_lessthan_genes.txt (file1) NS12911_newheader_base_counts_lessthan_30reads_perbase_lessthan_genes.txt(file2) > all_genes_bases.txt
Neither of which is referenced by the above code (even after removing the parenthetical elements from the file list).
And, if we change the script above to:
awk '{A[$3] += $2} END{for (i in A) print i, A[i]}' NA12878_newheader_base_counts_lessthan_30reads_perbase.bed NS12911_newheader_base_counts_lessthan_30reads_perbase.bed > all_genes_bases.txt
the output produced is never going to have anything with an alphabetic string in the 1st output field because neither of these input files contains any alphabetic characters in its third field.
I apologize for the confusion and will post back in a bit with a better example. Part of the issue that I am having, besides the zero line after most cases, is that some of the initial calculations are incorrect. The awk posted works for most but not all. Again I apologize and will post better examples with an explanation. Thank you :).
---------- Post updated at 08:03 AM ---------- Previous update was at 05:16 AM ----------
I believe I found my error on the miscalculation issue I was having in the confusing post above. I am not sure why there are leading and trailing zeros in the output, or how to fix that. As you suspected, that is happening, but why is a mystery to me :). Thank you :).