The attached file needs to be parsed so that only three columns remain.
DACH1 occurs 34 times with an average of 0.881541
NEB occurs 159 times with an average of 0.837628
LTBP1 occurs 46 times with an average of 0.748722
Desired result in output.txt (the text is removed and each xxx value goes into its own column):
A B C
DACH1 34 0.881541 (xxx occurs xxx times with an average of xxx)
NEB 159 0.837628
LTBP1 46 0.748722
Thanks :).
Using awk:
awk 'BEGIN { print "A\tB\tC"}
/^(DACH1|NEB|LTBP1) occurs/ { print $1,$3,$9,"("$0")"} ' OFS='\t' exonprobescore.txt > output.txt
Edit: remove the ,"("$0")" part of the print statement above if you don't want (xxx occurs xxx times with an average of xxx) in the 4th column.
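A minimal sketch of the command with that 4th column dropped, run against the three sample lines from the question. The sample lines are written to a scratch directory (via mktemp -d, an assumption about the available tools) so the example is self-contained and does not overwrite a real exonprobescore.txt or output.txt:

```shell
#!/bin/sh
# Work in a scratch directory so no real exonprobescore.txt/output.txt is touched.
cd "$(mktemp -d)" || exit 1

# Sample input copied from the question; in practice this is the real exonprobescore.txt.
cat > exonprobescore.txt <<'EOF'
DACH1 occurs 34 times with an average of 0.881541
NEB occurs 159 times with an average of 0.837628
LTBP1 occurs 46 times with an average of 0.748722
EOF

# Header row, then gene name ($1), count ($3) and average ($9), tab-separated.
# OFS='\t' is applied after BEGIN, which is why the header uses literal \t.
awk 'BEGIN { print "A\tB\tC" }
/^(DACH1|NEB|LTBP1) occurs/ { print $1, $3, $9 }' OFS='\t' exonprobescore.txt > output.txt

cat output.txt
```

output.txt then holds the A/B/C header followed by one tab-separated row per gene, matching the layout asked for above.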
cmccabe
awk 'BEGIN { print "Gene\tCount\tScore"}
/^ occurs/ { print $1,$3,$9} ' OFS='\t' exonprobescore.txt > output.txt
I modified the code a bit, as I am trying to get the scores of every row in the file (exonprobescore.txt), but this doesn't seem to work. Thank you.
Perhaps match on the number of fields on the line:
awk 'BEGIN { print "Gene\tCount\tScore"}
NF==9 { print $1,$3,$9} ' OFS='\t' exonprobescore.txt > output.txt
Or match on field 2 being "occurs":
awk 'BEGIN { print "Gene\tCount\tScore"}
$2=="occurs" { print $1,$3,$9} ' OFS='\t' exonprobescore.txt > output.txt
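For reference, the /^ occurs/ pattern in the modified attempt only matches lines that start with a space followed by "occurs", so a line like "DACH1 occurs 34 times ..." never matches. If a regex is preferred over the NF or $2 tests, one alternative (a sketch of my own, not from the replies above) anchors on a non-blank gene name followed by " occurs ". As before, the sample file is written to a scratch directory purely so the example runs on its own:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

cat > exonprobescore.txt <<'EOF'
DACH1 occurs 34 times with an average of 0.881541
NEB occurs 159 times with an average of 0.837628
EOF

# ^[^ \t]+ matches the gene name (a run of non-blank characters), then " occurs ".
awk 'BEGIN { print "Gene\tCount\tScore" }
/^[^ \t]+ occurs / { print $1, $3, $9 }' OFS='\t' exonprobescore.txt > output.txt

# For contrast, the original /^ occurs/ pattern matches no lines in the same file.
awk '/^ occurs/ { n++ } END { print n+0 }' exonprobescore.txt    # prints 0
```

The $2=="occurs" test is usually the simpler choice; the regex form only helps if you want to pin down more of the line's shape.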
Perfect.
Is it possible to pipe or chain two awk commands together, where exon.txt is the initial input and score.txt is the intermediate list to be parsed? Thank you :)
awk '{ N[$5]++ ; T[$5]+=$6 } END { for(X in N) printf("%s occurs %d times with an average of %f\n", X, N[X], T[X]/N[X]); }' exon.txt > score.txt | awk 'BEGIN { print "Gene\tCount\tScore"}
$2=="occurs" { print $1,$3,$9} ' OFS='\t' score.txt > output.txt
You could achieve that with this single awk program:
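awk '
BEGIN { print "Gene\tCount\tScore" > "output.txt" }   # header row of the tab-separated table
{ N[$5]++; T[$5] += $6 }                              # per gene ($5): count rows and sum scores ($6)
END {
  # for (X in N) visits the genes in no particular order
  for (X in N) {
    # summary line goes to stdout, i.e. score.txt
    printf("%s occurs %d times with an average of %f\n", X, N[X], T[X]/N[X])
    # same numbers as a Gene/Count/Score row, written straight to output.txt
    printf("%s\t%d\t%f\n", X, N[X], T[X]/N[X]) > "output.txt"
  }
}' exon.txt > score.txt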
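If you'd rather keep two separate awk commands, the pipe can work too. In the attempt above, > score.txt sends the first awk's output to the file, so nothing reaches the pipe, and the second awk reads score.txt, which may still be empty or only partly written at that moment. A sketch that lets the first awk write to stdout, uses tee to keep a copy in score.txt, and has the second awk read from the pipe instead of a file. The exon.txt rows below are hypothetical, made up only so the example runs; the only thing taken from the thread is that column 5 holds the gene name and column 6 the score:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical exon.txt: only columns 5 (gene) and 6 (score) matter to the awk below.
cat > exon.txt <<'EOF'
chr1 100 200 exon1 DACH1 0.9
chr1 300 400 exon2 DACH1 0.8
chr2 500 600 exon1 NEB 0.7
EOF

# Step 1 prints the summary lines to stdout; tee saves a copy in score.txt and
# passes the same lines down the pipe to step 2, which reads stdin (no file operand).
awk '{ N[$5]++; T[$5] += $6 }
END { for (X in N) printf("%s occurs %d times with an average of %f\n", X, N[X], T[X]/N[X]) }' exon.txt |
tee score.txt |
awk 'BEGIN { print "Gene\tCount\tScore" }
$2 == "occurs" { print $1, $3, $9 }' OFS='\t' > output.txt
```

As with the single-program version, the gene rows after the header come out in whatever order for (X in N) produces; pipe through sort if a fixed order matters.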