Parse

I would like to parse the attached file so that only three columns remain.

DACH1 occurs 34 times with an average of 0.881541
NEB occurs 159 times with an average of 0.837628
LTBP1 occurs 46 times with an average of 0.748722

Desired result in output.txt (the surrounding text is removed and each xxx value is separated into its own column):

 
A      B    C
DACH1  34   0.881541  (xxx occurs xxx times with an average of xxx)
NEB    159  0.837628
LTBP1  46   0.748722

Thanks :).

Using awk:

awk 'BEGIN { print "A\tB\tC"}
/^(DACH1|NEB|LTBP1) occurs/ { print $1,$3,$9,"("$0")"} ' OFS='\t' exonprobescore.txt > output.txt

Edit: Remove the ,"("$0")" part of the print statement above if you don't want (xxx occurs xxx times with an average of xxx) in the 4th column.

 awk 'BEGIN { print "Gene\tCount\tScore"}
/^ occurs/ { print $1,$3,$9} ' OFS='\t' exonprobescore.txt > output.txt

I modified the code a bit, as I am trying to get the scores of every row in the file (exonprobescore.txt), but this doesn't seem correct. Thank you.

Perhaps match on the number of fields on the line:

awk 'BEGIN { print "Gene\tCount\tScore"}
NF==9 { print $1,$3,$9} ' OFS='\t' exonprobescore.txt > output.txt

Or match on field 2 being "occurs":

awk 'BEGIN { print "Gene\tCount\tScore"}
$2=="occurs" { print $1,$3,$9} ' OFS='\t' exonprobescore.txt > output.txt
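For what it's worth, the /^ occurs/ pattern in the modified code only matches lines that start with a space followed by "occurs", so no data row qualifies. A small check, assuming two made-up lines in a hypothetical file sample_lines.txt (not the real exonprobescore.txt):

```shell
# Hypothetical sample: one data row in the format from the thread, one unrelated line.
printf '%s\n' \
  'NEB occurs 159 times with an average of 0.837628' \
  'some other line' > sample_lines.txt

# Regex anchored at the start of the line: counts lines beginning with " occurs".
awk '/^ occurs/ { n++ } END { print n+0 }' sample_lines.txt

# Field comparison: counts lines whose second field is exactly "occurs".
awk '$2=="occurs" { n++ } END { print n+0 }' sample_lines.txt
```

The first command should count no lines and the second should count only the data row, which is why the field test works for every gene name.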

Perfect.

Is it possible to pipe or string two awk commands together? Where exon.txt is the initial input and score.txt is the list to be parsed? Thank you :).

 
awk '{ N[$5]++ ; T[$5]+=$6 } END { for(X in N) printf("%s occurs %d times with an average of %f\n", X, N[X], T[X]/N[X]); }' exon.txt > score.txt | awk 'BEGIN { print "Gene\tCount\tScore"}
$2=="occurs" { print $1,$3,$9} ' OFS='\t' score.txt > output.txt

You could achieve that with this single awk program:

awk '
  BEGIN{ print "Gene\tCount\tScore" > "output.txt" }
  {N[$5]++; T[$5]+=$6}
  END{
    for(X in N) {
      printf("%s occurs %d times with an average of %f\n", X, N[X], T[X]/N[X])
      printf("%s\t%d\t%f\n", X, N[X], T[X]/N[X]) > "output.txt"
    }
  }' exon.txt > score.txt
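If you would still rather keep two separate awk commands joined by a pipe, here is a rough sketch. It assumes tee is available to save the intermediate list while passing it on, and it uses a made-up three-line input (sample_exon.txt, with the gene in field 5 and the score in field 6) and hypothetical output names so that it does not overwrite your real files:

```shell
# Hypothetical sample input: gene name in field 5, score in field 6.
printf '%s\n' \
  'chr1 100 200 + NEB 0.5' \
  'chr1 300 400 + NEB 1.0' \
  'chr2 100 200 + DACH1 0.8' > sample_exon.txt

# First awk builds the "occurs" list; tee writes a copy to sample_score.txt
# and forwards the same lines to the second awk, which reads standard input.
awk '{ N[$5]++; T[$5]+=$6 }
END { for (X in N) printf("%s occurs %d times with an average of %f\n", X, N[X], T[X]/N[X]) }' sample_exon.txt |
tee sample_score.txt |
awk -v OFS='\t' 'BEGIN { print "Gene\tCount\tScore" }
$2=="occurs" { print $1,$3,$9 }' > sample_output.txt
```

The original attempt could not work because "> score.txt" sends the first awk's output to the file, leaving nothing on the pipe for the second awk. Note that the order of the rows from "for (X in N)" is not guaranteed.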

Thank you :).