Parse

cmccabe · October 22, 2014, 4:34pm

I would like to parse the attached file so that only three columns remain.

 DACH1 occurs 34 times with an average of 0.881541
NEB occurs 159 times with an average of 0.837628
LTBP1 occurs 46 times with an average of 0.748722

Parse result: output.txt (the surrounding text is removed and each xxx value is separated into its own column)

 
A       B     C
DACH1   34    0.881541   (xxx occurs xxx times with an average of xxx)
NEB     159   0.837628
LTBP1   46    0.748722

Thanks :).

Chubler_XL · October 22, 2014, 4:50pm

Using awk:

awk 'BEGIN { print "A\tB\tC"}
/^(DACH1|NEB|LTBP1) occurs/ { print $1,$3,$9,"("$0")"} ' OFS='\t' exonprobescore.txt > output.txt

Edit: Remove ,"("$0")" from the print statement above if you don't want (xxx occurs xxx times with an average of xxx) in the 4th column.

cmccabe · October 23, 2014, 10:47am

 awk 'BEGIN { print "Gene\tCount\tScore"}
/^ occurs/ { print $1,$3,$9} ' OFS='\t' exonprobescore.txt > output.txt

I modified the code a bit, as I am trying to get the scores for every row in the file (exonprobescore.txt), but this doesn't seem correct. Thank you.

Chubler_XL · October 23, 2014, 2:39pm

Perhaps match on the number of fields on the line:

awk 'BEGIN { print "Gene\tCount\tScore"}
NF==9 { print $1,$3,$9} ' OFS='\t' exonprobescore.txt > output.txt

Or match on field 2 being "occurs":

awk 'BEGIN { print "Gene\tCount\tScore"}
$2=="occurs" { print $1,$3,$9} ' OFS='\t' exonprobescore.txt > output.txt
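
To see why the field test works, here is a minimal sketch on a made-up sample file. The names sample_scores.txt and sample_output.txt are hypothetical, and the first line of the sample is invented purely to show a row that should be skipped. awk splits each line on whitespace, so in "DACH1 occurs 34 times with an average of 0.881541" the gene is field 1, the count is field 3 and the average is field 9.

```shell
# Hypothetical sample file: one line to skip, then two summary lines.
printf '%s\n' \
  'Summary of probe scores' \
  'DACH1 occurs 34 times with an average of 0.881541' \
  'NEB occurs 159 times with an average of 0.837628' > sample_scores.txt

# Fields: $1 = gene, $2 = "occurs", $3 = count, $9 = average.
# Only lines whose second field is exactly "occurs" are printed.
awk -v OFS='\t' '
  BEGIN { print "Gene", "Count", "Score" }
  $2 == "occurs" { print $1, $3, $9 }
' sample_scores.txt > sample_output.txt

cat sample_output.txt
```

The first sample line has "of" as its second field, so it is dropped. The NF==9 test would drop it as well, but it would also accept any unrelated line that happens to have nine fields, so the $2 test is probably the safer of the two.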

cmccabe · October 23, 2014, 4:47pm

Perfect.

Is it possible to pipe or string two awk commands together, where exon.txt is the initial input and score.txt is the list to be parsed? Thank you :).

 
awk '{ N[$5]++ ; T[$5]+=$6 } END { for(X in N) printf("%s occurs %d times with an average of %f\n", X, N[X], T[X]/N[X]); }' exon.txt > score.txt | awk 'BEGIN { print "Gene\tCount\tScore"}
$2=="occurs" { print $1,$3,$9} ' OFS='\t' score.txt > output.txt

Chubler_XL · October 23, 2014, 5:02pm

You could achieve that with this single awk program:

awk '
  BEGIN{ print "Gene\tCount\tScore" > "output.txt" }
  {N[$5]++; T[$5]+=$6}
  END{
    for(X in N) {
      printf("%s occurs %d times with an average of %f\n", X, N[X], T[X]/N[X])
      printf("%s\t%d\t%f\n", X, N[X], T[X]/N[X]) > "output.txt"
    }
  }' exon.txt > score.txt
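
If you would rather keep two separate awk commands, a pipe with tee in the middle should also work. This is only a sketch: the three-line exon.txt created below is invented test data, assuming the gene name is in column 5 and the score in column 6 as in the program above. The first awk writes its summary to standard output, tee saves a copy as score.txt, and the second awk reads the same lines from the pipe instead of from the file.

```shell
# Hypothetical sample input: gene name in column 5, score in column 6.
printf '%s\n' \
  'chr1 100 200 + DACH1 0.5' \
  'chr1 300 400 + DACH1 1.0' \
  'chr2 500 600 - NEB 0.25' > exon.txt

# Stage 1 summarises per gene, tee keeps a copy in score.txt,
# stage 2 reads the summary from the pipe and reshapes it into columns.
awk '{ N[$5]++; T[$5] += $6 }
     END { for (X in N)
             printf("%s occurs %d times with an average of %f\n", X, N[X], T[X]/N[X]) }' exon.txt |
  tee score.txt |
  awk -v OFS='\t' 'BEGIN { print "Gene", "Count", "Score" }
                   $2 == "occurs" { print $1, $3, $9 }' > output.txt
```

Drop the printf step to run this on a real exon.txt. Redirecting the first awk to score.txt and piping it at the same time leaves nothing in the pipe, which is why tee is used here. Note that for(X in N) visits the genes in no particular order, so pipe the result through sort if the order matters.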

cmccabe · October 24, 2014, 10:41am

Thank you :).