Hi All,
I need some help efficiently extracting a subset of results from a large results file.
Below is an example of the text file. Each block that I need to parse starts with a line like "Output of GENE for sequence file 100.fasta" (the next block starts with the same text but a different number). I have included only the lines of each block that are needed for parsing; the rest of each block is omitted.
# Output of GENE for sequence file 100.fasta
#
#
#
#
#
#
# Maximum BLAST-like scores:
# Inner Max Sim S.D.s above S.D. of
# frags Score P-value sim. mean sims
# SCORE 4.145 0.6043 -0.01 0.0274
# OuterSeq
# frags 0.125 1.0000 0.00 0.0000
#
#
#
#Output of GENE for sequence file 101.fasta
#
#
#
#
#
## Maximum BLAST-like scores:
# Inner Max Sim S.D.s above S.D. of
# frags Score P-value sim. mean sims
# SCORE 2.665 0.8360 0.44 0.0439
# OuterSeq
# frags Not found 0.0000 0.00 0.0000
#
#
#
#
#Output of GENE for sequence file 103.fasta
#
#
#
#
#
## Maximum BLAST-like scores:
# Inner Max Sim S.D.s above S.D. of
# frags Score P-value sim. mean sims
# SCORE 3.665 0.8705 1.44 0.0039
# OuterSeq
# frags Not found 1.0000 2.00 0.0000
I would like to extract the number from each block's header line (for example, 100 from "Output of GENE for sequence file 100.fasta") together with the Sim P-value from that block's SCORE line, so that the output looks like this:
100 0.6043
101 0.8360
103 0.8705
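To make the intended logic concrete, here is a rough awk sketch of the extraction described above. It rests on a few assumptions: the columns are whitespace-separated exactly as in the sample, the Sim P-value is always the fourth field of the line whose second field is SCORE, and every header line ends in "<number>.fasta". The file names results.txt and pvalues.txt are hypothetical, and the heredoc only writes a trimmed copy of the sample so the sketch can be run as-is.

```shell
# Hypothetical input file: a trimmed copy of the sample above, written here
# only so that the sketch is self-contained.
cat > results.txt <<'EOF'
# Output of GENE for sequence file 100.fasta
#
# Maximum BLAST-like scores:
# Inner Max Sim S.D.s above S.D. of
# frags Score P-value sim. mean sims
# SCORE 4.145 0.6043 -0.01 0.0274
# OuterSeq
# frags 0.125 1.0000 0.00 0.0000
#Output of GENE for sequence file 101.fasta
#
## Maximum BLAST-like scores:
# SCORE 2.665 0.8360 0.44 0.0439
# OuterSeq
# frags Not found 0.0000 0.00 0.0000
EOF

awk '
  # Header line of a block: remember the number in front of ".fasta".
  /Output of GENE for sequence file/ {
      id = $NF                  # last field, e.g. "100.fasta"
      sub(/\.fasta$/, "", id)   # strip the extension, leaving "100"
  }
  # SCORE line: fields are "#", "SCORE", score, Sim P-value, ...
  $2 == "SCORE" { print id, $4 }
' results.txt > pvalues.txt

cat pvalues.txt
```

On the trimmed sample this prints "100 0.6043" and "101 0.8360". Matching on the second field rather than on a line number is deliberate, since the number of "#" filler lines differs between blocks.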
Please let me know the best and simplest way to do this using awk or sed.
LA