In the attached test.txt, each one of the below $1 strings can be found and has a value above it that I am trying to include as $2 (the location of each string and value is given after the ---):
ISP Loading 84% ---- row 3 $1
TotalReads 75,130,408 ---row 2 $2
ReadLength 203 bp ---- row 3 $3 (the mean value is used)
KeySignal 80 --- row 2 $2
UsableSequence 61% ---- row 3 $2
Polyclonal 30.0% --- row 10 $3
LowQuality 09.0% --- row 11 $3
TestFragment 88% --- row 20 $3
AlignedBases 99.1% --- row 29 $3
UnalignedBases 0.9% ---- row 30 $3
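For reference, the "value above the keyword" lookup could be sketched roughly as below. This is only a minimal sketch, not my actual awk: value_above is a hypothetical helper name, and it assumes the value sits on the line directly above the line containing the keyword, which may not match the real row/field layout of test.txt.

```shell
# Hypothetical helper: print the line directly above the first line
# that contains the keyword given as $1. Reads the file on stdin.
# Assumption: the value occupies the whole line above the keyword.
value_above() {
    awk -v key="$1" '
        index($0, key) { print prev; exit }  # keyword found: emit the remembered line
        { prev = $0 }                        # otherwise remember this line
    '
}
```

It would be called as value_above "TotalReads" < test.txt, once per keyword.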
The first portion of the awk, before the first |, adds R_Index in $1 and sequentially numbers it in $2 as the first row in the desired output.
The second portion of the awk, after the first |, is an attempt at defaulting Pre-Enrichment to . in $2, but I am unsure of how to put that label in $1.
Enrichment is called Live and has a value of 99.2%. The third portion of the awk, after the |, was an attempt to extract the value from test.txt. Since this is the only value that is after the keyword (not above), I think I am close.
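To show what I mean by a value that comes after the keyword, here is a minimal sketch. live_value is a hypothetical helper name, and it assumes that Live and its value are separated by whitespace on the same line (for example "Live 99.2%"), which I have not confirmed is how every test.txt looks.

```shell
# Hypothetical helper: print the field that follows the word "Live".
# Assumption: "Live" and its value are whitespace-separated on one line.
live_value() {
    awk '{
        for (i = 1; i < NF; i++)            # stop before the last field so $(i+1) exists
            if ($i == "Live") { print $(i + 1); exit }
    }'
}
```

It would be called as live_value < test.txt and the result used for the Enrichment label.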
The final output is tab-delimited and looks like this:
R_Index 1
ISP Loading 84%
Pre-Enrichment .
Total Reads 75,130,408
Read Length 203 bp
Key Signal 80
Usable Sequence 61%
Enrichment 99.2%
Polyclonal 30.0%
Low Quality 09.0%
Test Fragment 88%
Aligned Bases 99.1%
Unaligned Bases 0.9%
I hope this helps and thank you very much :).
I need to update this post as my desired output has changed. I am not in my office and it is too hard to do from my phone, so I will do so from there in about 2 hours. Thank you :).
Here is the new edit with the new desired output:
R_Index ISP Loading Pre-Enrichment Total Reads Read Length Key Signal Usable Sequence Enrichment Polyclonal Low Quality Test Fragment Aligned Bases Unaligned Bases
1 84 . 75130408 203 80 61 99.2 30 9 88 99.1 0.9
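The change from the earlier one-label-per-line layout to this header-plus-row layout could be sketched as below. This is a sketch under assumptions: to_row is a hypothetical helper name, and it expects its input as label<TAB>value pairs, one per line, already in the desired column order.

```shell
# Hypothetical helper: read "label<TAB>value" lines on stdin and print
# a tab-delimited header line followed by a tab-delimited data line.
to_row() {
    awk -F '\t' '
        { hdr = hdr sep $1; row = row sep $2; sep = "\t" }  # sep is empty for the first pair
        END { print hdr; print row }
    '
}
```

For example, printf 'R_Index\t1\nISP Loading\t84\nPre-Enrichment\t.\n' | to_row prints the three labels on the first line and 1, 84 and . on the second, each separated by tabs.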
Description:
The tab-delimited output has a header row in row 1. The header fields are the keywords in the txt file where data is extracted, plus the two additional fields R_Index and Pre-Enrichment. Below is the data with each line commented only for clarification. I hope it helps and thank you :).
R_Index 1 -- sequential #
ISP Loading 84% -- % removed
Pre-Enrichment . -- always a dot
Total Reads 75,130,408 -- commas removed
Read Length 203 bp -- bp removed
Key Signal 80 -- just extracted as is
Usable Sequence 61% -- % removed
Enrichment 99.2% -- called live in the txt % removed
Polyclonal 30.0% -- decimal and % removed
Low Quality 09.0% -- leading 0 and % removed
Test Fragment 88% -- % removed
Aligned Bases 99.1% -- % removed
Unaligned Bases 0.9% -- % removed
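The clean-up rules in the comments above could be sketched as a single function. This is a minimal sketch, assuming clean_value as a hypothetical helper name and assuming the values only ever carry commas, a trailing bp, or a trailing %, as in the list above. It is meant to reproduce the values in the desired output row.

```shell
# Hypothetical helper: normalise one extracted value passed as $1.
clean_value() {
    awk -v v="$1" 'BEGIN {
        gsub(/,/, "", v)        # 75,130,408 -> 75130408
        sub(/ *bp$/, "", v)     # 203 bp -> 203
        sub(/%$/, "", v)        # 84% -> 84
        # Numeric coercion drops a leading 0 and a trailing .0
        # (09.0 -> 9, 30.0 -> 30) but keeps 99.1 and 0.9 as they are.
        # Non-integers are printed with the default %.6g format, so more
        # than 6 significant digits would be rounded.
        if (v ~ /^[0-9]+(\.[0-9]+)?$/) v = v + 0
        print v                 # anything else, such as ".", passes through unchanged
    }'
}
```

For example, clean_value '09.0%' prints 9 and clean_value '75,130,408' prints 75130408.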