Input file (4 DATA records shown in this case):
DATA AA0110
ACCESSION AA0110
VERSION AA0110 GI:157412239
FEATURES Location/Qualifiers
length 1..1170
1..1700
/length="1170"
position 1..1170
/length="1170"
band 1..948
/length="948"
//
DATA BC599
DEFINITION USA
ACCESSION BC599
VERSION BC599 GI:239744030
FEATURES Location/Qualifiers
position 1..3159
/length="3159"
length 1..40000
/length="40000"
//
DATA HI101
DEFINITION UK
ACCESSION HI101
VERSION HI101 GI:239745142
FEATURES Location/Qualifiers
band 1..757
/length="757"
length 1..747
/length="747"
//
DATA AVE111
ACCESSION AVE111
VERSION AVE111 GI:157412223
FEATURES Location/Qualifiers
position 1..1170
/length="1170"
//
Desired output file:
157412239 1170
239744030 40000
239745142 747
157412223 -
Conditions required:
- The first column of the desired output comes from the "VERSION" line: take the number after "GI:".
- The second column is the value XXX from the first /length="XXX" line that follows the "length" feature line (not the /length lines under "position" or "band").
- If the first column is available but the record has no "length" feature, print "-" as the second column.
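A minimal sketch of the three conditions above in plain POSIX awk, reading the file line by line. It assumes every DATA record ends with a "//" line, that the feature key (such as "length") is the first word on its line, and that the wanted value is the first /length="..." line after the "length" feature line. The shell function name extract_gi_length is only a hypothetical wrapper.

```shell
# Hypothetical wrapper: print "GI-number length" for each DATA record.
# Reads the file(s) given as arguments, or stdin when none are given.
extract_gi_length() {
  awk '
    # Print the current record (if it had a GI) and reset the per-record state.
    function emit(   out) {
      if (gi != "") {
        out = (len == "") ? "-" : len
        print gi, out
      }
      gi = ""; len = ""; want = 0
    }
    # VERSION line: keep the number after "GI:".
    $1 == "VERSION" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^GI:/) gi = substr($i, 4)
      next
    }
    # Feature whose key is exactly "length": the next /length= line belongs to it.
    $1 == "length" { want = 1; next }
    # First /length="N" line after that feature: strip everything except N.
    want && /\/length="/ {
      len = $0
      sub(/.*\/length="/, "", len)
      sub(/".*/, "", len)
      want = 0
      next
    }
    # A "//" line closes a record.
    $1 == "//" { emit() }
    # Flush a last record that is missing its closing "//".
    END { emit() }
  ' "$@"
}
```

Usage: extract_gi_length input_file.txt > output_file.txt. Tracking a flag (want) instead of splitting records keeps the script working in any POSIX awk, and the "-" falls out naturally when no "length" feature sets len.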
Command I tried:
awk 'BEGIN {RS=""; FS="//"} /VERSION/ {for (i=1;i<=NF;i++) {if ($i~/\/length=/) {print $i}}}' input_file.txt
Output I got:
DATA AA0110
ACCESSION AA0110
VERSION AA0110 GI:157412239
FEATURES Location/Qualifiers
length 1..1170
/length="1170"
position 1..1170
/length="1170"
band 1..948
/length="948"
The command I tried does not give my desired output.
I was thinking of using "//" as the field separator for each record.
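On the "//" idea: RS="" puts awk into paragraph mode, where records are separated by blank lines. The sample input has no blank lines, so the whole file is read as a single record. A sketch that uses "//" as the record separator instead, with each line as a field, assuming an awk that treats a multi-character RS as a regular expression (GNU awk and mawk do; POSIX leaves a multi-character RS unspecified). The name extract_gi_length_rs is hypothetical.

```shell
# Hypothetical wrapper: one awk record per "//"-terminated DATA block.
# Assumes an awk that treats a multi-character RS as a regular expression
# (GNU awk, mawk); POSIX leaves a multi-character RS unspecified.
extract_gi_length_rs() {
  awk '
    BEGIN { RS = "//\n?"; FS = "\n" }    # records end at "//", fields are lines
    {
      gi = ""; len = ""; want = 0
      for (i = 1; i <= NF; i++) {
        n = split($i, w, " ")            # words of line i
        if (w[1] == "VERSION") {
          for (j = 2; j <= n; j++)
            if (w[j] ~ /^GI:/) gi = substr(w[j], 4)
        } else if (w[1] == "length") {
          want = 1                       # the next /length= line is the answer
        } else if (want && $i ~ /\/length="/) {
          len = $i
          sub(/.*\/length="/, "", len)
          sub(/".*/, "", len)
          want = 0
        }
      }
      if (gi != "") {
        out = (len == "") ? "-" : len
        print gi, out
      }
    }
  ' "$@"
}
```

The key change from the original attempt is that "//" goes into RS (between records) rather than FS (between fields), so each DATA block arrives as one record whose lines can be walked in order.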
Thanks for any advice.
---------- Post updated at 09:01 PM ---------- Previous update was at 04:54 AM ----------
Does anyone have any advice or hints for solving this?
I'm still stuck on this problem.
Thanks in advance!