Input file format:
/tag="ABL"
/note="abl homolog
2
/tag="ABLIM1"
/note="actin binding LIM 1
/tag="ABP1"
/note="amiloride binding protein 1 (amine oxidase (copper-
containing))
/tag="ABR"
/note="active BCR-related
/tag="AC003042.1"
/note="SDR family member 11
precursor
.
.
.
Desired output file:
/tag="ABL"
/note="abl homolog 2
/tag="ABLIM1"
/note="actin binding LIM 1
/tag="ABP1"
/note="amiloride binding protein 1 (amine oxidase (copper-containing))
/tag="ABR"
/note="active BCR-related
/tag="AC003042.1"
/note="SDR family member 11 precursor
.
.
.
If a line does not start with "/tag" or "/note", I would like its content appended to the end of the preceding "/note" line, according to the following rules:
- If the "/note" line ends with "-", the continuation line (the one that does not start with "/tag" or "/note") should be appended to it directly.
e.g.
Input:
/note="amiloride binding protein 1 (amine oxidase (copper-
containing))
Desired output:
/note="amiloride binding protein 1 (amine oxidase (copper-containing))
- If the "/note" line does not end with "-", a space " " should be added before the continuation line (the one that does not start with "/tag" or "/note") is appended to it.
e.g.
Input:
/note="SDR family member 11
precursor
Output:
/note="SDR family member 11 precursor
A solution in any language (awk, sed, perl, etc.) would be appreciated.
Thanks in advance for any advice.
The lazy way:
tr '\n' '#' <infile | sed 's/#\([^/]\)/\1/g' | tr '#' '\n'
---------- Post updated at 09:33 AM ---------- Previous update was at 09:29 AM ----------
To add the space or not, depending on whether the line ends with '-':
tr '\n' '#' <tst | sed 's/-#\([^/]\)/-\1/g;s/#\([^/]\)/ \1/g' | tr '#' '\n'
---------- Post updated at 09:34 AM ---------- Previous update was at 09:33 AM ----------
$ cat tst
/tag="ABL"
/note="abl homolog
2
/tag="ABLIM1"
/note="actin binding LIM 1
/tag="ABP1"
/note="amiloride binding protein 1 (amine oxidase (copper-
containing))
/tag="ABR"
/note="active BCR-related
/tag="AC003042.1"
/note="SDR family member 11
precursor
$ tr '\n' '#' <tst | sed 's/-#\([^/]\)/-\1/g;s/#\([^/]\)/ \1/g' | tr '#' '\n'
/tag="ABL"
/note="abl homolog 2
/tag="ABLIM1"
/note="actin binding LIM 1
/tag="ABP1"
/note="amiloride binding protein 1 (amine oxidase (copper-containing))
/tag="ABR"
/note="active BCR-related
/tag="AC003042.1"
/note="SDR family member 11 precursor
$
---------- Post updated at 09:37 AM ---------- Previous update was at 09:34 AM ----------
It may be shortened a bit, like this:
tr '\n' '#' <inputfile | sed 's/-#/-/g;s/#\([^/]\)/ \1/g' | tr '#' '\n'
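Two caveats on the tr/sed pipelines above, as far as I can tell: they use '#' as a stand-in for the newline, so any '#' already present in the data would be turned into a line break, and the shortened form (s/-#/-/g) would also glue a following "/tag" line onto a "/note" line that happens to end with "-". A minimal awk sketch that applies the two rules directly and needs no placeholder character could look like this. The function name join_notes is only an illustration, and it assumes every record line starts with "/".

```shell
# Sketch only: join continuation lines onto the preceding "/..." line.
# join_notes is a hypothetical helper name, not part of the original thread.
join_notes() {
  awk '
    /^\// {                      # a line starting with "/" opens a new record
      if (have) print buf        # flush the previous logical line
      buf = $0; have = 1
      next
    }
    {                            # any other line is a continuation
      if (!have)         { buf = $0; have = 1 }   # stray first line: keep as is
      else if (buf ~ /-$/) buf = buf $0           # rule 1: ends with "-", no space
      else                 buf = buf " " $0       # rule 2: otherwise add a space
    }
    END { if (have) print buf }  # flush the last logical line
  ' "$@"
}
```

Usage would be `join_notes infile`, or `join_notes < infile` when reading from standard input.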
Hi ctsgnb,
Thanks for your reply.
Your "lazy way" works, but it doesn't follow rule 2.
It gives the following output:
cat infile:
/note="SDR family member 11
precursor
tr '\n' '#' < infile | sed 's/#\([^/]\)/\1/g' | tr '#' '\n'
/note="SDR family member 11precursor
My desired output is:
/note="SDR family member 11 precursor
Thanks again
I have updated my previous post in the meantime. Did you try the last suggestion?
---------- Post updated at 10:49 AM ---------- Previous update was at 10:40 AM ----------
Also try:
sed -e ':a' -e 'N;/^\/.*\n\/.*/{P;D;};s/\(.*\)-\n/\1-/;/^\/.*\n[^/].*/s/\n\([^/]\)/ \1/;p;d' -e 'ta' infile
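Note that, as far as I can tell, the sed command above joins only one continuation line per "/note" (a second continuation line would be printed on its own). If a note can wrap over several lines, a looping variant along the following lines might be worth trying. This is only a sketch, checked against the sample data with GNU sed in mind, and the function name join_notes_sed is made up for the example.

```shell
# Sketch only: keep pulling in the next line (N) and joining it while it
# does not start with "/". join_notes_sed is a hypothetical helper name.
join_notes_sed() {
  sed -e ':a' \
      -e '$!N' \
      -e 's/-\n\([^/]\)/-\1/' -e 'ta' \
      -e 's/\n\([^/]\)/ \1/' -e 'ta' \
      -e 'P;D' "$@"
}
```

The first substitution covers rule 1 (previous text ends with "-", no space added), the second covers rule 2 (a space is inserted), and each `ta` jumps back to read one more line after a successful join. `P;D` then prints the finished line and restarts with whatever is left in the pattern space.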