Help joining wrapped text lines based on specific rules

perl_beginner · May 19, 2011, 2:44am

Input file format:

/tag="ABL"
/note="abl homolog
2
/tag="ABLIM1"
/note="actin binding LIM 1
/tag="ABP1"
/note="amiloride binding protein 1 (amine oxidase (copper-
containing))
/tag="ABR"
/note="active BCR-related
/tag="AC003042.1"
/note="SDR family member 11
precursor
.
.
.

Desired output file:

/tag="ABL"
/note="abl homolog 2
/tag="ABLIM1"
/note="actin binding LIM 1
/tag="ABP1"
/note="amiloride binding protein 1 (amine oxidase (copper-containing))
/tag="ABR"
/note="active BCR-related
/tag="AC003042.1"
/note="SDR family member 11 precursor
.
.
.

If a line does not start with "/tag" or "/note", I would like it appended to the end of the preceding "/note" line, according to the following rules:

Rule 1: If the "/note" line ends with "-", the continuation line (the one that does not start with "/tag" or "/note") should be appended directly to it.
e.g.

Input:
/note="amiloride binding protein 1 (amine oxidase (copper-
containing))

Desired output:
/note="amiloride binding protein 1 (amine oxidase (copper-containing))

Rule 2: If the "/note" line does not end with "-", a space " " should be added before the continuation line (the one that does not start with "/tag" or "/note") is appended to it.
e.g.

Input:
/note="SDR family member 11
precursor

Output:
/note="SDR family member 11 precursor

Any programming language (awk, sed, perl, etc.) is appreciated.
Thanks in advance for any advice.

ctsgnb · May 19, 2011, 3:37am

Lazy way ...

tr '\n' '#' <infile | sed 's/#\([^/]\)/\1/g' | tr '#' '\n'

---------- Post updated at 09:33 AM ---------- Previous update was at 09:29 AM ----------

To add the space only when the line does not end with '-':

tr '\n' '#' <tst | sed 's/-#\([^/]\)/-\1/g;s/#\([^/]\)/ \1/g' | tr '#' '\n'

---------- Post updated at 09:34 AM ---------- Previous update was at 09:33 AM ----------

$ cat tst
/tag="ABL"
/note="abl homolog
2
/tag="ABLIM1"
/note="actin binding LIM 1
/tag="ABP1"
/note="amiloride binding protein 1 (amine oxidase (copper-
containing))
/tag="ABR"
/note="active BCR-related
/tag="AC003042.1"
/note="SDR family member 11
precursor

$ tr '\n' '#' <tst | sed 's/-#\([^/]\)/-\1/g;s/#\([^/]\)/ \1/g' | tr '#' '\n'
/tag="ABL"
/note="abl homolog 2
/tag="ABLIM1"
/note="actin binding LIM 1
/tag="ABP1"
/note="amiloride binding protein 1 (amine oxidase (copper-containing))
/tag="ABR"
/note="active BCR-related
/tag="AC003042.1"
/note="SDR family member 11 precursor

$

---------- Post updated at 09:37 AM ---------- Previous update was at 09:34 AM ----------

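This can perhaps be shortened a bit, as long as no line ending in '-' is directly followed by a /tag or /note line (the first substitution would join those too):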

tr '\n' '#' <inputfile | sed 's/-#/-/g;s/#\([^/]\)/ \1/g' | tr '#' '\n'
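
For what it's worth, the tr/sed pipelines above borrow '#' as a stand-in for newline, so they assume the data never contains a '#' of its own (the final tr would turn it into a line break). The same two rules can also be written in awk without a placeholder. This is only a minimal sketch under the assumptions in this thread (every logical line starts with "/", everything else is a continuation); the function name join_notes is made up for illustration.

```shell
# join_notes: hypothetical helper name; reads the /tag and /note lines on stdin.
join_notes() {
    awk '
        /^\// {                          # a line starting with "/" begins a new logical line
            if (have) print buf          # flush the previous logical line
            buf = $0; have = 1
            next
        }
        {                                # anything else is a continuation line
            if (!have) { buf = $0; have = 1; next }
            if (buf ~ /-$/) buf = buf $0         # rule 1: ends with "-", append directly
            else            buf = buf " " $0     # rule 2: otherwise add a space first
        }
        END { if (have) print buf }      # flush the last logical line
    '
}

# Small demo on two records from the sample input
printf '%s\n' \
    '/tag="ABP1"' \
    '/note="amiloride binding protein 1 (amine oxidase (copper-' \
    'containing))' \
    '/tag="AC003042.1"' \
    '/note="SDR family member 11' \
    'precursor' | join_notes
```

On a real file it would be called as join_notes < infile. Because the buffer is only flushed when the next "/" line arrives, a note wrapped over more than two lines is joined as well.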

perl_beginner · May 19, 2011, 3:42am

Hi ctsgnb,

Thanks for your reply.
Your "lazy way" works, but it doesn't follow rule 2.
It gives the following output:

cat infile:
/note="SDR family member 11
precursor

tr '\n' '#' < infile | sed 's/#\([^/]\)/\1/g' | tr '#' '\n'
/note="SDR family member 11precursor

My desired output is:

/note="SDR family member 11 precursor

Thanks again

ctsgnb · May 19, 2011, 4:49am

Meanwhile I have updated my previous post. Did you try the last suggestion?

---------- Post updated at 10:49 AM ---------- Previous update was at 10:40 AM ----------

Also try:

sed -e ':a' -e 'N;/^\/.*\n\/.*/{P;D;};s/\(.*\)-\n/\1-/;/^\/.*\n[^/].*/s/\n\([^/]\)/ \1/;p;d' -e 'ta' infile
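
As far as I can tell, the one-liner above joins one continuation line per /note, which matches the sample; a second or third wrapped line would be left on its own. If notes can wrap over several lines, a looped variant might be worth trying. This is a sketch only, assuming a sed that accepts \n in the pattern (GNU sed does); join_notes_sed is a made-up name.

```shell
# join_notes_sed: hypothetical helper name; reads the /tag and /note lines on stdin.
join_notes_sed() {
    # :a      loop label
    # $!N     append the next input line, unless we are already on the last one
    # 1st s   previous line ends with "-": drop the newline (rule 1), then loop
    # 2nd s   otherwise replace the newline with a space (rule 2), then loop
    # P;D     next line starts with "/": print the finished line, keep the rest
    sed -e ':a' \
        -e '$!N' \
        -e 's/-\n\([^/]\)/-\1/' -e 'ta' \
        -e 's/\n\([^/]\)/ \1/' -e 'ta' \
        -e 'P' -e 'D'
}

# Demo: a note wrapped over three lines, with both rules involved
printf '%s\n' \
    '/tag="ABP1"' \
    '/note="amiloride binding protein 1' \
    '(amine oxidase (copper-' \
    'containing))' \
    '/tag="ABR"' | join_notes_sed
```

The demo should print the /tag="ABP1" line, the note joined into a single line, and then /tag="ABR". With a file it would be join_notes_sed < infile.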

perl_beginner · May 19, 2011, 5:42am

Hi ctsgnb,

Thanks a lot.
It worked.