Try this:
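awk '
# GN line: the gene name sits between the first = and the first ;
/^GN/ {split($0,a,"[=;]");printf "\n%s|",a[2]}
# RC line: append the tissue (it keeps its trailing ;) and flag that tissues were seen
/^RC/ {split($0,a,"=");printf "%s",a[2];f=1}
# CC line: open the function field once, drop the FUNCTION tag, append the text
/^CC/ {if (f) {printf "|"};sub(/-!- FUNCTION:/,"");split($0,a,"  +");printf "%s ",a[2];f=0}
END {print ""}' uniprot_human_prts_with_tissue_fn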
The fields are now separated by |, and the entries inside the TISSUE field are separated by ;
What should be done with the Synonyms field? Should it be printed, and if so, where?
The first three lines of output:
YWHAB|Keratinocyte;Thymus;Skin;Colon carcinoma;Platelet;Melanoma;Leukemic T-cell;|Adapter protein implicated in the regulation of a large spectrum of both general and specialized signaling pathways. Binds to a large number of partners, usually by recognition of a phosphoserine or phosphothreonine motif. Binding generally results in the modulation of the activity of the binding partner. Negative regulator of osteogenesis. Blocks the nuclear translocation of the phosphorylated form (by AKT1) of SRPK2 and antagonizes its stimulatory effect on cyclin D1 expression resulting in blockage of neuronal apoptosis elicited by SRPK2.
YWHAE|Liver;Brain;Heart;Caudate nucleus, Heart, and Subthalamic nucleus;Placenta;Platelet;B-cell lymphoma;Histiocytic lymphoma;Brain, and Cajal-Retzius cell;Melanoma;Liver;Cervix carcinoma;|Adapter protein implicated in the regulation of a large spectrum of both general and specialized signaling pathways. Binds to a large number of partners, usually by recognition of a phosphoserine or phosphothreonine motif. Binding generally results in the modulation of the activity of the binding partner.
YWHAH|Brain;Brain;Lymph;Keratinocyte;Platelet;Platelet;Leukemic T-cell;|Adapter protein implicated in the regulation of a large spectrum of both general and specialized signaling pathways. Binds to a large number of partners, usually by recognition of a phosphoserine or phosphothreonine motif. Binding generally results in the modulation of the activity of the binding partner. Negatively regulates the kinase activity of PDPK1.
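To see where the name and the Synonyms value end up after split($0,a,"[=;]"), the pieces can be printed one per line. This is only a sketch: the GN line below is a made-up sample in the usual UniProt layout (an assumption, not copied from the input file), and show_gn_parts is a hypothetical helper name.

```shell
# Print every piece that split() produces for a GN line.
# show_gn_parts is a hypothetical helper, not part of the original script.
show_gn_parts() {
  awk '{
    n = split($0, a, "[=;]")          # same separator set as in the GN rule
    for (i = 1; i <= n; i++)
      printf "a[%d]=<%s>\n", i, a[i]  # angle brackets make leading spaces visible
  }'
}

# Assumed sample GN line with one synonym.
printf '%s\n' 'GN   Name=YWHAH; Synonyms=YWHA1;' | show_gn_parts
```

With this sample, a[2] holds the gene name and a[4] holds the Synonyms value, so a[4] is the piece to print if the synonyms are wanted.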
EDIT: Here the synonyms are appended to the gene name, e.g. YWHAH-YWHA1:
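awk '
# GN line: a[2] is the gene name, a[4] the Synonyms value (empty if there is none)
/^GN/ {split($0,a,"[=;]");p=(a[4])?a[2]"-"a[4]:a[2];printf "\n%s|",p}
# RC line: append the tissue (it keeps its trailing ;) and flag that tissues were seen
/^RC/ {split($0,a,"=");printf "%s",a[2];f=1}
# CC line: open the function field once, drop the FUNCTION tag, append the text
/^CC/ {if (f) {printf "|"};sub(/-!- FUNCTION:/,"");split($0,a,"  +");printf "%s ",a[2];f=0}
END {print ""}' uniprot_human_prts_with_tissue_fn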
YWHAB|Keratinocyte;Thymus;Skin;Colon carcinoma;Platelet;Melanoma;Leukemic T-cell;|Adapter protein implicated in the regulation of a large spectrum of both general and specialized signaling pathways. Binds to a large number of partners, usually by recognition of a phosphoserine or phosphothreonine motif. Binding generally results in the modulation of the activity of the binding partner. Negative regulator of osteogenesis. Blocks the nuclear translocation of the phosphorylated form (by AKT1) of SRPK2 and antagonizes its stimulatory effect on cyclin D1 expression resulting in blockage of neuronal apoptosis elicited by SRPK2.
YWHAE|Liver;Brain;Heart;Caudate nucleus, Heart, and Subthalamic nucleus;Placenta;Platelet;B-cell lymphoma;Histiocytic lymphoma;Brain, and Cajal-Retzius cell;Melanoma;Liver;Cervix carcinoma;|Adapter protein implicated in the regulation of a large spectrum of both general and specialized signaling pathways. Binds to a large number of partners, usually by recognition of a phosphoserine or phosphothreonine motif. Binding generally results in the modulation of the activity of the binding partner.
YWHAH-YWHA1|Brain;Brain;Lymph;Keratinocyte;Platelet;Platelet;Leukemic T-cell;|Adapter protein implicated in the regulation of a large spectrum of both general and specialized signaling pathways. Binds to a large number of partners, usually by recognition of a phosphoserine or phosphothreonine motif. Binding generally results in the modulation of the activity of the binding partner. Negatively regulates the kinase activity of PDPK1.
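Because the fields are separated by | and the tissues by ;, the output is easy to post-process with a second awk pass. As an illustration only, the sketch below removes repeated tissues (such as the double Brain and Platelet for YWHAH) while keeping first-seen order. dedupe_tissues is a hypothetical helper name, and the sample line is shortened from the output above.

```shell
# Remove repeated entries from the second (tissue) field.
# dedupe_tissues is a hypothetical helper, not part of the original answer.
dedupe_tissues() {
  awk 'BEGIN { FS = OFS = "|" }
  NF >= 2 {
    n = split($2, t, ";")
    split("", seen)                   # portable way to empty the lookup table
    out = ""
    for (i = 1; i <= n; i++)
      if (t[i] != "" && !(t[i] in seen)) {
        seen[t[i]] = 1
        out = out t[i] ";"            # keep the trailing ; of the original format
      }
    $2 = out                          # rebuilds the line with | as separator
  }
  { print }'
}

printf '%s\n' 'YWHAH|Brain;Brain;Lymph;Keratinocyte;Platelet;Platelet;Leukemic T-cell;|Function text.' | dedupe_tissues
```

It can be used by piping the output of the awk command above into dedupe_tissues. Lines with fewer than two fields, such as the leading blank line, pass through unchanged.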