I need help splitting any line that contains ; or , into separate lines.
input.txt
Ac020 Not a good chemical process
AC030 many has failed, 3 still maintained
AC040 Putative; epithelial cells
AC050 Predicted binding activity
AC060 rodC Putative; upregulated in 48;h biofilm vs planktonic
The output should be:
Output.txt
Ac020 Not a good chemical process
AC030 many has failed
AC030 3 still maintained
AC040 Putative
AC040 epithelial cells
AC050 Predicted binding activity
AC060 rodC Putative
AC060 upregulated in 48
AC060 h biofilm vs planktonic
I tried the code below, but it does not put the ID in the first column of the split lines:
sed -e 's/\(.\), /\1\n\t\t /g' input.txt | sed -e 's/\(.\);/\1\n\t\t/g' > Output.txt
The result that I got is:
Ac020 Not a good chemical process
AC030 many has failed
3 still maintained
AC040 Putative
epithelial cells
AC050 Predicted binding activity
AC060 rodC Putative
upregulated in 48
h biofilm vs planktonic
I don't know how to make it show the ID. Can anyone advise or help me with this? Thanks.
$ cat file
Ac020 Not a good chemical process
AC030 many has failed, 3 still maintained
AC040 Putative; epithelial cells
AC050 Predicted binding activity
AC060 rodC Putative; upregulated in 48;h biofilm vs planktonic
$ awk 'gsub(/[;,] */,RS $1 OFS) + 1' OFS='\t' file
This results in:
Ac020 Not a good chemical process
AC030 many has failed
AC030 3 still maintained
AC040 Putative
AC040 epithelial cells
AC050 Predicted binding activity
AC060 rodC Putative
AC060 upregulated in 48
AC060 h biofilm vs planktonic
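In case the one-liner is hard to read, here is a longer sketch of the same idea, written out with comments. It assumes the ID is always the first whitespace-separated field and that a delimiter may be followed by spaces; the file names input.txt and Output.txt are taken from the question.

```shell
# Recreate the sample input from the question.
cat > input.txt <<'EOF'
Ac020 Not a good chemical process
AC030 many has failed, 3 still maintained
AC040 Putative; epithelial cells
AC050 Predicted binding activity
AC060 rodC Putative; upregulated in 48;h biofilm vs planktonic
EOF

awk -v OFS='\t' '
{
    id = $1    # remember the ID before the line is modified
    # Replace every ";" or "," (and any spaces right after it)
    # with: newline + ID + output field separator.
    gsub(/[;,] */, "\n" id OFS)
    print      # prints the (possibly multi-line) record
}' input.txt > Output.txt
```

The ` *` after the bracket expression also consumes the spaces that follow a delimiter. Note that only the newly created lines get a tab after the ID; the first piece of each line keeps whatever separator it had in the input. In the one-liner form, `gsub(...) + 1` is used as a pattern that is always true, so every line is printed whether or not a substitution happened.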
Yeah, I tried both and they work great too. They are a little more complicated to understand than the first one, but I am glad they give me something to think about. Thanks.
awk '{a=$1; gsub(/[,;] */,"\n" a OFS,$0); print}' filename
Output will be as follows.
Ac020 Not a good chemical process
AC030 many has failed
AC030 3 still maintained
AC040 Putative
AC040 epithelial cells
AC050 Predicted binding activity
AC060 rodC Putative
AC060 upregulated in 48
AC060 h biofilm vs planktonic
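One caveat with the gsub approach: in the replacement string, & stands for the matched text, so an ID containing & would come out wrong. If that could happen in your data, a split-based loop avoids the problem. This is only a sketch; input2.txt and Output2.txt are made-up names for the example, and the A&B line is invented test data.

```shell
# Hypothetical input with an ID that contains "&".
cat > input2.txt <<'EOF'
A&B first part; second part, third part
AC050 Predicted binding activity
EOF

awk '
{
    id = $1
    # Split the whole line on ";" or "," plus any trailing spaces.
    n = split($0, part, /[;,] */)
    print part[1]                 # first piece already starts with the ID
    for (i = 2; i <= n; i++)
        if (part[i] != "")        # skip empty pieces, e.g. after a trailing ";"
            print id, part[i]     # ID, default OFS (a space), then the piece
}' input2.txt > Output2.txt
```

Here the ID is never passed through a replacement string, so it is printed exactly as it appears in the input.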