Split text separated by ; in a column into multiple columns

Hi,

I need help splitting a long text field whose parts are separated by ";" and printing the parts out in multiple columns. My input file is tab-delimited and has 11 columns, as below:

aRg02004	21452	asdfwf	21452	21452	4.6e-29	5e-29	-1	3	50	ffg|GGD|9009 14101.10 High class -node. ; ffg|GGD|969 101.10 no class -inode. ; ffg|GGD|1149 14101.10 High class RR-node.. ; ffg|GGD|9225 414101.10 class -node.. ; ffg|GGD|11457 2001.10 High class -node. ; ffg|GGD|778 4514.40 loss class -node.. ; 
aRg530	996552	ssawd	996552	996552	4e-21	 1e-9	-1	2	50	ffg|GGD|900 22101.10 High class -node. ;ffg|GGD|840 2241.7 iR class -node. 

What I need is to split the last column ($11) on the ";" characters it contains. The output file should display columns $1 and $11, like this:

aRg02004      ffg|GGD|9009 14101.10 High class -node.
aRg02004      ffg|GGD|969 101.10 no class -inode
aRg02004      ffg|GGD|1149 14101.10 High class RR-node..
aRg02004      ffg|GGD|9225 414101.10 class -node.. 
aRg02004      ffg|GGD|11457 2001.10 High class -node.
aRg02004      ffg|GGD|778 4514.40 loss class -node..

aRg530         ffg|GGD|900 22101.10 High class -node. 
aRg530         ffg|GGD|840 2241.7 iR class -node. 

I did write an awk command to do this, but it only prints the first part of the text in that column for each entry ID ($1). My code is below:

awk -F ';' '{print $1 "\t" $11 }' input.txt  > output.txt

Appreciate your kind help on this. Thanks.

Try:

awk -F"\t" '{n=split($11,a,";");for (i=1;i<=n;i++) print $1"\t"a[i]}' input.txt > output.txt
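
In case a breakdown helps, here is a commented sketch of the same split-and-loop idea. The file names sample.txt and split.txt are made up for the demo, the sample rows are shortened from the question, and the trimming and the empty-piece check are my additions, not part of the one-liner.

```shell
# Hypothetical demo input: two tab-delimited rows with 11 fields each.
# Field 11 holds the ";"-separated list; the first row ends with a trailing "; ".
printf 'aRg02004\t21452\tasdfwf\t21452\t21452\t4.6e-29\t5e-29\t-1\t3\t50\tffg|GGD|9009 14101.10 High class -node. ; ffg|GGD|969 101.10 no class -inode. ; \n' > sample.txt
printf 'aRg530\t996552\tssawd\t996552\t996552\t4e-21\t1e-9\t-1\t2\t50\tffg|GGD|900 22101.10 High class -node. ;ffg|GGD|840 2241.7 iR class -node. \n' >> sample.txt

# -F'\t' makes tab the field separator, so $11 is the whole list.
awk -F'\t' '{
    n = split($11, a, ";")           # a[1]..a[n] are the pieces, n is their count
    for (i = 1; i <= n; i++) {
        gsub(/^ +| +$/, "", a[i])    # addition: trim spaces around each piece
        if (a[i] != "")              # addition: skip the empty piece after a trailing ";"
            print $1 "\t" a[i]       # one output line per piece, prefixed with the ID
    }
}' sample.txt > split.txt
```

The command in the question used -F ';', which makes ";" the field separator for the whole line. With that setting $1 is everything up to the first ";" and $11 does not exist for these rows, which is why only the first piece showed up.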

Hello,

a sed version:

sed 's/\([^\t]*\t\)\([^\t]*\t\)\{9\}/\1/;s/; *$/\n/;:bcl;s/^\(\([^\t]*\)\([^;]*\)\); */\1\n\2\t/;tbcl' input.txt
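
For anyone who wants to follow it step by step, the same expressions can be given one per -e with a note on each. This is only a sketch: it assumes GNU sed (for \t in the pattern and \n in the replacement), and sed_sample.txt and sed_out.txt are made-up names with a shortened demo row.

```shell
# Hypothetical demo input: one row, 11 tab-separated fields, list ends with "; ".
printf 'aRg02004\t21452\tasdfwf\t21452\t21452\t4.6e-29\t5e-29\t-1\t3\t50\tffg|GGD|9009 14101.10 High class -node. ; ffg|GGD|969 101.10 no class -inode. ; \n' > sed_sample.txt

# 1st -e: keep field 1 (with its tab) and drop the next nine fields,
#         so the line becomes "ID<tab>list".
# 2nd -e: turn a trailing ";" (plus spaces) at the end of the line into a newline.
# 3rd -e: define the loop label bcl.
# 4th -e: find the first remaining ";" and replace it (and following spaces)
#         with a newline, the ID (\2) and a tab, so the next piece starts
#         its own "ID<tab>piece" line.
# 5th -e: if that substitution succeeded, jump back to bcl and try again.
sed -e 's/\([^\t]*\t\)\([^\t]*\t\)\{9\}/\1/' \
    -e 's/; *$/\n/' \
    -e ':bcl' \
    -e 's/^\(\([^\t]*\)\([^;]*\)\); */\1\n\2\t/' \
    -e 'tbcl' sed_sample.txt > sed_out.txt
```

Two side effects worth knowing: the space before each ";" stays at the end of the piece, and a row whose list ends with ";" is followed by an empty line in the output.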

Regards.

Hi Both,

Thank you so much! Both commands work great. If you don't mind, could you please explain how they work? Thanks.