Separating data from one column into two columns

doobedoo · December 4, 2009, 5:01pm

Hello,
I have a file that contains 64,235 columns and over 1000 rows and looks similar to this:

 
ID   dad  mom  1  2  3  4  5.... 64232
1234 5678 6789 AA BB CC DD EE....ZZ
1342 5786 6897 BB CC DD EE FF....AA
1423 5867 6978 CC DD EE FF GG....BB

I need to leave the first three columns intact while splitting each pair of letters into two columns (including the header), so that the data looks like this:

 
ID   dad  mom  1   2   3   4   5  . . . .  64232 
1234 5678 6789 A A B B C C D D E E . . . . Z Z
1342 5786 6897 B B C C D D E E F F . . . . A A
1423 5867 6978 C C D D E E F F G G . . . . B B

Any suggestions?
Thanks in advance!

danmero · December 4, 2009, 5:39pm

This is a catch 66 problem; however, you can start from here:

awk 'NR>1{for(i=3;++i<=NF;){$i=substr($i,1,1) FS substr($i,2)}}1' file
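The one-liner above leaves the header alone (`NR>1`), and its `for(i=3;++i<=NF;)` loop visits fields 4 through NF. Below is a minimal sketch of how the same idea might also cover the header. The function name `split_pairs` is hypothetical, and so is the reading of "including the header": here each label after `ID dad mom` simply gets two extra spaces, which reproduces the three-space gaps in the desired output. As with the one-liner, assigning to `$i` makes awk rebuild each line with single spaces, so the original spacing between columns is not kept.

```shell
#!/bin/sh
# split_pairs (hypothetical name): split the two-letter fields from column 4
# onward and pad the header labels. Reads the named files, or stdin.
split_pairs() {
    awk '
    NR == 1 {
        # Assumed header layout: each label after ID, dad, mom gets two
        # extra spaces so that it spans its two letter columns.
        for (i = 4; i <= NF; i++) $i = $i "  "
        sub(/ +$/, "")          # drop the padding after the last label
        print
        next
    }
    {
        # Data rows: "AA" becomes "A A"; the values in columns 1-3 are kept.
        for (i = 4; i <= NF; i++) $i = substr($i, 1, 1) " " substr($i, 2)
        print
    }' "$@"
}

# Usage: split_pairs file > file.split
```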

durden_tyler · December 4, 2009, 5:46pm

With Perl:

$
$ cat -n f2
     1  ID   dad  mom  1  2  3  4  5  9  10 11 12 99 100 101 102 999 1000 1001 9999 10000 10001 64232
     2  1234 5678 6789 AA BB CC DD EE FF GG HH II JJ KK  LL  MM  NN  OO   PP   QQ   RR    SS    TT
     3  1342 5786 6897 BB CC DD EE FF AA BB CC DD EE FF  GG  HH  II  JJ   KK   LL   MM    NN    OO
     4  1423 5867 6978 CC DD EE FF GG BB CC DD EE FF GG  HH  II  JJ  KK   LL   MM   NN    OO    ZZ
$
$ perl -lne 'if ($.==1){s/(\d )/$1 /g} else {s/([A-Z])([A-Z])/$1 $2/g} print' f2
ID   dad  mom  1   2   3   4   5   9   10  11  12  99  100  101  102  999  1000  1001  9999  10000  10001  64232
1234 5678 6789 A A B B C C D D E E F F G G H H I I J J K K  L L  M M  N N  O O   P P   Q Q   R R    S S    T T
1342 5786 6897 B B C C D D E E F F A A B B C C D D E E F F  G G  H H  I I  J J   K K   L L   M M    N N    O O
1423 5867 6978 C C D D E E F F G G B B C C D D E E F F G G  H H  I I  J J  K K   L L   M M   N N    O O    Z Z
$
$

tyler_durden
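Whichever command is used, a quick sanity check is that every data row ends up with 3 + 2 × (header fields − 3) fields: for the f2 test file above that is 3 + 2 × 20 = 43, and for the full file 3 + 2 × 64,232 = 128,467. Here is a sketch of such a check in awk; `check_split` is a hypothetical helper, and it assumes the split file still has its header on line 1 and that padding the header added spaces but no new fields.

```shell
#!/bin/sh
# check_split (hypothetical helper): report data rows whose field count is
# not 3 + 2 * (number of header fields - 3). Exits 1 if any row is off.
check_split() {
    awk '
    BEGIN { bad = 0 }
    NR == 1 { want = 3 + 2 * (NF - 3); next }   # header: ID dad mom + labels
    NF != want {
        printf "line %d: %d fields, expected %d\n", NR, NF, want
        bad = 1
    }
    END { exit bad }' "$@"
}

# Usage: check_split file.split && echo "all rows OK"
```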

Scrutinizer · December 4, 2009, 6:13pm

The header comes out two spaces off, but it isn't entirely clear what it should look like anyway.

sed -r 's/(.)(\1)/\1 \2/g' file

-or-

sed 's/\(.\)\(\1\)/\1 \2/g' infile
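The back-reference `\1` makes both commands split any doubled character, not only letters. That is what widens the runs of spaces in the header, but it would also turn a numeric ID such as a hypothetical `1134` into `1 134`. If the first three columns can contain repeated digits, or if a field might hold two different letters (the sample shows only doubled ones), a narrower variant could skip the header and match only capital letters. A sketch in POSIX sed (no `-r` needed), with the header deliberately left unchanged; `split_letters` is a hypothetical name:

```shell
#!/bin/sh
# split_letters (hypothetical name): from line 2 on, split each
# non-overlapping pair of capital letters (AA -> A A, AG -> A G).
# The header line is printed unchanged.
split_letters() {
    sed '2,$s/\([A-Z]\)\([A-Z]\)/\1 \2/g' "$@"
}
```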

summer_cherry · December 6, 2009, 11:12pm

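use strict;
use warnings;

while (<DATA>) {
	# $tmp[0..2] are ID, dad and mom; $tmp[3] is the rest of the line
	# (newline included), because the limit of 4 stops the split there
	my @tmp = split(" ", $_, 4);
	# insert a space between every two adjacent capital letters
	$tmp[3] =~ s/(?<=[A-Z])(?=[A-Z])/ /g;
	# rejoin; the first three columns come back single-spaced
	print join " ", @tmp;
}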
__DATA__
1234 5678 6789 AA BB CC DD EE....ZZ
1342 5786 6897 BB CC DD EE FF....AA
1423 5867 6978 CC DD EE FF GG....BB