Hello,
I have a file that contains 64,235 columns and over 1000 rows, and it looks similar to this:
ID dad mom 1 2 3 4 5.... 64232
1234 5678 6789 AA BB CC DD EE....ZZ
1342 5786 6897 BB CC DD EE FF....AA
1423 5867 6978 CC DD EE FF GG....BB
I need to leave the first three columns intact while separating each pair of letters into two columns (including the header), so that the data will look like this:
ID dad mom 1 2 3 4 5 . . . . 64232
1234 5678 6789 A A B B C C D D E E . . . . Z Z
1342 5786 6897 B B C C D D E E F F . . . . A A
1423 5867 6978 C C D D E E F F G G . . . . B B
Any suggestions?
Thanks in advance!
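To try the answers below on something close to the real size, here is a sketch of a generator for a synthetic file of the same shape. The function name `make_wide_file`, the made-up ID numbers and the ACGT letters are assumptions for illustration, not the poster's actual data:

```shell
# make_wide_file ROWS NSNP: hypothetical helper that writes a header
# "ID dad mom 1 2 ... NSNP" followed by ROWS data lines, each with three
# made-up IDs and NSNP random two-letter columns.
# printf is used per field to avoid building one huge string per row.
make_wide_file() {
    awk -v rows="$1" -v nsnp="$2" 'BEGIN {
        srand(1)                              # fixed seed: same file every run
        letters = "ACGT"                      # assumed letter alphabet
        printf "ID dad mom"
        for (j = 1; j <= nsnp; j++) printf " %d", j
        printf "\n"
        for (r = 1; r <= rows; r++) {
            printf "%d %d %d", 1000 + r, 5000 + r, 6000 + r   # made-up IDs
            for (j = 1; j <= nsnp; j++)
                printf " %s%s", substr(letters, int(rand() * 4) + 1, 1),
                                substr(letters, int(rand() * 4) + 1, 1)
            printf "\n"
        }
    }'
}
```

For example, `make_wide_file 1000 64232 > test.txt` produces a file with 1000 data rows of 64,235 columns, on which the one-liners can be timed.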
This is a catch-66 problem; however, you can start from here:
awk 'NR>1{for(i=3;++i<=NF;){$i=substr($i,1,1) FS substr($i,2)}}1' file
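The one-liner above skips the header with `NR>1` and starts its loop at field 4 (`++i` runs before the first comparison, so `i` goes from 3 to 4). A minimal, more readable sketch of the same idea, with the number of fixed leading columns as a parameter; the function name `split_pairs` is illustrative:

```shell
# split_pairs KEEP: reads a whitespace-separated table on stdin and writes it
# to stdout. The header line and the first KEEP columns (default 3) are left
# alone; every later field is split into its first character and the rest,
# with a space in between (so "AG" becomes "A G").
split_pairs() {
    awk -v keep="${1:-3}" '
        NR > 1 {
            for (i = keep + 1; i <= NF; i++)
                $i = substr($i, 1, 1) " " substr($i, 2)
        }
        { print }
    '
}
```

Usage: `split_pairs 3 < file > file.split`. Assigning to a field makes awk rebuild the line with OFS (a single space by default), so runs of spaces on data lines collapse to one. Some awk builds, notably older mawk releases, cap the number of fields per record (32767 is a commonly reported limit), which is below the 64,235 columns here; if you hit such an error, gawk is worth trying.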
With Perl:
$
$ cat -n f2
1 ID dad mom 1 2 3 4 5 9 10 11 12 99 100 101 102 999 1000 1001 9999 10000 10001 64232
2 1234 5678 6789 AA BB CC DD EE FF GG HH II JJ KK LL MM NN OO PP QQ RR SS TT
3 1342 5786 6897 BB CC DD EE FF AA BB CC DD EE FF GG HH II JJ KK LL MM NN OO
4 1423 5867 6978 CC DD EE FF GG BB CC DD EE FF GG HH II JJ KK LL MM NN OO ZZ
$
$ perl -lne 'if ($.==1){s/(\d )/$1 /g} else {s/([A-Z])([A-Z])/$1 $2/g} print' f2
ID dad mom 1 2 3 4 5 9 10 11 12 99 100 101 102 999 1000 1001 9999 10000 10001 64232
1234 5678 6789 A A B B C C D D E E F F G G H H I I J J K K L L M M N N O O P P Q Q R R S S T T
1342 5786 6897 B B C C D D E E F F A A B B C C D D E E F F G G H H I I J J K K L L M M N N O O
1423 5867 6978 C C D D E E F F G G B B C C D D E E F F G G H H I I J J K K L L M M N N O O Z Z
$
tyler_durden
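A note on the pattern used to split a pair: a single group repeated with a quantifier, as in `([A-Z]){2}`, only remembers the last letter it matched, so a replacement of `$1 $1` (or `\1 \1` in sed) gives the right answer only when both letters are the same. Two separate groups keep both letters. The difference can be seen with `sed -E`; the function names are illustrative:

```shell
# Repeated group: \1 holds only the last letter the group matched.
split_repeated()   { sed -E 's/([A-Z]){2}/\1 \1/g'; }
# Two separate groups: \1 and \2 hold the first and second letter.
split_two_groups() { sed -E 's/([A-Z])([A-Z])/\1 \2/g'; }

echo "AA AG" | split_repeated     # prints: A A G G
echo "AA AG" | split_two_groups   # prints: A A A G
```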
Two spaces off for the header, but it isn't entirely clear what the header should look like anyway.
sed -r '1!s/([A-Z])([A-Z])/\1 \2/g' file
-or-
sed '1!s/\([A-Z]\)\([A-Z]\)/\1 \2/g' file
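The question's "including the header" could also be read as expanding the header, so that each marker gets one label per letter column and the header lines up with the data. A sketch of that reading with awk; the `_a`/`_b` suffixes and the function name `split_with_header` are made up for illustration:

```shell
# split_with_header KEEP: like splitting the data rows, but the header is
# expanded too: each label after the first KEEP columns (default 3) is
# written twice, as LABEL_a and LABEL_b (hypothetical suffixes).
split_with_header() {
    awk -v keep="${1:-3}" '
        {
            for (i = keep + 1; i <= NF; i++) {
                if (NR == 1)
                    $i = $i "_a " $i "_b"                    # header: duplicate the label
                else
                    $i = substr($i, 1, 1) " " substr($i, 2)  # data: split the pair
            }
            print
        }
    '
}
```

With this, the header and every data row end up with the same number of columns, which can make the file easier to load into tools that expect one label per column.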
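use strict;
use warnings;
while (my $line = <DATA>) {
    # split into ID, dad, mom and the rest of the line (4 pieces at most)
    my @tmp = split(" ", $line, 4);
    # put a space between every two adjacent capital letters
    $tmp[3] =~ s/(?<=[A-Z])(?=[A-Z])/ /g;
    print join(" ", @tmp);
}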
__DATA__
1234 5678 6789 AA BB CC DD EE....ZZ
1342 5786 6897 BB CC DD EE FF....AA
1423 5867 6978 CC DD EE FF GG....BB
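With tens of thousands of columns, the result is hard to check by eye. A small sketch of a sanity check, assuming the header is left unsplit and the first KEEP columns are fixed: every data row should then have KEEP + 2 * (header fields - KEEP) fields. The name `check_widths` is made up:

```shell
# check_widths KEEP: reads split output on stdin; reports every data row whose
# field count does not match KEEP + 2 * (header fields - KEEP), and exits with
# status 1 if any such row was found.
check_widths() {
    awk -v keep="${1:-3}" '
        NR == 1 { want = keep + 2 * (NF - keep); next }
        NF != want {
            printf "line %d: %d fields, expected %d\n", NR, NF, want
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

For example, `check_widths 3 < file.split && echo OK` prints `OK` only when every row has the expected width.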