Transpose an entire text file

heavenfish · September 29, 2009, 11:03pm

Hello all,

I want to transpose the rows of a file into columns (every character, including spaces), e.g.:

input:

abcdefg
123 456

output:

a1
b2
c3
d 
e4
f5
g6

I wrote a script:

#!/bin/csh -f

set num=`cat $1 |wc -l`
echo "$num rows of $1 transposing to columns..."

if ( -f temp1 ) then
  rm temp1
  touch temp1
else
  touch temp1
endif

set i=1
set j=2

while ($i <= $num)
# read a row,  transpose to a column
  sed -n "$i p" $1 | sed 's/\n*/\n/g' > temp
# merge columns in files
  paste temp$i temp > temp$j
  rm temp$i
  @ i++
  @ j++
end

mv temp$i $1.transposed
rm temp

echo "Done!"

It works, but the paste command has several problems:
It inserts an empty column when it appends a new column;
The columns need to be the same length;
And the biggest problem is that the script keeps reading and writing temp files, so it is very slow. The files I want to transpose have tens of thousands of rows, and a large file takes nearly an hour to process.

Can anyone suggest a better approach? Thanks.
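
On the first problem, a small sketch of how paste joins columns, assuming a POSIX paste. By default it separates columns with a tab; the delimiter list '\0' asks for an empty string instead. The directory and file names (pastedir, col1, col2) are made up for illustration, and this does nothing about the speed problem.

```shell
# Hypothetical scratch files, one character per line
pastedir=$(mktemp -d)
printf 'a\nb\nc\n' > "$pastedir/col1"
printf '1\n2\n3\n' > "$pastedir/col2"

# Default behaviour: columns are joined with a tab
paste "$pastedir/col1" "$pastedir/col2"

# -d '\0' means "empty string" as the delimiter, so nothing is inserted
paste -d '\0' "$pastedir/col1" "$pastedir/col2"
```

The second command prints the characters side by side with nothing between them.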

danmero · September 30, 2009, 12:01am

Try awk

awk '{for(x=0;++x<=NF;)a[x","NR]=$x}END{for(y=0;++y<=NF;){for(z=0;++z<=NR;) printf "%s", a[y","z];print ""}}' FS= file

heavenfish · September 30, 2009, 3:27am

I'm not familiar with awk. Where do I specify the input and output files when using this script?
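
For reference, a minimal sketch of how such a one-liner is usually run: the input file is the last argument ("file" in the command above), and the output goes to standard output, so it can be redirected to a file with ">". The names input.txt and output.txt and the scratch directory are assumptions for illustration. The program is a lightly reformatted copy of the one-liner that remembers the field count in n and prints through "%s"; it assumes an awk such as gawk or mawk that treats an empty FS as one field per character.

```shell
# Hypothetical scratch directory and sample input from the first post
workdir=$(mktemp -d)
printf 'abcdefg\n123 456\n' > "$workdir/input.txt"

# Order of arguments: program text, FS= (empty field separator), input file.
# Standard output is redirected into the output file.
awk '{ n = NF; for (x = 1; x <= NF; x++) a[x "," NR] = $x }
     END { for (y = 1; y <= n; y++) { for (z = 1; z <= NR; z++) printf "%s", a[y "," z]; print "" } }' \
    FS= "$workdir/input.txt" > "$workdir/output.txt"

# Show the transposed result
cat "$workdir/output.txt"
```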

ahmad.diab · September 30, 2009, 6:57am

Dear all,

The solution provided by danmero does not work on Solaris, where my system does not seem to accept FS="" as a null value. Any explanation, please?

I am working on the following machine:
SunOS server2 5.10 Generic_118833-36 sun4u sparc SUNW,Netra-210

BR

danmero · September 30, 2009, 7:25am

Use /usr/xpg4/bin/awk on Solaris.

ahmad.diab · September 30, 2009, 7:55am

I have tried everything (/usr/xpg4/bin/awk and nawk), but I get no output.

radoulov · September 30, 2009, 8:30am

You need to change the code to make it work with older AWK implementations
(I added max to handle variable record length):

nawk 'END {
  for (i=1; i<=max; i++)
    for (j=1; j<=NR; j++)
      printf "%s", _[j,i] (j == NR ? RS : FS)
  }
{
  for (i=0; i<=length; i++) _[NR,i] = substr($0,i,1)
  max = max < length ? length : max
  }' infile

P.S. Calling length without arguments is deprecated, so it should be length($0).
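
A sketch of the same substr-based idea with length($0) spelled out, in case it helps. It makes two assumptions that are not in the code above: characters are printed with no separator between them (to match the sample output in the first post), and lines shorter than the longest one are padded with spaces. The function name transpose_chars is made up for illustration.

```shell
# transpose_chars: hypothetical helper; reads the named files or stdin
transpose_chars() {
  awk '
    {
      len = length($0)
      if (len > max) max = len            # remember the longest line
      for (i = 1; i <= len; i++)
        cell[NR, i] = substr($0, i, 1)    # one character per array element
    }
    END {
      for (i = 1; i <= max; i++) {        # each input column becomes a row
        line = ""
        for (j = 1; j <= NR; j++) {
          c = ((j, i) in cell) ? cell[j, i] : " "   # pad short lines
          line = line c
        }
        print line
      }
    }
  ' "$@"
}

# Sample input from the first post
printf 'abcdefg\n123 456\n' | transpose_chars
```

It does not depend on how the awk in use treats an empty FS, since it only relies on substr and length.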

danmero · September 30, 2009, 8:33am

What about this one?

awk '{for(x=0;++x<=NF;)a[x]=a[x]$x}END{for(y=0;++y<=NF;)print a[y]}' FS= file
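
The one-liner above builds each output row as one growing string instead of storing every character separately. A sketch of that idea spelled out, under two assumptions of mine: it uses substr rather than an empty FS, and it pads with spaces so that lines of different lengths stay aligned. The name transpose_by_rows is hypothetical.

```shell
# transpose_by_rows: hypothetical helper; reads the named files or stdin
transpose_by_rows() {
  awk '
    {
      len = length($0)
      if (len > max) max = len
      for (i = 1; i <= len; i++) {
        # Earlier, shorter lines left gaps in this row: fill them with spaces
        while (length(row[i]) < NR - 1) row[i] = row[i] " "
        row[i] = row[i] substr($0, i, 1)  # append this line character
      }
    }
    END {
      for (i = 1; i <= max; i++) {
        # Pad rows that the last lines were too short to reach
        while (length(row[i]) < NR) row[i] = row[i] " "
        print row[i]
      }
    }
  ' "$@"
}

# Three lines of different lengths
printf 'abc\n1\nxy\n' | transpose_by_rows
```

Appending unconditionally also keeps a column whose first character is 0, which a truth test on the accumulated string would treat as empty.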

ahmad.diab · September 30, 2009, 8:41am

This code is not working for me, but radoulov's code works fine because it uses the substr function.
But my question is: how can I set the FS variable to null (FS="") on Solaris 10?

BR

radoulov · September 30, 2009, 8:45am

The problem is that a null string as FS is treated differently in older AWK implementations.

From Effective GAWK Programming:

I mean:

zsh-4.3.10[t]% print abc | gawk '{while (++i<=NF) print "$"i, "is:", $i}' FS=
$1 is: a
$2 is: b
$3 is: c
zsh-4.3.10[t]% print abc | gawk --posix '{while (++i<=NF) print "$"i, "is:", $i}' FS=
$1 is: abc

You already did it, but it has a different meaning for your AWK implementation.
And of course, you can install gawk on Solaris :).
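
A quick way to check which behavior a given awk has, as a sketch: feed it a three-character line with an empty FS and look at NF. The function name fs_splits_chars is made up; the optional argument is the awk command to test (awk if omitted).

```shell
# fs_splits_chars: hypothetical helper.
# Prints "yes" if the awk named in $1 treats FS="" as one field per
# character, "no" if it sees the whole line as a single field.
fs_splits_chars() {
  n=$(printf 'abc\n' | "${1:-awk}" '{ print NF }' FS=)
  if [ "$n" = 3 ]; then
    echo yes
  else
    echo no
  fi
}

fs_splits_chars awk
```

If it prints "no", a substr-based loop is the fallback.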

danmero · September 30, 2009, 8:52am

awk '{z=length($0);split($0,a,"");for(x=0;++x<=z;)b[x]=b[x]a[x]}END{for(y=0;++y<=z;)print b[y]}' file

radoulov · September 30, 2009, 8:54am

In older awk implementations, splitting on a null string is not special either.
As far as I know, only substr will work.

danmero · September 30, 2009, 9:03am

I can't confirm it, but it should work on Solaris, based on the Solaris awk man page.
Let's see if it works for the OP.

ahmad.diab · September 30, 2009, 9:12am

You are right, radoulov. I already tested the split function with a null separator, but it was no use.
BR

varontron · September 30, 2009, 11:05pm

Here's a Perl version, with and without comments:

#!/usr/bin/perl -w
# arrayref for columns
my $cols = [];
# counter for characters
my $counter = 0;
# get file handle
open FILE, "<in.txt";
# iterate over lines in file
foreach my $line (<FILE>)
{
	# remove line feed
	chomp($line);
	# create array containing each char on line
 	my @chars = split(//,$line);
 	# create arrayref for characters
 	$cols->[$counter]=[];
 	# iterate over characters, pushing each into an
 	# index of the arrayref just created
 	for my $char (@chars)
 	{
 		push(@{$cols->[$counter]},$char);
 	}
 	# iterate the counter for the next line
 	$counter++;
}
# close the file handle
close FILE;

# open a new file handle for output
open OUT, ">output.txt";
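# find the longest line, so shorter first lines do not cut off the output
my $width = 0;
for my $row (@$cols) { $width = @$row if @$row > $width; }
# iterate over the columns (0 thru max index of the longest line)
for my $i ( 0 .. $width - 1 )
{
	# iterate over the lines (0 thru max index of array storing lines)
	for my $j ( 0 .. @$cols -1 )
	{
		# print character at line $j, column $i (a space if the line is shorter)
		my $char = $cols->[$j]->[$i];
		$char = " " unless defined $char;
		print OUT $char;
	}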
	# print linefeed
	print OUT "\n";
}
# close the filehandle
close OUT;

#!/usr/bin/perl -w

my $cols = [];
my $counter = 0;
open FILE, "<in.txt";
foreach my $line (<FILE>)
{
	chomp($line);
 	my @chars = split(//,$line);
 	$cols->[$counter]=[];
 	for my $char (@chars)
 	{
 		push(@{$cols->[$counter]},$char);
 	}
 	$counter++;
}
close FILE;

open OUT, ">output.txt";
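my $width = 0;
for my $row (@$cols) { $width = @$row if @$row > $width; }
for my $i ( 0 .. $width - 1 )
{
	for my $j ( 0 .. @$cols -1 )
	{
		my $char = $cols->[$j]->[$i];
		$char = " " unless defined $char;
		print OUT $char;
	}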
	print OUT "\n";
}
close OUT;

heavenfish · October 2, 2009, 11:54am

I have studied awk and improved the script to handle files with different line lengths. Thanks, all, for the help.

awk '{if(NF > MAX) MAX = NF;for(x=0;++x<=NF;)a[x,NR]=$x}END{for(y=0;++y<=MAX;){for(z=0;++z<=NR;) {if ((y,z) in a) printf "%s", a[y,z]; else printf " "};print ""}}' FS= file