Row to Column conversion?

sammy777 · August 3, 2013, 10:06am

I have a text file with gene IDs separated by spaces on each line. The number of IDs differs from line to line.

The file is like:

 abc qwe tyu ghj jkl dfg sdf
 cvb sdk fgh tyu
 uio iop tyu rty eru wer rty iop
 asd sdf dfg fgh zxc

I want to format the file like:

 abc
 qwe
 tyu
 ghj
 jkl
 dfg
 sdf

 cvb
 sdk
 fgh
 tyu
 
 uio
 iop
 tyu
 .
 . 
 iop
 
 asd 
 sdf
 .
 . 
 .
 zxc

Any help would be appreciated. Thanks!

durden_tyler · August 3, 2013, 10:26am

$ 
$ cat f02
abc qwe tyu ghj jkl dfg sdf
cvb sdk fgh tyu
uio iop tyu rty eru wer rty iop
asd sdf dfg fgh zxc
$ 
$ perl -pne 'BEGIN {$\="\n"}s/ /\n/g' f02
abc
qwe
tyu
ghj
jkl
dfg
sdf

cvb
sdk
fgh
tyu

uio
iop
tyu
rty
eru
wer
rty
iop

asd
sdf
dfg
fgh
zxc

$ 
$

Jotne · August 4, 2013, 4:24am

awk '{$1=$1}1' OFS="\n" ORS="\n\n" infile

This awk works whether or not some lines have a space in front
(" cvb" vs "cvb").
@durden_tyler, you have removed this space in f02;
perl gives an extra blank row because of this space.
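A commented sketch of the awk one-liner above. The variables are passed with `-v` here instead of as trailing operands (both forms process the `\n` escapes), and the function name `to_column_awk` is hypothetical.

```shell
#!/bin/sh
# $1=$1      : assigning a field makes awk rebuild $0 from its fields joined
#              with OFS; the default FS splits on runs of blanks and ignores
#              leading/trailing blanks, so a leading space simply disappears
# 1          : an always-true pattern with no action, so $0 is printed
# OFS='\n'   : join the fields with newlines (one ID per line)
# ORS='\n\n' : end every record with a blank line
to_column_awk() {
    awk -v OFS='\n' -v ORS='\n\n' '{$1=$1}1'
}

printf ' abc qwe\ncvb sdk\n' | to_column_awk
```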

Just_Ice · August 4, 2013, 5:08am

sed works too ...

sed -e "s/$/\n/" -e "s/[ \n]/\n/g" infile

Jotne · August 4, 2013, 5:26am

sed has the same problem as perl:
it adds a double blank line, since there is a space in front of the line

abc
qwe
tyu
ghj
jkl
dfg
sdf


cvb
sdk
fgh
tyu


uio
...
...

alister · August 4, 2013, 1:17pm

The OP did not specify an operating system, so I'll mention that because of \n in the replacement text, that sed script will fail on most non-Linux systems.

Regards,
Alister
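A sketch of how the same two-step idea might be written without the GNU-only `\n` in the replacement, assuming any POSIX sed: a backslash followed by a real newline inserts a newline, and `G` appends a newline plus the empty hold space, giving the blank line after each record. The function name `to_column_sed_portable` is hypothetical.

```shell
#!/bin/sh
# s/ /\<newline>/g : replace every space with a literal newline (portable form)
# G                : append "\n" + (empty) hold space, i.e. a blank line
#                    after every record, including the last one
to_column_sed_portable() {
    sed -e 's/ /\
/g' -e 'G'
}

printf 'abc qwe\ncvb sdk\n' | to_column_sed_portable
```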


If we assume that the sample data and sample output in the OP are exactly correct, then your solution is incorrect. The sample output has a leading space which your solution discards. Additionally, your solution adds a concluding blank line which does not appear in the sample output.

The OP should clarify exactly what they want because there is a discrepancy between the problem description and the sample input/output. The post's text appears to state (due to the grammar, it's impossible to be certain) that input fields are delimited by a single space (supported by the sample data). If that's correct, then, in the absence of any special handling (none is mentioned in the post), a leading space represents an empty first field, which should become a blank line. However, this would cause an ambiguity in the output format, since a blank line could represent an empty field or a record separator.
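To make the ambiguity concrete, this small sketch (the helper `show_fields` is hypothetical) prints the fields awk sees for a line with a leading space. With a literal single-space delimiter (`-F '[ ]'`) the first field is empty; with the default `FS` of a single blank, leading blanks are skipped and no empty field appears.

```shell
#!/bin/sh
# show_fields FS : print every field of each input line using the given FS
show_fields() {
    awk -F "$1" '{ for (i = 1; i <= NF; i++) printf "field %d: [%s]\n", i, $i }'
}

printf ' abc qwe\n' | show_fields '[ ]'   # single space is the delimiter
printf ' abc qwe\n' | show_fields ' '     # default blank-run splitting
```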

If we assume that the leading space on every line of the input and output is an artifact of copy/pasting, here is a simple, portable solution which treats the input as newline-delimited records of single-space-delimited fields and converts it to blank-line-delimited records (except that the final record is not followed by a blank line) of newline-delimited fields:

sed 'y/ /\n/; $!G'

Regards,
Alister
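An annotated sketch of the sed script above, wrapped in a hypothetical function `to_column_posix` so it can be fed sample input. POSIX `y` accepts `\n` as a newline, and `$!G` skips the last line, which is why no blank line follows the final record.

```shell
#!/bin/sh
# y/ /\n/ : transliterate every space into a newline
# $!G     : on every line except the last, append "\n" + the (empty) hold
#           space, i.e. a blank line separating this record from the next
to_column_posix() {
    sed 'y/ /\n/; $!G'
}

printf 'abc qwe\ncvb sdk\n' | to_column_posix
```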

Jotne · August 4, 2013, 1:48pm

I see now that the OP's requested output also has a space in front

awk '{$1=" "$1}1' OFS="\n " ORS="\n\n"

This puts a space in front of every line,
but we do not know for sure whether this is what he wants.
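A commented sketch of the space-preserving awk above, using `-v` for the variables and a hypothetical function name `to_column_indented`. The input's own leading space is discarded by field splitting, and the script adds one back to every ID.

```shell
#!/bin/sh
# $1=" "$1    : prefix the first field with a space; assigning a field also
#               rebuilds $0 using OFS
# OFS='\n '   : join fields with newline+space, so every later ID is indented
# ORS='\n\n'  : end each record with a blank line
to_column_indented() {
    awk -v OFS='\n ' -v ORS='\n\n' '{$1=" "$1}1'
}

printf ' abc qwe\ncvb sdk\n' | to_column_indented
```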

MadeInGermany · August 4, 2013, 1:51pm

sed '
s/\( *[^ ]\{1,\}\)/\1\
/g
' infile
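An annotated sketch of the sed script above, wrapped in a hypothetical function `to_column_keep_spaces`. Because each match includes the spaces before an ID, whatever leading space the input has is kept on every output line; the newline after the last ID plus sed's own newline produces the blank line after each record.

```shell
#!/bin/sh
# \( *[^ ]\{1,\}\) : capture optional spaces followed by one ID (non-spaces)
# \1\<newline>     : write the capture back, then a literal newline
#                    (backslash-newline is the portable way to insert one)
# g                : repeat for every ID on the line
to_column_keep_spaces() {
    sed 's/\( *[^ ]\{1,\}\)/\1\
/g'
}

printf ' abc qwe\n cvb sdk\n' | to_column_keep_spaces
```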