Transforming unformatted text to 1-column with awk

Hello, I have an input that is unformatted text, such as below.

INPUT:

Hola. Me llamo Davíd y soy de Andalucía. Mi color favorito es rojo. ¿Como te llamas? ¿Cuál es tu color favorito?

I want to put each word and each punctuation mark on its own line, with a blank line between sentences, as in the desired output below.

OUTPUT:

Hola
.

Me
llamo
Davíd
y
soy
de
Andalucía
.

Mi
color
favorito
es
rojo
.

¿
Como
te
llamas
?

¿
Cuál
es
tu
color
favorito
?

Is there an awk one-liner that could achieve this result?
Thanks!

How about this, using sed (GNU sed, since it relies on \n in the replacement):

sed 's/ /\n/g;s/\([[:punct:]]\)/\n\1\n/g' file
Hola
.

Me
llamo
Davíd
y
soy
de
Andalucía
.

Mi
color
favorito
es
rojo
.


¿
Como
te
llamas
?


¿
Cuál
es
tu
color
favorito
?

Please note that my locale is not Spanish, so the handling of the inverted question mark may differ in your locale.
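
Since the question asked for awk, here is a minimal sketch of the same idea. The function name one_per_line is hypothetical, and two things are assumptions on my part: that only ., ? and ! end a sentence, and that the punctuation to split off is the explicit list in the pattern. Listing the marks explicitly (with the inverted ones as alternatives rather than inside the bracket) is meant to avoid depending on how the locale classifies them.

```shell
# Hypothetical helper: print each word and punctuation mark on its own line,
# with a blank line after each sentence-ending mark.
one_per_line() {
    awk '{
        # Pad every listed punctuation mark with spaces so it becomes its own field.
        gsub(/¿|¡|[.,;:!?]/, " & ")
        # Changing $0 makes awk re-split the fields, so NF reflects the padded line.
        for (i = 1; i <= NF; i++) {
            print $i
            # Assumed sentence terminators: period, question mark, exclamation mark.
            if ($i ~ /^[.?!]$/) print ""
        }
    }' "$@"
}
```

Call it as one_per_line file, or pipe text into it. Because the blank line is printed only after a closing mark, the opening inverted question mark gets a single blank line before it rather than two.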
