Shell script/awk to sort text

  1. The problem statement, all variables and given/known data:

I have a file containing a fragment of a novel. I have to strip the punctuation from it and write all the words it contains, sorted, one per line and without duplicates, to a file called "palabras".

Here is a fragment of the input file:

Don Quijote de la Mancha, Cervantes 

Capítulo II

Que trata de la notable pendencia 
[*] que Sancho Panza tuvo con la sobrina y ama de don Quijote, con otros sujetos graciosos

And here is a fragment of what the file palabras should look like:

ama
Capítulo
Cervantes
con
Don
...
  2. Relevant commands, code, scripts, algorithms:

  3. The attempts at a solution (include all code and scripts):

After searching the web, I have only managed to strip the punctuation and put one word on each line, using the following code:

{gsub("[-.,:;�[\*\]\?]","");}
{RS=" ";}
{print > "palabras";}

I call it from the terminal with: cat novela | awk -f p4

p4 is the name of the file containing my code.

When I then run sort -u palabras>palabras2 from the terminal, it generates the file I want (if I use palabras>palabras instead, it generates an empty file).

My question is: how can I achieve this within the same awk program? I tried this:

{gsub("[-.,:;�[\*\]\?]","");}
{RS=" ";}
{print > "palabras";}
END {sort -u > palabras2;}

I tried it with and without END, with sort -u > palabras2 and with sort -u palabras, but the generated file is always the same: unsorted and with the duplicate words still in it.

I would really appreciate any ideas, because I have been stuck on this problem for days. Ideally the solution would still let me call awk as shown above (cat novela | awk -f p4).

Thank you in advance.

  4. Complete Name of School (University), City (State), Country, Name of Professor, and Course Number (Link to Course):

ITESM Campus Monterrey, Monterrey, Mexico
Professor: Juan Jose Icaza
Course: Laboratorio de Sistemas Operativos

You can use a pipeline ( | ) to feed the awk output into sort, both inside awk and outside awk using the shell. You do not need an END part.

Hi,
Your awk version matters: I don't know whether your awk has the built-in function 'asort'. If it doesn't, this will be hard to implement without 'sort'.

Scrutinizer: Could you give me an example of code that sorts from within awk itself? I'm new to this language, since it is only one of several lab sessions in the semester, so an example would be very helpful.

And Lucas_0148, I don't really know the version. How can I find it? I just create a file, write the code in it, and run it from the terminal on a fresh Ubuntu installation.

For example, here sort is called as an external program from within awk:

awk '{print | "sort"}' file

Here the output of awk is run through sort via a shell pipeline:

awk '{print}' file | sort 
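
Applied to the original task, the first form could look roughly like the sketch below. It is only an illustration under some assumptions: the two input lines are a made-up stand-in for the novela file, the punctuation list is limited to ASCII characters (the non-ASCII character in the original list is not reproduced here), and words are split on awk's default whitespace field separator instead of changing RS.

```shell
# Hypothetical sample input standing in for the real "novela" file.
printf '%s\n' \
    'Don Quijote de la Mancha, Cervantes' \
    'que Sancho Panza tuvo con la sobrina y ama de don Quijote, con otros sujetos graciosos' \
    > novela

# Strip punctuation, then send every word through one pipe to sort -u.
# awk opens the command string once and reuses the same pipe for every
# print, so sort sees all the words and writes palabras when awk exits.
cat novela | awk '
{
    gsub(/[][*?.,:;-]/, "")        # delete the listed ASCII punctuation from the line
    for (i = 1; i <= NF; i++)      # after gsub the line is re-split into fields
        print $i | "sort -u > palabras"
}'

cat palabras
```

The part between the single quotes can be saved as p4 and called with cat novela | awk -f p4 as before. The exact order of the words (for instance, where capitalized words end up) depends on the locale that sort runs in. As for the empty file from sort -u palabras>palabras: the shell truncates palabras before sort gets to read it; sort -u -o palabras palabras avoids that.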