sort data

bjorb · September 15, 2005, 1:08am

Hi!
I'm trying to sort a file.dat with the sort command. The data contained by file.dat is similar to the data set below:

100.000
99.000
110.000

55.000
113.000
33.000

25.000
9.000
15.000

It is relatively easy to sort the data in ascending or descending order, but the problem is that the separating empty rows between the blocks are put on top of the sorted file. I wish to keep the empty rows and sort the data blocks separately, like this:

99.000
100.000
110.000

33.000
55.000
113.000

9.000
15.000
25.000

Can anybody please help me?
And also I must mention that I'm quite new at UNIX scripting, so try to explain in plain English :o!

bjorb

Ygor · September 15, 2005, 2:04am

Got gawk?

gawk 'BEGIN{RS=""}{n=split($0,a);asort(a);for(i=1;i<=n;i++) print a[i]; print ""}' file.dat

futurelet · September 15, 2005, 2:21am

ruby -00ne'puts split("\n").sort_by{|x|x.to_f};puts' file

bjorb · September 15, 2005, 2:46am

Hehe!
First of all I want to give my appreciation for a quick answer.

Without knowing what the heck that code from Ygor meant, I tested it in my script. It seemed to work partially in my case, but as a result of my own laziness it did not give me the result I need.

The data set is really in this format:

100.000 23.000 150.000
99.000 83.000 369.000
110.000 15.000 123.000

55.000 105.000 69.000
113.000 7.000 78.000
33.000 89.000 63.000

25.000 23.000 23.000
9.000 63.000 81.000
15.000 38.000 23.000

The columns represent x-, y- and z-coordinates.
I wish to sort the columns with x- and y-coordinates in ascending order.
Data is also supposed to be written back to data.dat
I apologize for any inconvenience and humbly ask you to help me again.

I would also like to understand whatever code is suggested.
Where can I get a good tutorial on gawk?

Regards
bjorb

futurelet · September 15, 2005, 5:04am

You mean change

100.000   23.000    150.000
99.000    83.000    369.000
110.000   15.000    123.000

to this?

99.000    15.000    150.000
100.000   23.000    369.000
110.000   83.000    123.000

bjorb · September 15, 2005, 5:33am

After reviewing my needs I have realized that it would be sufficient to sort the data only considering the column with the y-coordinates:

From
100.000 23.000 150.000
99.000 83.000 369.000
110.000 15.000 123.000

to
110.000 15.000 123.000
100.000 23.000 150.000
99.000 83.000 369.000

How can I do this?

Regards

bjorb

futurelet · September 15, 2005, 12:30pm

ruby -00ne'puts split("\n").sort_by{|x|x[/\s\S+/].to_f};puts' file
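
For anyone who would rather read a script than a one-liner, the same idea can be sketched longhand. This is only an illustrative rewrite, not the command above: the method name sort_blocks and its column argument are made up for the example, and it assumes blocks are separated by empty lines and fields by whitespace.

```ruby
# Sort each blank-line-separated block of rows by one numeric column.
# sort_blocks and `column` are hypothetical names used only in this sketch.
def sort_blocks(text, column = 1)
  blocks = text.strip.split(/\n\s*\n/)            # one string per block
  sorted = blocks.map do |block|
    rows = block.split("\n")                      # one string per row
    rows.sort_by { |row| row.split[column].to_f } # compare as floats, not strings
  end
  # Put the blocks back together with one empty line between them.
  sorted.map { |rows| rows.join("\n") }.join("\n\n") + "\n"
end

sample = <<~DATA
  100.000 23.000 150.000
  99.000 83.000 369.000
  110.000 15.000 123.000

  55.000 105.000 69.000
  113.000 7.000 78.000
  33.000 89.000 63.000
DATA

puts sort_blocks(sample)
```

Column numbers count from 0 here, so the default of 1 is the y column; passing 0 would sort on x instead.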

bjorb · September 15, 2005, 4:03pm

The code worked just fine! Thanks, you are a life-saver!! But then another problem occurred. After the sorted data is redirected into a new file, "datanew.dat", it is to be plotted with gnuplot. gnuplot is invoked from the same script that contains the sorting code. It seems that gnuplot is started before the sorting and the redirect have finished, so gnuplot requests a file that has not been created yet. I'm sure there is a way to do this, but my familiarity with Ruby is rather slim. So once again I turn to you for help.

Can I somehow stall the initiation of gnuplot until the sorting and redirecting (resulting in datanew.dat) are finished?

the active parts of the script are shown below:

# I want this process to end before continuing to the next (invoking GNUplot)
ruby -00ne'puts split("\n").sort_by{|x|x[/\s\S+/].to_f};puts' data.dat > datanew.dat

#Invoking gnuplot with input file gnu.ini
gnuplot gnu.ini

"gnu.ini" contains all plot properties such as labels, gridsize, ticlevel etc, including the command to plot datanew.dat.

regards

bjorb

futurelet · September 15, 2005, 4:18pm

Ygor or vlad: please help us out here! This seems more like a shell issue than a Ruby one.

bjorb, I'm glad you're using Ruby! You're the first person on this forum who has said that he was able to use my Ruby code.

bjorb · September 15, 2005, 4:28pm

When a problem can be solved so elegantly with your code, I would be stupid not to use it. Once again, thanks futurelet! I must admit, I do not understand the code presented, but I will make an effort to do so.

futurelet · September 15, 2005, 4:55pm

What is your shell's command to make the script pause for x seconds?
Try inserting that after the Ruby line.

An explanation of the code:
ruby -00ne'puts split("\n").sort_by{|x|x[/\s\S+/].to_f};puts' file

Switches:
  00     Make the record-separator an empty line. (In Awk, RS="".)
  n      Automatically read each record of the file, as Awk does.
  e      Program follows.

Program:
  split("\n")
         Short for $_.split("\n").  $_ is the record just read ($0 in Awk).
         The array created by split is then sorted by sort_by, which
         is a built-in implementation of the "Schwartzian Transform"
         that Perlers drool about.
  { |x| x[/\s\S+/].to_f }
         This is the code-block passed to sort_by. x is the parameter;
         an element of the array.
         x[ /\s\S+/ ]  This means "give me the part of the string x
         that is matched by the regular expression."   \s is whitespace;
         \S is non-whitespace.  This grabs the 2nd field.
         The matching substring is then converted to a floating-point
         number by  .to_f
         So sort_by sorts the array on the 2nd field, with each item
         treated as a float, not a string.
  puts
         Simply prints with an appended linefeed; it cleverly prints
         each item of an array on a separate line.
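
To see the pieces described above one at a time, here is a small sketch that can be pasted into a file and run. The variable names are just for illustration, and the record is the first block from the sample data. One caveat: the regex finds the 2nd field only when a line does not begin with whitespace; with a leading space it would match the 1st field instead.

```ruby
# One record, as -00 would hand it to the program (a whole block of lines).
record = "100.000 23.000 150.000\n99.000 83.000 369.000\n110.000 15.000 123.000"

lines = record.split("\n")   # array with one string per line
first = lines[0]             # "100.000 23.000 150.000"

matched = first[/\s\S+/]     # first whitespace character plus the field after it
key     = matched.to_f       # to_f ignores the leading space

p matched                    # prints " 23.000"
p key                        # prints 23.0

# sort_by computes that key once per line and orders the lines by it.
sorted = lines.sort_by { |x| x[/\s\S+/].to_f }
puts sorted                  # one array item per output line
```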

vgersh99 · September 15, 2005, 5:22pm

It seems strange that gnuplot starts on the 'incomplete' data produced by Ruby.
Is Ruby invoked synchronously or asynchronously?
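
In a plain shell script, commands on separate lines run one after another unless a line ends in &, so the redirect should normally be complete before gnuplot starts. To take the shell out of the picture, both steps could also be done from a single Ruby script. This is a rough sketch under assumptions: sort_then_plot is a made-up name, the sort key is the 2nd column as earlier in the thread, and it assumes gnuplot is on the PATH and gnu.ini plots datanew.dat.

```ruby
# Sort infile block by block on the 2nd column, write outfile,
# and only then start the plotting command.
# sort_then_plot and plot_cmd are hypothetical names for this sketch.
def sort_then_plot(infile, outfile, plot_cmd = ["gnuplot", "gnu.ini"])
  blocks = File.read(infile).strip.split(/\n\s*\n/)
  sorted = blocks.map do |block|
    block.split("\n").sort_by { |row| row.split[1].to_f }.join("\n")
  end

  # File.write returns only after the data is written and the file is closed.
  File.write(outfile, sorted.join("\n\n") + "\n")

  # system waits for the command to finish and returns true on success.
  system(*plot_cmd)
end

# Usage, with the file names from this thread:
# sort_then_plot("data.dat", "datanew.dat")
```

Because File.write has finished before system is called, the plotting command cannot start before datanew.dat exists.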