BASH: Sort four lines based on first line

SilversleevesX · September 17, 2011, 11:16pm

I am in the process of sorting an AutoHotkey script's contents so as to make it easier for me to find and view its nearly 200 buzzwords (when I forget which one corresponds with what phrase, which I do now and then).

About half to two-thirds of the script's key phrases correspond to locations (see example data). These are the ones I've been trying to sort first.

My data looks like this:

;Canyon
::grand::
Send, Canyon
Exit

;Elevator
::lift::
Send, Elevator
Exit

;Office
::9to5::
Send, Office
Exit

;Cabin
::log::
Send, Cabin
Exit

;Desert
::sahara::
Send, Desert
Exit

;Front door
::welcome::
Send, Front door

In order for this data to work in the AHK script once it's sorted, it has to remain multi-line. I admit I'm somewhat bummed because the sort command works line-by-line and is not appropriate for what I want to do.

So what would be a good method for sorting just these groups of 4 lines (for example), by the first line in each one, as they stand?

BZT

agama · September 18, 2011, 12:04am

If you are maintaining the file, and thus have control over its format, I'd suggest keeping your "source" as one record per 4-line entry like this:

;Canyon|::grand::|Send,|Canyon|Exit|
;Elevator|::lift::|Send,|Elevator|Exit|
;Office|::9to5::|Send,|Office|Exit|
;Cabin|::log::|Send,|Cabin|Exit|
;Desert|::sahara::|Send,|Desert|Exit|
;Front door|::welcome::|Send,|Front door|Exit|

Then you could use a simple sort and awk to generate the real file:

sort source_file|awk -v RS="|" '1' >ahk.file

This assumes that there are no vertical bars in any of the four lines. There weren't any in your example, but that might not hold across all of your data.

Output from the above data:

;Cabin
::log::
Send,
Cabin
Exit

;Canyon
::grand::
Send,
Canyon
Exit

;Desert
::sahara::
Send,
Desert
Exit

;Elevator
::lift::
Send,
Elevator
Exit

;Front door
::welcome::
Send,
Front door
Exit

;Office
::9to5::
Send,
Office
Exit

---------- Post updated 09-18-11 at 00:04 ---------- Previous update was 09-17-11 at 23:54 ----------

And if you don't have control over the script file, you could do something like this:

awk 'NF < 1 {next;} { x=x $0 "|"; } /Exit/ { print x; x="" }' script_file | sort | awk -v RS="|" '1' >new_ahk.file

which will join the lines into one, sort and then split them back out again.
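
The three stages can be looked at one at a time. The following is only a sketch: the temporary directory, the file names sample.ahk, joined.txt and sorted.txt, and the LC_ALL=C setting are assumptions made for the demo, and the sample data has a final Exit so that every block ends the same way.

```shell
#!/bin/bash
# Sketch only: file names and the temporary directory are made up for this demo.
export LC_ALL=C                      # assumption: plain byte-wise sort order
workdir=$(mktemp -d)

# Three sample blocks, deliberately out of order, each ending in Exit.
cat > "$workdir/sample.ahk" <<'EOF'
;Office
::9to5::
Send, Office
Exit

;Cabin
::log::
Send, Cabin
Exit

;Front door
::welcome::
Send, Front door
Exit
EOF

# Stage 1: skip blank lines and glue each block into one "|"-terminated line.
awk 'NF < 1 {next} {x = x $0 "|"} /Exit/ {print x; x = ""}' \
    "$workdir/sample.ahk" > "$workdir/joined.txt"

# Stage 2: an ordinary line sort now orders whole blocks by their first line.
sort "$workdir/joined.txt" > "$workdir/sorted.txt"

# Stage 3: split on "|" again; the newline left at the end of each sorted
# line turns into the blank line between blocks.
awk -v RS="|" '1' "$workdir/sorted.txt" > "$workdir/new_ahk.file"

cat "$workdir/sorted.txt"
```

The intermediate sorted.txt holds one block per line, starting with ;Cabin|::log::|Send, Cabin|Exit| and ending with the ;Office block, which is why a plain sort is enough in the middle.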

alister · September 18, 2011, 12:38am

The following alternative is a bit simpler (though more cryptic for AWK newbies), a bit more tolerant of changing data format, and should also be portable across AWK implementations:

awk '{$NF=$NF OFS}1' FS=\\n RS= OFS=\|
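
As a sketch of how this one-liner might slot into the join/sort/split idea: with RS empty, awk reads blank-line-separated paragraphs as records, FS set to a newline makes each line a field, and assigning to $NF rebuilds the record with | between the fields. The sample file, the temporary directory, LC_ALL=C and the tr step that splits the lines again are assumptions added for the demo, not part of the command above.

```shell
#!/bin/bash
# Sketch only: the sample data path and the tr-based split are assumptions.
export LC_ALL=C
tmpdir=$(mktemp -d)

# The last block has no Exit, as in the original sample data.
cat > "$tmpdir/script.ahk" <<'EOF'
;Desert
::sahara::
Send, Desert
Exit

;Canyon
::grand::
Send, Canyon
Exit

;Front door
::welcome::
Send, Front door
EOF

# RS=      : paragraph mode, one record per blank-line-separated block.
# FS='\n'  : one field per line.
# Appending OFS to the last field makes awk rebuild $0 with "|" separators.
awk '{$NF = $NF OFS} 1' FS='\n' RS= OFS='|' "$tmpdir/script.ahk" |
    sort |
    tr '|' '\n' > "$tmpdir/sorted.ahk"   # each "|" becomes a newline again

cat "$tmpdir/sorted.ahk"
```

Because the records are delimited by blank lines rather than by the word Exit, the ;Front door block survives even though it has no Exit line.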

Regards,
Alister

yazu · September 18, 2011, 4:00am

perl -0777 -F'/\n\n+/' -ane '
print join "\n\n",
  map  { $_->[1] }
  sort { $a->[0] cmp $b->[0] }
  map  { [(split /\n/)[0], $_] } @F;
print "\n"' INPUTFILE

SilversleevesX · September 18, 2011, 11:38am

agama:

This looks good, except I'm pretty sure that in AHK, the "Send," and whatever comes after it must be on the same line to work -- it's a command syntax. So that may present an obstacle when reformatting the source (which I do have control over, btw).

I suppose a tr that turns the newlines into pipe symbols, like you have above, may work, but any tr I may run would also get rid of the spaces in between these "quatrains" (which is OK for ahk but would make a quick perusal of the resulting sorted list rather a pain) and leave me with a block of data it would take a lot more time to reformat.
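
One way the tr idea could keep the block boundaries, sketched under the assumptions that blocks are separated by exactly one blank line and that no line contains a vertical bar: after tr, each blank line shows up as a doubled ||, which can be turned back into a line break before sorting. The file names and LC_ALL=C are made up for the demo.

```shell
#!/bin/bash
# Sketch only; assumes exactly one blank line between blocks and no "|" in the data.
export LC_ALL=C
demo=$(mktemp -d)

cat > "$demo/in.ahk" <<'EOF'
;Elevator
::lift::
Send, Elevator
Exit

;Cabin
::log::
Send, Cabin
Exit
EOF

# 1. tr flattens the whole file; a blank line becomes "||".
# 2. awk breaks the single long line after each "||", keeping one "|".
# 3. sort orders the one-line blocks; tr restores the original line breaks.
tr '\n' '|' < "$demo/in.ahk" |
    awk '{gsub(/\|\|/, "|\n")} 1' |
    sort |
    tr '|' '\n' > "$demo/out.ahk"

cat "$demo/out.ahk"
```

The trailing | on each sorted line is what puts the blank line back between blocks when the final tr runs.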

I thought of customizing the IFS to just "\n" (newline), but I think that here, too, I'd lose those empty lines in between. So if I were to use this approach, maybe also including the ";" in the temporary IFS might do it?

It occurs to me I could also preserve those newlines by introducing an echo command somewhere, but where to put it?

Thanks for your help so far.

BZT

---------- Post updated at 11:38 ---------- Previous update was at 09:42 ----------

agama -- How did you get the sample data I posted to look like it does in the first CODE block of your reply? I've been trying for over an hour* to do the same with some of my own data, and it's still escaping me. I presume you used awk...?
BZT

*and about 20 minutes making this "quick" [ahem] reply...

durden_tyler · September 18, 2011, 11:48am

$
$ cat datafile
;Canyon
::grand::
Send, Canyon
Exit

;Elevator
::lift::
Send, Elevator
Exit

;Office
::9to5::
Send, Office
Exit

;Cabin
::log::
Send, Cabin
Exit

;Desert
::sahara::
Send, Desert
Exit

;Front door
::welcome::
Send, Front door
$
$
$ perl -ne '$k = $_ if /^;.*/; $x{$k} .= $_;
            END { $x{$k} .= "$_\n";
                  foreach $key (sort keys %x) {print $x{$key}}
            }' datafile
;Cabin
::log::
Send, Cabin
Exit

;Canyon
::grand::
Send, Canyon
Exit

;Desert
::sahara::
Send, Desert
Exit

;Elevator
::lift::
Send, Elevator
Exit

;Front door
::welcome::
Send, Front door

;Office
::9to5::
Send, Office
Exit

$
$

tyler_durden

agama · September 18, 2011, 11:59am

I completely missed that the Send, was being placed on its own line in the output! Initially I was using spaces to separate the fields, then realised that there were embedded spaces, and when I converted to pipes I added an unnecessary vertical bar. The 'source' file should be something like:

;Canyon|::grand::|Send, Canyon|Exit|
;Elevator|::lift::|Send, Elevator|Exit|
;Office|::9to5::|Send, Office|Exit|
;Cabin|::log::|Send, Cabin|Exit|
;Desert|::sahara::|Send, Desert|Exit|
;Front door|::welcome::|Send, Front door|Exit|

Here the vertical bar after Send, has been removed.

I generated the first set of output using the above input and this command:

sort input-source | awk -v RS="|" '1'

To run with your current file, this should work:

awk 'NF < 1 {next;} { x=x $0 "|"; } /Exit/ { print x; x="" }' input-file | sort  | awk -v RS="|" '1'

I just recut/pasted your sample data (to be sure I hadn't buggered something up along the way) and ran it through the above pipeline; it generated:

;Cabin
::log::
Send, Cabin
Exit

;Canyon
::grand::
Send, Canyon
Exit

;Desert
::sahara::
Send, Desert
Exit

;Elevator
::lift::
Send, Elevator
Exit

;Front door
::welcome::
Send, Front door
Exit

;Office
::9to5::
Send, Office
Exit

I did add a final Exit -- I hope that there is one, otherwise things might not work quite right.
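
If the last block might lack its Exit, one possible tweak (a sketch, not what was run above) is to flush whatever is still buffered in an END rule. The anchored /^Exit$/ pattern is also an assumption: it avoids ending a block early on a line that merely contains the word Exit, but it expects the Exit line to carry no trailing spaces or carriage return. The sample file and temporary directory are made up for the demo.

```shell
#!/bin/bash
# Sketch only: sample file and temporary directory are made up for the demo.
export LC_ALL=C
d=$(mktemp -d)

# The last block has no Exit line.
cat > "$d/input-file" <<'EOF'
;Office
::9to5::
Send, Office
Exit

;Front door
::welcome::
Send, Front door
EOF

awk '
    NF < 1   { next }                  # ignore blank separator lines
             { x = x $0 "|" }          # append the line to the current block
    /^Exit$/ { print x; x = "" }       # a line that is exactly Exit ends the block
    END      { if (x != "") print x }  # flush a final block that has no Exit
' "$d/input-file" | sort | awk -v RS="|" '1' > "$d/sorted-file"

cat "$d/sorted-file"
```

With the END rule in place, the ;Front door block is kept and sorts ahead of ;Office instead of being dropped.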

Hope this helps.

SilversleevesX · September 18, 2011, 1:37pm

durden_tyler and agama, both.

As a tip-o-the-hat, a script that uses both awk and perl for my purposes:

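#!/bin/bash
input="temp10"
# Stop if the input file is missing; otherwise report it and carry on.
if [ -f "$input" ]; then echo "Found your file."; else echo "Cannot find $input" >&2; exit 1; fi
echo "Sorting with AWK"
# Join each block into one |-terminated line, sort the lines, then split them again.
# Output uses > rather than >> so a rerun does not append a second copy.
awk 'NF < 1 {next;} { x=x $0 "|"; } /Exit/ { print x; x="" }' "$input" | sort | awk -v RS="|" '1' >"${input}-sorted-awk.txt"
echo "Sorting with PERL"
# Collect each block in a hash keyed by its ;comment line, then print the blocks in key order.
perl -ne '$k = $_ if /^;.*/; $x{$k} .= $_; END { $x{$k} .= "$_\n"; foreach $key (sort keys %x) {print $x{$key}}}' "$input" >"${input}-sorted-perl.txt"
echo "Done"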

I ran it and it worked.

BZT