Hi.
I like the concise awk code of radoulov.
I sometimes prefer to think in terms of large tasks before I code up a solution in awk, Perl, C, etc. (if needed for performance). For example, this problem could be viewed as one of an alternate collating sequence, namely that of the first field. It could also be viewed as a grouping problem.
Because the input is already in groups of a specific, desired order, the grouping view lets me think about what I need to do to each group. Namely, I need to sort by the second field. I cannot normally do that to a part of a file. However, if I could identify each section, that would be a step in the right direction.
There are no standard commands to do that, but you can find some interesting programs on the net that will. Here's how this can be done using some of them. The names of the programs should be suggestive of what they do:
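As a rough illustration of the collating-sequence view, here is a minimal sketch that uses only awk, sort and cut. It is not one of the tools used in the script that follows: the function name collate_sort and the sample lines are made up for the example, and it assumes TAB-separated fields.

```shell
#!/usr/bin/env bash
# Sketch only: rank each first-field value by order of first appearance,
# sort on (rank, field 2), then strip the rank again.
# "collate_sort" is a hypothetical name, not a standard command.
LC_ALL=C ; export LC_ALL

collate_sort() {
  tab=$(printf '\t')
  awk -F'\t' -v OFS='\t' '
    !($1 in rank) { rank[$1] = ++n }   # first time this key is seen
    { print rank[$1], $0 }             # decorate the line with its rank
  ' "$@" |
  sort -t "$tab" -k1,1n -k3,3 |        # rank first, then the original field 2
  cut -f2-                             # remove the rank column
}

# Made-up sample in the style of data1 (TAB-separated).
printf 'RechargableAAA\tEnergizer\nRechargableAAA\tDuracell\nEmergencyLight\tGeepas\nEmergencyLight\tAlFaris\n' |
collate_sort
```

Note that this merges lines with the same first field even when they are not adjacent in the input, which is where it differs from the grouping view.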
#!/usr/bin/env bash
# @(#) s2 Demonstrate group sort, missing textutils.
# Infrastructure details, environment, commands for forum posts.
set +o nounset
LC_ALL=C ; LANG=C ; export LC_ALL LANG
echo ; echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
echo "(Versions displayed with local utility \"version\")"
c=$( ps | grep $$ | awk '{print $NF}' )
version >/dev/null 2>&1 && s=$(_eat $0 $1) || s=""
[ "$c" = "$s" ] && p="$s" || p="$c"
version >/dev/null 2>&1 && version "=o" $p blockwise sort
set -o nounset
echo
FILE=${1-data1}
# If "specimen" does not exist, replace with "cat".
specimen "$FILE"
echo " Preliminary conditions:"
t1=$( diff "$FILE" expected-output.txt | wc -l )
echo " About $t1 lines differ."
echo
echo " Results:"
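# Separate the blocks wherever field 1 changes.
split_at_colchange 1 "$FILE" |
# Keep a copy of the separated blocks in t1.
tee t1 |
# Sort each block on the second field.
blockwise "sort -k2,2" |
# Keep a copy of the sorted blocks in t2.
tee t2 |
# Remove the separator between blocks.
remove_blank_lines > tf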
if ! cmp expected-output.txt tf
then
  sdiff -w78 expected-output.txt tf
else
  echo
  echo " Pass - generated output and expected-output.txt are identical."
  echo
  specimen tf
fi
exit 0
producing:
% ./s2
Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution : Debian GNU/Linux 5.0
GNU bash 3.2.39
blockwise - ( ~/bin/blockwise Sep 29 12:53 )
sort (GNU coreutils) 6.10
Edges: 10 of 23 lines in data1
1 AAAlkalines Energizer
2 AAAlkalines Energizer
3 AAAlkalines Energizer
4 AAAlkalines Sunlight
5 AAAlkalines Sunlight
...
19 RechargableAAA Duracell
20 EmergencyLight AlFaris
21 EmergencyLight AlFaris
22 EmergencyLight Geepas
23 EmergencyLight Geepas
Preliminary conditions:
About 18 lines differ.
Results:
Pass - generated output and expected-output.txt are identical.
Edges: 10 of 23 lines in tf
1 AAAlkalines Energizer
2 AAAlkalines Energizer
3 AAAlkalines Energizer
4 AAAlkalines Energizer
5 AAAlkalines Energizer
...
19 RechargableAAA Energizer
20 EmergencyLight AlFaris
21 EmergencyLight AlFaris
22 EmergencyLight Geepas
23 EmergencyLight Geepas
I canonicalized the data and reference output files so that the separators were TABs.
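One way to do such a conversion, shown only as a sketch: let awk rebuild each line with TAB as the output separator. The function name to_tabs is made up for the example, and the approach assumes that no field contains embedded whitespace.

```shell
#!/usr/bin/env bash
# Sketch only: turn any run of blanks between fields into a single TAB.
# "to_tabs" is a hypothetical name, not a standard command.
# Assumes fields contain no embedded whitespace.
to_tabs() {
  # Assigning $1 to itself forces awk to rebuild the record using OFS.
  awk -v OFS='\t' '{ $1 = $1; print }' "$@"
}

# Made-up sample with mixed spaces and TABs between the two fields.
printf 'AAAlkalines   Energizer\nEmergencyLight \t Geepas\n' | to_tabs
```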
The steps are:
- Separate the blocks.
- For each block, sort on the second field.
- Remove the separator between blocks.
The temporary files from the tee commands (t1 and t2) can be examined to see the intermediate-step results.
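For comparison, the three steps can be approximated with awk and sort alone. This is a minimal sketch, not a replacement for the tools used above: the function name group_sort and the sample lines are made up, and it assumes TAB-separated input in which each block is a run of adjacent lines with the same first field. Instead of writing a separator between blocks, it closes the pipe to sort at each change of the first field, so every block gets its own sort.

```shell
#!/usr/bin/env bash
# Sketch only: sort each run of lines sharing field 1 on field 2.
# "group_sort" is a hypothetical name, not a standard command.
LC_ALL=C ; export LC_ALL

group_sort() {
  awk -F'\t' -v cmd='sort -k2,2' '
    NR > 1 && $1 != prev { close(cmd) }   # block ended: let sort emit it
    { prev = $1; print | cmd }            # feed the line to the current sort
  ' "$@"
}

# Made-up sample in the style of data1 (TAB-separated).
printf 'RechargableAAA\tEnergizer\nRechargableAAA\tDuracell\nEmergencyLight\tGeepas\nEmergencyLight\tAlFaris\n' |
group_sort
```

Because only adjacent lines form a block, a first-field value that reappears later in the input starts a new block and is sorted separately.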
The collection of Perl programs can be found at The Missing Textutils.
Best wishes ... cheers, drl