Hello,
I have a file in the format shown below. I need to sort the lines within each record on the 4th field; each record starts with "Module". Is there an easy way to do this with awk? I have tried piping the output of awk to sort, and also calling "sort" from inside awk, but either way the whole file gets sorted and the record structure is lost. Any suggestions are welcome.
Module :rtlc_BusSelWrap_rtl_copy_121_2_1_12 (rtlc_BusSelWrap_rtl_copy_121_2_1_12)
M_RTL_MULT_UNS_12_2 1 0 NA
M_RTL_RSHIFT_2_14 1 0 NA
Module :rtlc_AIMux_rtl_copy_189_4_144_18 (rtlc_AIMux_rtl_copy_189_4_144_18)
M_RTL_EQ_32 8 8 0
RTL_AND 28 18 55.5556
M_RTL_DEC_4_2 2 2 0
RTL_NOT 13 3 333.333
M_RTL_PRIM_MUX 152 1044 -85.4406
M_RTL_EQ_4 2 2 0
M_RTL_NEQ_4 2 2 0
M_RTL_RSHIFT_4_32 3 0 NA
The expected output is as follows:
Module :rtlc_BusSelWrap_rtl_copy_121_2_1_12 (rtlc_BusSelWrap_rtl_copy_121_2_1_12)
M_RTL_MULT_UNS_12_2 1 0 NA
M_RTL_RSHIFT_2_14 1 0 NA
Module :rtlc_AIMux_rtl_copy_189_4_144_18 (rtlc_AIMux_rtl_copy_189_4_144_18)
M_RTL_PRIM_MUX 152 1044 -85.4406
M_RTL_EQ_32 8 8 0
M_RTL_DEC_4_2 2 2 0
M_RTL_EQ_4 2 2 0
M_RTL_NEQ_4 2 2 0
RTL_NOT 13 3 333.333
RTL_AND 28 18 55.5556
M_RTL_RSHIFT_4_32 3 0 NA
Which awk version (which operating system)?
Hello,
The latest awk/gawk will do, and the OS is Fedora Linux.
binlib
September 22, 2011, 10:18am
4
awk '{
if (/^Module :/) {
close("sort -k4,4n")
print
} else print |"sort -k4,4n";
}
END { close("sort -k4,4n") } # not necessary
' infile
Note that this sorts the "NA" lines together with the 0s instead of putting them last, because sort -n treats a non-numeric field as 0.
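If the NA lines have to come last while still using the external sort, one option is to prefix each line with a flag before sorting and strip the flag afterwards. The following is only a sketch under some assumptions: the function name sort_records and the awk variable cmd are made up for illustration, non-numeric rows are assumed to have exactly NA in the 4th field, and sort and cut are assumed to behave as POSIX describes.

```shell
# Sort the lines of each "Module" record numerically on field 4, NA rows last.
# sort_records and cmd are hypothetical names used only for this sketch.
sort_records() {
    awk -v cmd='sort -k1,1n -k5,5n | cut -d" " -f2-' '
        /^Module/ {
            close(cmd)   # wait until the previous record is sorted and printed
            print        # the header line goes straight to stdout
            fflush()     # keep the header ahead of the next sort output
            next
        }
        {
            # Flag is 0 for numeric rows and 1 for NA rows, so NA sorts last.
            # The original field 4 becomes field 5 of the prefixed line.
            flag = ($4 == "NA") ? 1 : 0
            print flag, $0 | cmd
        }
        END { close(cmd) }
    ' "$@"
}
```

Call it as sort_records infile. Lines with equal keys may come out in a different order than in the input, since sort falls back to comparing whole lines.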
This code requires a recent GNU awk (gawk 4 or later):
awk 'END {
print k; for (R in r) print r[R]
}
/^Module/ {
if (k) {
print k; delete r[x]
for (R in r) print r[R]
}
k = $0; delete r; next
}
{
r[$NF ~ /NA/ ? 99999999 : $NF, NR] = $0
}
BEGIN {
PROCINFO["sorted_in"] = "@ind_num_asc"
}' infile
For example:
zsh-4.3.12[t]% cat infile
Module :rtlc_BusSelWrap_rtl_copy_121_2_1_12 (rtlc_BusSelWrap_rtl_copy_121_2_1_12)
M_RTL_MULT_UNS_12_2 1 0 NA
M_RTL_RSHIFT_2_14 1 0 NA
Module :rtlc_AIMux_rtl_copy_189_4_144_18 (rtlc_AIMux_rtl_copy_189_4_144_18)
M_RTL_EQ_32 8 8 0
RTL_AND 28 18 55.5556
M_RTL_DEC_4_2 2 2 0
RTL_NOT 13 3 333.333
M_RTL_PRIM_MUX 152 1044 -85.4406
M_RTL_EQ_4 2 2 0
M_RTL_NEQ_4 2 2 0
M_RTL_RSHIFT_4_32 3 0 NA
zsh-4.3.12[t]% awk 'END {
print k
for (R in r) print r[R]
}
/^Module/ {
if (k) {
print k; delete r[x]
for (R in r) print r[R]
}
k = $0; delete r; next
}
{
r[$NF ~ /NA/ ? 99999999 : $NF, NR] = $0
}
BEGIN {
PROCINFO["sorted_in"] = "@ind_num_asc"
}' infile
Module :rtlc_BusSelWrap_rtl_copy_121_2_1_12 (rtlc_BusSelWrap_rtl_copy_121_2_1_12)
M_RTL_MULT_UNS_12_2 1 0 NA
M_RTL_RSHIFT_2_14 1 0 NA
Module :rtlc_AIMux_rtl_copy_189_4_144_18 (rtlc_AIMux_rtl_copy_189_4_144_18)
M_RTL_PRIM_MUX 152 1044 -85.4406
M_RTL_EQ_4 2 2 0
M_RTL_NEQ_4 2 2 0
M_RTL_EQ_32 8 8 0
M_RTL_DEC_4_2 2 2 0
RTL_AND 28 18 55.5556
RTL_NOT 13 3 333.333
M_RTL_RSHIFT_4_32 3 0 NA
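For awk versions without PROCINFO["sorted_in"], the same idea can be sketched with a small insertion sort written in plain awk. This is an illustration rather than a tuned implementation: the names sort_records_portable, key and emit are hypothetical, and NA is assumed to be the only non-numeric value in the 4th field. Because insertion sort is stable, lines with equal keys keep their input order.

```shell
# Per-record numeric sort on field 4 in plain awk, NA rows last.
# sort_records_portable, key and emit are hypothetical names for this sketch.
sort_records_portable() {
    awk '
        # Numeric sort key of a line: field 4, with NA mapped above any number.
        function key(line,    f) {
            split(line, f, " ")
            return (f[4] == "NA") ? 1e300 : f[4] + 0
        }
        # Print the buffered lines in key order (stable insertion sort), then reset.
        function emit(    i, j, tmp) {
            for (i = 2; i <= n; i++) {
                tmp = rec[i]
                for (j = i - 1; j >= 1 && key(rec[j]) > key(tmp); j--)
                    rec[j + 1] = rec[j]
                rec[j + 1] = tmp
            }
            for (i = 1; i <= n; i++)
                print rec[i]
            n = 0
        }
        /^Module/ { emit(); print; next }   # new record: flush the old one first
        { rec[++n] = $0 }                   # buffer the lines of the current record
        END { emit() }                      # flush the last record
    ' "$@"
}
```

Call it as sort_records_portable infile. The sort is quadratic in the number of lines per record, which should only matter for very large records.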
It is not a problem if the NA and 0 lines are mixed together. Thanks again.
birei
September 22, 2011, 10:34am
7
Hi,
Using 'Perl':
$ cat infile
Module :rtlc_BusSelWrap_rtl_copy_121_2_1_12 (rtlc_BusSelWrap_rtl_copy_121_2_1_12)
M_RTL_MULT_UNS_12_2 1 0 NA
M_RTL_RSHIFT_2_14 1 0 NA
Module :rtlc_AIMux_rtl_copy_189_4_144_18 (rtlc_AIMux_rtl_copy_189_4_144_18)
M_RTL_EQ_32 8 8 0
RTL_AND 28 18 55.5556
M_RTL_DEC_4_2 2 2 0
RTL_NOT 13 3 333.333
M_RTL_PRIM_MUX 152 1044 -85.4406
M_RTL_EQ_4 2 2 0
M_RTL_NEQ_4 2 2 0
M_RTL_RSHIFT_4_32 3 0 NA
$ cat script.pl
use warnings;
use strict;
@ARGV == 1 or die qq[Usage: perl $0 input-file\n];
my (@nas, @nums);
while ( <> ) {
chomp;
if ( ( my $begin = /\A(?i:module)\s*:/ ) ... ( my $end = /\A(?i:module)\s*:/ ) ) {
if ( $begin ) {
printf "%s\n", $_;
next;
}
if ( ! $end ) {
my @f = split;
if ( uc $f[ $#f ] eq qq[NA] ) {
push @nas, $_;
}
else {
push @nums, $_;
}
next;
}
# split ' ' skips leading whitespace, so the 4th field is always index 3
@nums = sort { (split ' ', $a)[3] <=> (split ' ', $b)[3] } @nums;
printf qq[%s\n],
join qq[\n], @nums, @nas;
# reset both buffers so lines do not leak into the next record
@nums = ();
@nas = ();
redo;
}
} continue {
if ( eof() ) {
@nums = sort { (split ' ', $a)[3] <=> (split ' ', $b)[3] } @nums;
printf qq[%s\n],
join qq[\n], @nums, @nas;
@nums = ();
@nas = ();
}
}
$ perl script.pl infile
Module :rtlc_BusSelWrap_rtl_copy_121_2_1_12 (rtlc_BusSelWrap_rtl_copy_121_2_1_12)
M_RTL_MULT_UNS_12_2 1 0 NA
M_RTL_RSHIFT_2_14 1 0 NA
Module :rtlc_AIMux_rtl_copy_189_4_144_18 (rtlc_AIMux_rtl_copy_189_4_144_18)
M_RTL_PRIM_MUX 152 1044 -85.4406
M_RTL_EQ_32 8 8 0
M_RTL_DEC_4_2 2 2 0
M_RTL_EQ_4 2 2 0
M_RTL_NEQ_4 2 2 0
RTL_AND 28 18 55.5556
RTL_NOT 13 3 333.333
M_RTL_RSHIFT_4_32 3 0 NA
Regards,
Birei
Another Perl solution:
perl -lane'
if (/^Module/) {
$k and print join $/, $k, map $_->[1], sort {
$a->[0] <=> $b->[0]
} @a;
$k = $_; @a = (); next
}
push @a, [$F[3] =~ /NA/ ? 99999 : $F[3], $_];
END {
print join $/, $k, map $_->[1], sort {
$a->[0] <=> $b->[0]
} @a
}' infile
binlib:
Thank you very much. You have given a very elegant solution.