Sorting within a record using AWK

fifteate · September 22, 2011, 8:44am

Hello,
I have a file in the format shown below. What I need to do is sort the rows within each record by the 4th field. Each record starts with "Module". Is there an easy way to do this using awk? I have tried piping the output of awk to sort, and also calling "sort" inside awk, but in both cases the whole file gets sorted and the record structure is lost. Any suggestions are welcome.

Module :rtlc_BusSelWrap_rtl_copy_121_2_1_12 (rtlc_BusSelWrap_rtl_copy_121_2_1_12)
 M_RTL_MULT_UNS_12_2          1          0         NA
   M_RTL_RSHIFT_2_14          1          0         NA
Module :rtlc_AIMux_rtl_copy_189_4_144_18 (rtlc_AIMux_rtl_copy_189_4_144_18)
         M_RTL_EQ_32          8          8          0
             RTL_AND         28         18    55.5556
       M_RTL_DEC_4_2          2          2          0
             RTL_NOT         13          3    333.333
      M_RTL_PRIM_MUX        152       1044   -85.4406
          M_RTL_EQ_4          2          2          0
         M_RTL_NEQ_4          2          2          0
   M_RTL_RSHIFT_4_32          3          0         NA

The expected output is as follows:

Module :rtlc_BusSelWrap_rtl_copy_121_2_1_12 (rtlc_BusSelWrap_rtl_copy_121_2_1_12)
 M_RTL_MULT_UNS_12_2          1          0         NA
   M_RTL_RSHIFT_2_14          1          0         NA
Module :rtlc_AIMux_rtl_copy_189_4_144_18 (rtlc_AIMux_rtl_copy_189_4_144_18)
      M_RTL_PRIM_MUX        152       1044   -85.4406
         M_RTL_EQ_32          8          8          0
       M_RTL_DEC_4_2          2          2          0
          M_RTL_EQ_4          2          2          0
         M_RTL_NEQ_4          2          2          0
             RTL_NOT         13          3    333.333
             RTL_AND         28         18    55.5556
   M_RTL_RSHIFT_4_32          3          0         NA

radoulov · September 22, 2011, 9:37am

Which awk version (which operating system)?

fifteate · September 22, 2011, 10:07am

Hello,
The latest awk/gawk will serve the purpose, and the OS is Fedora Linux.

binlib · September 22, 2011, 10:18am

awk '{
  if (/^Module :/) {
    close("sort -k4,4n")
    print
  } else print |"sort -k4,4n";
}
END { close("sort -k4,4n") } # not necessary
' infile

Note that this sorts the "NA" rows together with the 0s (sort -n treats "NA" as 0) instead of placing them last.
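If the "NA" rows should come last instead of being mixed with the 0s, one possible variation on the same pipe-to-sort idea is to prefix every row with a 0/1 flag, sort on the flag first and the value second, and strip the flag again with cut. This is only a sketch and not part of the original post: the function name sort_na_last and the variable cmd are made up for illustration, and the fflush() call is added on the assumption that awk's buffered header lines could otherwise appear after sort's output when stdout is not a terminal.

```shell
# Hypothetical helper: sort each Module record on the last column, NA rows last.
sort_na_last() {
  awk '
    BEGIN {
      # Field 1 is the NA flag added below; field 5 is the original 4th column.
      cmd = "sort -k1,1n -k5,5n | cut -d\" \" -f2-"
    }
    /^Module/ {
      close(cmd)        # finish sorting the previous record
      print; fflush()   # emit the header before the next sort runs
      next
    }
    # Prefix each row with 1 if the last field is NA, otherwise 0.
    { printf "%d %s\n", ($NF == "NA"), $0 | cmd }
    END { close(cmd) }
  ' "$@"
}
```

It can be called as "sort_na_last infile" or fed from a pipe. Rows with equal values are ordered by sort's last-resort comparison of the whole line, so their relative order may depend on the locale.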

radoulov · September 22, 2011, 10:27am

This code requires GNU awk (gawk) 4.0 or later, because it relies on PROCINFO["sorted_in"]:

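awk 'BEGIN {
  # gawk 4+: traverse arrays in ascending numeric order of the index
  PROCINFO["sorted_in"] = "@ind_num_asc"
  }
/^Module/ {
  # flush the previous record before starting a new one
  if (k) {
    print k
    for (R in r) print r[R]
    }
  k = $0; delete r; next
  }
{
  # index: last column (NA gets a large value so it sorts last), plus NR to keep it unique
  r[$NF ~ /NA/ ? 99999999 : $NF, NR] = $0
  }
END {
  # flush the final record
  print k; for (R in r) print r[R]
  }' infile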

For example:

zsh-4.3.12[t]% cat infile 
Module :rtlc_BusSelWrap_rtl_copy_121_2_1_12 (rtlc_BusSelWrap_rtl_copy_121_2_1_12)
 M_RTL_MULT_UNS_12_2          1          0         NA
   M_RTL_RSHIFT_2_14          1          0         NA
Module :rtlc_AIMux_rtl_copy_189_4_144_18 (rtlc_AIMux_rtl_copy_189_4_144_18)
         M_RTL_EQ_32          8          8          0
             RTL_AND         28         18    55.5556
       M_RTL_DEC_4_2          2          2          0
             RTL_NOT         13          3    333.333
      M_RTL_PRIM_MUX        152       1044   -85.4406
          M_RTL_EQ_4          2          2          0
         M_RTL_NEQ_4          2          2          0
   M_RTL_RSHIFT_4_32          3          0         NA

zsh-4.3.12[t]% awk 'END {
   print k
for (R in r) print r[R]
  }
/^Module/ {
  if (k) {
  print k; delete r[x]
  for (R in r) print r[R]
  }
  k = $0; delete r; next
  }
{
  r[$NF ~ /NA/ ? 99999999 : $NF, NR] = $0
  }
BEGIN {
  PROCINFO["sorted_in"] = "@ind_num_asc"
  }' infile 
Module :rtlc_BusSelWrap_rtl_copy_121_2_1_12 (rtlc_BusSelWrap_rtl_copy_121_2_1_12)
 M_RTL_MULT_UNS_12_2          1          0         NA
   M_RTL_RSHIFT_2_14          1          0         NA
Module :rtlc_AIMux_rtl_copy_189_4_144_18 (rtlc_AIMux_rtl_copy_189_4_144_18)
      M_RTL_PRIM_MUX        152       1044   -85.4406
          M_RTL_EQ_4          2          2          0
         M_RTL_NEQ_4          2          2          0
         M_RTL_EQ_32          8          8          0
       M_RTL_DEC_4_2          2          2          0
             RTL_AND         28         18    55.5556
             RTL_NOT         13          3    333.333
   M_RTL_RSHIFT_4_32          3          0         NA
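
For awk implementations without PROCINFO["sorted_in"], roughly the same result can be sketched in portable awk by buffering each record and keeping it ordered with a small insertion sort. This is an illustrative alternative, not the code posted above: the names sort_records_portable, flush_record, row and key are invented here, and the NA sentinel value is simply borrowed from the gawk version. Because the comparison is strict, rows with equal values keep their input order, which differs from the tie order in the gawk output shown.

```shell
# Hypothetical helper: pure-awk per-record sort, no gawk extensions needed.
sort_records_portable() {
  awk '
    # Print the buffered header and rows, then reset the buffer.
    function flush_record(   i) {
      if (header != "") print header
      for (i = 1; i <= n; i++) print row[i]
      n = 0
    }
    /^Module/ { flush_record(); header = $0; next }
    {
      # NA gets a large sentinel key so it sorts last, as in the gawk version.
      key = ($NF == "NA") ? 99999999 : $NF + 0
      # Insertion sort: shift rows with strictly larger keys one slot up.
      for (i = n; i >= 1 && k[i] > key; i--) {
        k[i + 1] = k[i]
        row[i + 1] = row[i]
      }
      k[i + 1] = key
      row[i + 1] = $0
      n++
    }
    END { flush_record() }
  ' "$@"
}
```

Usage would be "sort_records_portable infile". Insertion sort is quadratic in the number of rows per record, which should be acceptable for records of the size shown here but is an assumption worth revisiting for very large records.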

fifteate · September 22, 2011, 10:31am

It is fine if the NA and 0 rows end up mixed together; that causes no problem when interpreting the results. Thanks again.

birei · September 22, 2011, 10:34am

Hi,

Using 'Perl':

$ cat infile
Module :rtlc_BusSelWrap_rtl_copy_121_2_1_12 (rtlc_BusSelWrap_rtl_copy_121_2_1_12)
 M_RTL_MULT_UNS_12_2          1          0         NA
   M_RTL_RSHIFT_2_14          1          0         NA
Module :rtlc_AIMux_rtl_copy_189_4_144_18 (rtlc_AIMux_rtl_copy_189_4_144_18)
         M_RTL_EQ_32          8          8          0
             RTL_AND         28         18    55.5556
       M_RTL_DEC_4_2          2          2          0
             RTL_NOT         13          3    333.333
      M_RTL_PRIM_MUX        152       1044   -85.4406
          M_RTL_EQ_4          2          2          0
         M_RTL_NEQ_4          2          2          0
   M_RTL_RSHIFT_4_32          3          0         NA
$ cat script.pl
use warnings;
use strict;

@ARGV == 1 or die qq[Usage: perl $0 input-file\n];

my (@nas, @nums);

while ( <> ) {
        chomp;
        if ( ( my $begin = /\A(?i:module)\s*:/ ) ... ( my $end = /\A(?i:module)\s*:/ ) ) {
                if ( $begin ) {
                        printf "%s\n", $_;
                        next;
                }

                if ( ! $end ) {
                        my @f = split;
                        if ( uc $f[ $#f ] eq qq[NA] ) {
                                push @nas, $_;
                        }
                        else {
                                push @nums, $_;
                        }
                        next;
                }

                @nums = sort { (split( /\s+/, $a ))[4] <=> (split( /\s+/, $b ))[4] } @nums;
                printf qq[%s\n],
                        join qq[\n], @nums, @nas;
                @nums = ();     # reset both buffers so rows do not leak into the next record
                @nas = ();

                redo;
        }
} continue {
        if ( eof() ) {
                @nums = sort { (split( /\s+/, $a ))[4] <=> (split( /\s+/, $b ))[4] } @nums;
                printf qq[%s\n],
                        join qq[\n], @nums, @nas;
                @nas = ();

        }
}
$ perl script.pl infile
Module :rtlc_BusSelWrap_rtl_copy_121_2_1_12 (rtlc_BusSelWrap_rtl_copy_121_2_1_12)
 M_RTL_MULT_UNS_12_2          1          0         NA
   M_RTL_RSHIFT_2_14          1          0         NA
Module :rtlc_AIMux_rtl_copy_189_4_144_18 (rtlc_AIMux_rtl_copy_189_4_144_18)
      M_RTL_PRIM_MUX        152       1044   -85.4406
         M_RTL_EQ_32          8          8          0
       M_RTL_DEC_4_2          2          2          0
          M_RTL_EQ_4          2          2          0
         M_RTL_NEQ_4          2          2          0
             RTL_AND         28         18    55.5556
             RTL_NOT         13          3    333.333
   M_RTL_RSHIFT_4_32          3          0         NA

Regards,
Birei

radoulov · September 22, 2011, 10:48am

Another Perl solution:

perl -lane'
  if (/^Module/) {
    $k and print join $/, $k, map $_->[1], sort {
      $a->[0] <=> $b->[0]
      } @a;
    $k = $_; @a = (); next
    }
  push @a, [$F[3] =~ /NA/ ? 99999 : $F[3], $_];
  END {
    print join $/, $k, map $_->[1], sort {
      $a->[0] <=> $b->[0]
      } @a
    }' infile

fifteate · September 23, 2011, 6:23am

binlib:

awk '{
  if (/^Module :/) {
   close("sort -k4,4n")
   print
  } else print |"sort -k4,4n";
}
END { close("sort -k4,4n") } # not necessary
' infile

Note that this sorts the "NA" rows together with the 0s (sort -n treats "NA" as 0) instead of placing them last.

Thank you very much. You have given a very elegant solution.