awk - Why does output order change?

I have this END section in an awk script:

END     {
                for (j in littlebin)
                        {
                        print (j, int(log(j)/log(2)) , littlebin[j])
                        lbsum+=littlebin[j]
                        }

                for (i in bins)
                        {
                        print (i / (1024^2) "MB" , bins[i])
                        }
        }

As you can see I have 2 for loops, each outputting the contents of an array, with a little extra calculation. So, it prints in this order:

(1) Contents of the littlebin array, then
(2) Contents of the bins array

Ok, good. But, I'd like the contents of each of those arrays to be sorted numerically on the first element printed. So, I add a sort pipe:

END     {
                for (j in littlebin)
                        {
                        print (j, int(log(j)/log(2)), littlebin[j]) | "sort -nk 1"
                        lbsum+=littlebin[j]
                        }

                for (i in bins)
                        print (i / (1024^2) "MB" , bins[i]) | "sort -nk 1"
        }

This causes the output to be somewhat reversed, to become:

(1) Contents of the bins array, numerically sorted, then
(2) Contents of the littlebin array, numerically sorted

Why is that happening? And what is the best way of sorting as I'm attempting, on the index of each array? I've tried asorti, but it doesn't sort numerically. Thank you.

The order in which a for ( idx in array ) loop scans an array is not defined; it is generally based upon the internal implementation of arrays inside awk.

By the way, you can use the asort() / asorti() functions (gawk extensions) to sort array values / indices.
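As a minimal sketch, assuming gawk 4.0 or later is available: gawk also lets you fix the traversal order of a for (idx in array) loop through PROCINFO["sorted_in"]. The function name demo_sorted_in is made up for illustration, and the three sample entries are taken from the output posted in this thread.

```shell
# Hypothetical helper; PROCINFO["sorted_in"] is a gawk extension (gawk 4.0+),
# so this will not work with mawk or other awk implementations.
demo_sorted_in() {
    gawk 'BEGIN {
        # Sample data standing in for the littlebin array.
        littlebin[1024] = 142; littlebin[512] = 181; littlebin[131072] = 810
        # Ask gawk to visit the indices in ascending numeric order.
        PROCINFO["sorted_in"] = "@ind_num_asc"
        for (j in littlebin)
            print j, littlebin[j]
    }'
}
```

Calling demo_sorted_in should list the indices in numeric order without any external sort command.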

Post the code fragment that you used to sort numerically.

I attempted to use asorti, but I was unable to get it to sort numerically.

The code that I used is above. I simply appended " | sort -nk 1" to each "print" line. Here's the full script:

BEGIN   {
        binwidth = (512 * 1024)
        }

$1 >= 1387515600 && $3 == "GET" {
        mybin = (binwidth * int($2 / binwidth))
        bins[mybin]++
        }

$1 >= 1387515600 && mybin < binwidth && $3 == "GET"     {
        littlebin[2^(int(log($2)/log(2)))]++
        }

END     {
                for (j in littlebin)
                        print (j, int(log(j)/log(2)), littlebin[j]) | "sort -nk 1"

                for (i in bins)
                        print (i / (1024^2) "MB" , bins[i]) | "sort -nk 1"
        }

These are a couple of lines from the file it's parsing:

1387499749,6324891,POST,1387497600
1387501190,1120178,GET,1387497600

Thanks for the reply.

I don't see the asorti function used in the code that you posted.

All you have to do is call the asorti function, which places the sorted indices of the source array into a destination array, and then scan the destination array to print the sorted result.

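By the way, the sort -nk 1 usage in your code will not give the result you expect: both print statements pipe into the identical command string, so awk sends everything to a single sort process and the output of the two loops is merged into one sorted list.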

That's correct; asorti is not in my code presently. When I attempted to use it, it sorted my indices alphabetically, not numerically. It was done like this:

n = asorti(littlebin, lbidx, "ascending")
      for (j = 1 ; j <= n ; j++)
      print lbidx[j], littlebin[lbidx[j]]

The output was:

1024 142
131072 810
16384 249
2048 82
262144 720
32768 677
4096 68
512 181
65536 1128
8192 181

As you can see, it is not sorted numerically.

Here is an example sorting by value:

awk  '
        BEGIN {
                n = split ("1024 131072 16384 2048 262144 32768 4096 512 65536 8192", A)
        }
        END {
                print "Before sorting array"
                for ( i = 1; i <= n; i++ )
                        print A[i]

                n = asort ( A, B )

                print "After sorting array"
                for ( i = 1; i <= n; i++ )
                        print B[i]

        }
' /dev/null

Producing output:

Before sorting array
1024
131072
16384
2048
262144
32768
4096
512
65536
8192
After sorting array
512
1024
2048
4096
8192
16384
32768
65536
131072
262144
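
Since the question is about sorting on the indices rather than the values, here is a sketch of the same idea with asorti(). It assumes gawk 4.0 or later, where asorti() accepts a third argument naming the ordering; "@ind_num_asc" asks for ascending numeric order of the indices. The function name demo_asorti_numeric is made up for illustration, and the sample entries come from the output posted earlier in the thread.

```shell
# Hypothetical helper; the three-argument form of asorti() needs gawk 4.0+.
demo_asorti_numeric() {
    gawk 'BEGIN {
        # A few entries from the littlebin output shown earlier.
        littlebin[1024] = 142; littlebin[131072] = 810; littlebin[16384] = 249
        littlebin[2048] = 82;  littlebin[512] = 181
        # lbidx[1..n] receives the indices of littlebin, compared as numbers.
        n = asorti(littlebin, lbidx, "@ind_num_asc")
        for (j = 1; j <= n; j++)
            print lbidx[j], littlebin[lbidx[j]]
    }'
}
```

With the default ordering, asorti() compares the indices as strings, which is why 131072 came out before 16384 in the earlier attempt.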

An alternative approach that should work (although I haven't tested it) would be to change the following lines in your original code:

END     {
                for (j in littlebin)
                        print (j, int(log(j)/log(2)), littlebin[j]) | "sort -nk 1"

                for (i in bins)
                        print (i / (1024^2) "MB" , bins[i]) | "sort -nk 1"
        }

to:

END     {
                for (j in littlebin)
                        print (j, int(log(j)/log(2)), littlebin[j]) | "sort -n"
                close("sort -n")

                for (i in bins)
                        print (i / (1024^2) "MB" , bins[i]) | "sort -n"
        }
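
To try the close() idea in isolation, here is a small self-contained sketch. The function name demo_close_between_sorts and the sample counts are made up for illustration. It relies only on standard awk behaviour: output pipes are identified by their command string, and close() on that string ends the first sort process, so its output appears before the second loop starts a new one.

```shell
# Hypothetical helper; uses only POSIX awk features plus the external sort command.
demo_close_between_sorts() {
    awk 'BEGIN {
        # Made-up sample data standing in for the two arrays.
        littlebin[512] = 181; littlebin[1024] = 142; littlebin[2048] = 82
        bins[1048576] = 1; bins[2097152] = 2; bins[3145728] = 3

        for (j in littlebin)
            print j, littlebin[j] | "sort -n"
        close("sort -n")    # the first sort finishes and prints its lines here

        for (i in bins)
            print i / (1024^2) "MB", bins[i] | "sort -n"
        close("sort -n")    # the second sort prints after the first
    }'
}
```

Without the first close(), both loops would write to the same sort process and the two groups of lines would be sorted together.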