Generate sorted awk array for further processing

Ophiuchus · October 27, 2018, 9:14pm

Hello to all,

I have an input file like this:

Objects (id: bounding-box centroid area mean-color):
  0: 800x800+0+0 406.6,390.9 378792 srgb(0,0,0)
  11: 240x151+140+624 259.5,699.0 36240 srgb(255,255,255)
  6: 240x151+462+176 581.5,251.0 36240 srgb(255,255,255)
  7: 240x151+87+257 206.5,332.0 36240 srgb(255,255,255)
  8: 240x151+366+355 485.5,430.0 36240 srgb(255,255,255)
  9: 240x151+77+448 196.5,523.0 36240 srgb(255,255,255)
  10: 240x151+468+542 587.5,617.0 36240 srgb(255,255,255)
  2: 178x59+223+65 311.5,94.0 10502 srgb(255,255,255)
  3: 178x59+417+65 505.5,94.0 10502 srgb(255,255,255)
  4: 178x59+611+65 699.5,94.0 10502 srgb(255,255,255)
  1: 178x59+29+65 117.5,94.0 10502 srgb(255,255,255)
  5: 110x16+255+63 309.5,182.5 1760 srgb(255,255,255)

I'm interested in the second field; for example, its second value is "240x151+140+624". Using "+" as the field separator on this second field splits it into 3 subfields.

I want an awk array (in this case array "a") holding this 2nd field, sorted first by the 3rd subfield and then by the 2nd subfield (with "+" as the separator).
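To make the two sort keys concrete, here is a minimal POSIX awk sketch that splits the 2nd field on "+" with split(). The helper name show_subfields is hypothetical, and reading subfield 2 as the x offset and subfield 3 as the y offset assumes the usual WxH+X+Y geometry notation.

```shell
#!/bin/sh
# Sketch: expose the "+"-separated subfields of field 2.
# show_subfields is a hypothetical helper name, used only for illustration.
show_subfields() {
    awk '{
        # POSIX: a single-character separator other than space is used
        # literally, so "+" needs no escaping here.
        n = split($2, p, "+")
        print n, p[1], p[2], p[3]   # count, WxH, x offset, y offset (assumed)
    }'
}

echo '  11: 240x151+140+624 259.5,699.0 36240 srgb(255,255,255)' | show_subfields
# prints: 3 240x151 140 624
```

Sorting "first by the 3rd subfield, then by the 2nd" therefore means ordering the boxes top to bottom, then left to right.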

The code below works, but it needs one awk program, then a pipe into sort, and then another pipe into a second awk program.

  
awk 'NR>2{print $2}' file | sort -t "+" -k3n -k2n |
awk '{a[NR]=$0} END{for (i=1;i<=length(a);i++) print a[i]}'
110x16+255+63
178x59+29+65
178x59+223+65
178x59+417+65
178x59+611+65
240x151+462+176
240x151+87+257
240x151+366+355
240x151+77+448
240x151+468+542
240x151+140+624

How can I get the sorted array "a" in a single awk program (without piping twice), so that I can do further processing in the END{} block?

Thanks in advance

RudiC · October 28, 2018, 9:13am

With close to 300 posts, you should know that supplying your OS, shell, and tools' names and versions helps people help you. Your usage of length(a) on array a seems to indicate gawk, which offers quite different possibilities (e.g. its sort functions) than, say, my mawk, for which a pipe through sort would be indispensable.
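For the gawk route, a minimal single-program sketch, assuming gawk 4.0 or later, where asort() accepts the name of a user-defined comparison function as its third argument. The comparison function by_y_then_x, the shell function sort_boxes_gawk and the file name objects.txt are hypothetical. The sorted values end up in array s, so further processing can happen in the END block.

```shell
#!/bin/sh
# Sample input: the data from the question (objects.txt is a hypothetical name).
cat > objects.txt <<'EOF'
Objects (id: bounding-box centroid area mean-color):
  0: 800x800+0+0 406.6,390.9 378792 srgb(0,0,0)
  11: 240x151+140+624 259.5,699.0 36240 srgb(255,255,255)
  6: 240x151+462+176 581.5,251.0 36240 srgb(255,255,255)
  7: 240x151+87+257 206.5,332.0 36240 srgb(255,255,255)
  8: 240x151+366+355 485.5,430.0 36240 srgb(255,255,255)
  9: 240x151+77+448 196.5,523.0 36240 srgb(255,255,255)
  10: 240x151+468+542 587.5,617.0 36240 srgb(255,255,255)
  2: 178x59+223+65 311.5,94.0 10502 srgb(255,255,255)
  3: 178x59+417+65 505.5,94.0 10502 srgb(255,255,255)
  4: 178x59+611+65 699.5,94.0 10502 srgb(255,255,255)
  1: 178x59+29+65 117.5,94.0 10502 srgb(255,255,255)
  5: 110x16+255+63 309.5,182.5 1760 srgb(255,255,255)
EOF

# Sketch assuming gawk >= 4.0 (asort with a user-defined comparison function).
sort_boxes_gawk() {
    gawk '
    # Order by the 3rd "+" subfield, then by the 2nd, both numerically.
    function by_y_then_x(i1, v1, i2, v2,    p1, p2) {
        split(v1, p1, "+")
        split(v2, p2, "+")
        if (p1[3] + 0 != p2[3] + 0) return (p1[3] + 0 < p2[3] + 0) ? -1 : 1
        if (p1[2] + 0 != p2[2] + 0) return (p1[2] + 0 < p2[2] + 0) ? -1 : 1
        return 0
    }
    NR > 2 { a[NR - 2] = $2 }              # skip the header and object 0
    END {
        n = asort(a, s, "by_y_then_x")     # sorted copy in s[1..n]; a is unchanged
        for (i = 1; i <= n; i++) print s[i]
    }' "$@"
}

sort_boxes_gawk objects.txt
```

It should print the same 11 lines as the two-stage pipeline in the question, with no external sort involved.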

Your second pipe into the second awk is pointless, as you collect the stdin into an array, and then, in the END section, just print the array in sequence. Just drop it.
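Dropping the second awk leaves a single pipe. A minimal sketch on a small sample in the question's format (objects.txt is a hypothetical file name). With -t '+', sort's -k3n compares the leading number of the 3rd key, and -k2n reads "140+624" as 140, so only the x offset matters there.

```shell
#!/bin/sh
# Small sample in the same format as the question (hypothetical file name).
cat > objects.txt <<'EOF'
Objects (id: bounding-box centroid area mean-color):
  0: 800x800+0+0 406.6,390.9 378792 srgb(0,0,0)
  11: 240x151+140+624 259.5,699.0 36240 srgb(255,255,255)
  2: 178x59+223+65 311.5,94.0 10502 srgb(255,255,255)
  1: 178x59+29+65 117.5,94.0 10502 srgb(255,255,255)
  5: 110x16+255+63 309.5,182.5 1760 srgb(255,255,255)
EOF

# Same ordering as the original two-stage pipeline, without the second awk:
# numeric sort on key 3 (after the 2nd "+"), then numeric sort on key 2.
awk 'NR > 2 { print $2 }' objects.txt | sort -t '+' -k3n -k2n
```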
You may even drop the sort command you pipe into if you make use of gawk's sorting capabilities. If you can't, you could run the sort command from within awk, like

awk 'NR > 1 {print $2 | "sort -t+ -k3n -k2n "} ' file

but the gain over the external pipe would be marginal.
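Note that printing into "sort" from within awk sends sort's output to stdout, not back into awk. If the sorted lines are needed in an array inside the same awk program without gawk's sort functions, one portable option (plain POSIX awk, so it should also work in mawk) is to redirect sort into a temporary file, close() the pipe so sort finishes, and read the file back with getline. This is a swapped-in technique, not one from the thread; the shell function sort_boxes_portable, the temporary file name and objects.txt are hypothetical.

```shell
#!/bin/sh
# Sample input: the data from the question (objects.txt is a hypothetical name).
cat > objects.txt <<'EOF'
Objects (id: bounding-box centroid area mean-color):
  0: 800x800+0+0 406.6,390.9 378792 srgb(0,0,0)
  11: 240x151+140+624 259.5,699.0 36240 srgb(255,255,255)
  6: 240x151+462+176 581.5,251.0 36240 srgb(255,255,255)
  7: 240x151+87+257 206.5,332.0 36240 srgb(255,255,255)
  8: 240x151+366+355 485.5,430.0 36240 srgb(255,255,255)
  9: 240x151+77+448 196.5,523.0 36240 srgb(255,255,255)
  10: 240x151+468+542 587.5,617.0 36240 srgb(255,255,255)
  2: 178x59+223+65 311.5,94.0 10502 srgb(255,255,255)
  3: 178x59+417+65 505.5,94.0 10502 srgb(255,255,255)
  4: 178x59+611+65 699.5,94.0 10502 srgb(255,255,255)
  1: 178x59+29+65 117.5,94.0 10502 srgb(255,255,255)
  5: 110x16+255+63 309.5,182.5 1760 srgb(255,255,255)
EOF

# Sketch in POSIX awk: sort externally into a temp file, then load it into a[].
sort_boxes_portable() {
    awk -v tmp="${TMPDIR:-/tmp}/boxes.$$" '
    BEGIN  { cmd = "sort -t+ -k3n -k2n > " tmp }  # same string is needed for close()
    NR > 2 { print $2 | cmd }                     # skip the header and object 0
    END {
        close(cmd)                          # wait until sort has written tmp
        n = 0
        while ((getline line < tmp) > 0)    # read the sorted lines into a[1..n]
            a[++n] = line
        close(tmp)
        system("rm -f " tmp)
        for (i = 1; i <= n; i++) print a[i] # further processing would go here
    }' "$@"
}

sort_boxes_portable objects.txt
```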

For what reason do you skip line 2 of the input file?

Ophiuchus · October 28, 2018, 10:08am

Hi RudiC,

Thanks for answer.

I'm using bash in Cygwin under Windows. I have access to awk and gawk.

The input file lists the coordinates and sizes of some regions within an image. I skip the first object (id 0, on line 2 of the file) because it represents the entire image, which I don't need.

I can't drop the second awk program because I need the sorted coordinates in an array: in the end I have to print them in a specific order. For example, once the coordinates are sorted in an array, I need to print elements 2 to 5, then element 1, then elements 6 to N-1, and finally the last element.
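Once the sorted values are in a[1..N], that order is just a few loops in the END block. A minimal POSIX awk sketch, assuming at least 6 elements; the function name reorder is hypothetical, and the input here is the sorted list shown in the question. It uses NR in END instead of length(a), so it does not rely on length() accepting an array.

```shell
#!/bin/sh
# Sketch: print a sorted list in the order 2..5, 1, 6..N-1, N.
# reorder is a hypothetical name; it assumes at least 6 input lines.
reorder() {
    awk '
    { a[NR] = $0 }                           # a[1..N] holds the sorted values
    END {
        N = NR                               # NR keeps the record count in END
        for (i = 2; i <= 5; i++) print a[i]  # elements 2 to 5
        print a[1]                           # then element 1
        for (i = 6; i <= N - 1; i++) print a[i]  # then 6 to N-1
        print a[N]                           # finally the last element
    }'
}

printf '%s\n' 110x16+255+63 178x59+29+65 178x59+223+65 178x59+417+65 \
    178x59+611+65 240x151+462+176 240x151+87+257 240x151+366+355 \
    240x151+77+448 240x151+468+542 240x151+140+624 | reorder
```

The same END block can be appended to either single-program sort above, replacing its final print loop.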

I hope that makes it clear why I need the sorted data in an array.

Thanks