Sort array elements from same field

Hi,

input:

line1|error_type_a@15
line1|error_type_c@10
line1|error_type_b@5
line2|error_type_f@3
line2|error_type_a@1

I need to put all the second fields that share a first field on the same line, BUT sorted by error position number:

line1|error_type_b@5; error_type_c@10; error_type_a@15
line2|error_type_a@1; error_type_f@3

I managed to put them on the same line...

BEGIN{FS="|"}
{
    a[$1] = a[$1] (a[$1]?"; ":"")$2
}
    
END{for(i in a){ print i FS a[i]}}'

...but the elements in the new second field are not sorted:

line1|error_type_a@15; error_type_c@10; error_type_b@5
line2|error_type_f@3; error_type_a@1

And when I incorporate the asort function as follows, it returns a blank second field:

BEGIN{FS=OFS="|"}
{
    a[$1] = a[$1] (a[$1]?"; ":"")$2
}
    
END{for(i in a){
          split(a[i],b,"@");
          n = asort(b); 
          for (j=1; j<=n; j++){
              print i FS a[b[j]]
          }
      }
}'

Any explanation would be great !

Thanks in advance !

Warning: The awk that I use does not provide the asort() function; so all of my comments are just based on reading the man pages and visually inspecting your code; I was not able to test your code to verify my hypothesis.

I believe there are two problems with your program:

First, in your END clause, when you call split(a[i],b,"@") with the array a[] containing the elements:

a[line1] = error_type_a@15; error_type_c@10; error_type_b@5
a[line2] = error_type_f@3; error_type_a@1

the array b[] will either contain the elements:

error_type_a
15; error_type_c
10; error_type_b
5

or the elements:

error_type_f
3; error_type_a
1

I'm guessing that this is not what you wanted.
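A quick way to check what split() does with one accumulated value is to run it on a literal string. This is only a sketch: it assumes a[line1] holds the second fields joined in input order, as the main rule builds them, and the variable names s and pieces are made up for the demonstration.

```shell
pieces=$(awk 'BEGIN {
    # One accumulated value, joined in input order by the main rule
    s = "error_type_a@15; error_type_c@10; error_type_b@5"
    # Splitting on "@" cuts each item in the middle instead of between items
    n = split(s, b, "@")
    for (j = 1; j <= n; j++)
        print j ": " b[j]
}')
printf '%s\n' "$pieces"
```

Only split() is used here, so this should run on an awk without asort() as well.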

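And, second, asort() sorts the values stored in b[] and renumbers their subscripts from 1 to n; it does not give you subscripts of a[]. After the sort, b[] still holds the same fragments (such as "5" or "15; error_type_c"), just in a different order. Then the for loop in the END clause is going to print: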

line1 FS contents of a[b[1]]
line1 FS contents of a[b[2]]
line1 FS contents of a[b[3]]
line1 FS contents of a[b[4]]
line2 FS contents of a[b[1]]
line2 FS contents of a[b[2]]
line2 FS contents of a[b[3]]

Obviously, the FS in the above lines will be replaced by | and the standards leave it unspecified whether the "line1" lines will come before or after the "line2" lines. But, since the only defined subscripts in the a[] array are "line1" and "line2", every a[b[j]] refers to an element that does not exist, so it expands to an empty string and, therefore, the second field in all of the output lines will be empty.

I think this explains what went wrong, but your description didn't clearly explain what output you were trying to get, so I can't suggest a way to fix it. As a (maybe not too wild) guess, I would think you might want to replace:

    a[$1] = a[$1] (a[$1]?"; ":"")$2

with something like:

    c[a[$1]] = $2

and then sort c[a[i]] or some manipulation of it in the END clause instead of trying to split elements of a[] and sort the split elements.
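For what it is worth, here is one way the requested output could be produced by an awk that lacks asort(). It is a sketch under assumptions, not a tested fix for the real data: it assumes every second field ends in "@" followed by a number, and the names item, cnt, order and pos() are made up for illustration. Each second field is stored separately and a small insertion sort orders them by the number after the "@".

```shell
input='line1|error_type_a@15
line1|error_type_c@10
line1|error_type_b@5
line2|error_type_f@3
line2|error_type_a@1'

sorted_lines=$(printf '%s\n' "$input" | awk '
# pos(): numeric value after the last "@" (hypothetical helper)
function pos(s) { sub(/.*@/, "", s); return s + 0 }
BEGIN { FS = OFS = "|" }
{
    if (!($1 in cnt)) order[++nkeys] = $1   # remember keys in first-seen order
    item[$1, ++cnt[$1]] = $2                # keep each second field separately
}
END {
    for (k = 1; k <= nkeys; k++) {
        key = order[k]
        n = cnt[key]
        # insertion sort of the items of this key by error position
        for (p = 2; p <= n; p++) {
            v = item[key, p]
            vn = pos(v)
            for (q = p - 1; q >= 1 && pos(item[key, q]) > vn; q--)
                item[key, q + 1] = item[key, q]
            item[key, q + 1] = v
        }
        line = item[key, 1]
        for (p = 2; p <= n; p++)
            line = line "; " item[key, p]
        print key, line
    }
}')
printf '%s\n' "$sorted_lines"
```

Keeping the keys in the order[] array avoids depending on the unspecified order of a for (i in a) loop.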

Hope this helps.

I think you are making this overly complicated. In fact, this is one of the numerous applications of a "control break". You might want to follow the provided link.
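A minimal sketch of a control break in awk, assuming the input is already sorted by the first field and then by error position (the variable names prev, line and merged are made up): a new output line is started whenever the first field changes, and no array is needed.

```shell
# Input assumed to be pre-sorted by key and by error position
sorted_input='line1|error_type_b@5
line1|error_type_c@10
line1|error_type_a@15
line2|error_type_a@1
line2|error_type_f@3'

merged=$(printf '%s\n' "$sorted_input" | awk -F'|' '
$1 != prev {                        # control break: the key has changed
    if (NR > 1) print prev "|" line # flush the group that just ended
    prev = $1
    line = $2
    next
}
{ line = line "; " $2 }             # same key: append to the current group
END { if (NR > 0) print prev "|" line }
')
printf '%s\n' "$merged"
```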

I hope this helps.

bakunin

sort -t'|' -k1,1 -k2.14n inputfile

Numerical sort on the 2nd key field with offset 14.
Pipe this to your working awk.
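Put together with the sample data from the first post, the pipeline might look like the sketch below. It assumes the "error_type_X" part is always 13 characters wide, so the number starts at character 14 of field 2; the variable names input and result are made up.

```shell
input='line1|error_type_a@15
line1|error_type_c@10
line1|error_type_b@5
line2|error_type_f@3
line2|error_type_a@1'

# Sort by key, then numerically by error position, then merge per key
result=$(printf '%s\n' "$input" |
    sort -t'|' -k1,1 -k2.14n |
    awk 'BEGIN { FS = "|" }
         { a[$1] = a[$1] (a[$1] ? "; " : "") $2 }
         END { for (i in a) print i FS a[i] }')
printf '%s\n' "$result"
```

The items within each output line come out sorted; the order of the output lines themselves still depends on how awk walks for (i in a).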

Hi MadeInGermany,
Your suggestion may be exactly what the OP needs.

When I read the 1st message in this thread, I was under the impression that the "error_type_X" strings were placeholders for arbitrary (not necessarily all the same length) strings followed by "@" and a numeric string. If my assumption is correct, the position of the "@" in field 2 won't be constant, and the UNIX sort command doesn't allow a different field separator to be specified for each sort key, so there will have to be some data manipulation before the input can be sorted. However, if the 1st field is always 5 characters (which I also assumed was not true), then the command:

sort -t'@' -k2,2n -k1.1,1.5 inputfile

might work even if the "error_type_X" strings are not a constant width.
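If neither field has a fixed width, the data manipulation mentioned above could be a decorate, sort, undecorate step: copy the number after the "@" into a temporary field of its own, sort with "|" as the only separator, then drop the temporary field. This is a sketch on invented sample data (the keys and error names below are not from the original post), and it assumes the data contains no "|" other than the field separator.

```shell
# Hypothetical input with variable-width keys and error names
input='line1|timeout@15
row_22|error_type_c@10
line1|bad_checksum@5
row_22|x@2
line1|error_type_c@10'

# 1. decorate: key|number|original second field
# 2. sort by key, then numerically by the temporary number field
# 3. undecorate: keep only the key and the original second field
decorated_sorted=$(printf '%s\n' "$input" |
    awk -F'|' '{ n = $2; sub(/.*@/, "", n); print $1 "|" n "|" $2 }' |
    LC_ALL=C sort -t'|' -k1,1 -k2,2n |
    cut -d'|' -f1,3)
printf '%s\n' "$decorated_sorted"
```

The sorted lines can then be piped to the awk that merges lines with a common first field.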

I think we at least explained why beca123456's use of awk's asort() function is producing the output it produces.

Hopefully, the discussion in this thread so far will either help the OP get a working solution or encourage the OP to give us a better description of the actual data that might appear in the input file so we can work on a real solution instead of guessing at what needs to be done.

Good idea! But the key order should be

sort -t'@' -k1.1,1.5 -k2,2n inputfile

Sort on the first 5 characters of the first key field (up to the '|' separator), with a secondary numerical sort on the 2nd key field.