Awk: group rows by id and simple conversion

eagle_fly · March 13, 2013, 4:04am

Hi all,

I am a newbie to awk and trying to learn by doing examples.
I got stuck at this relatively simple conversion.
The start file looks like:

1 2 "t1"
1 3 "h1"
2 1 "h1"
2 2  "h2"

and I want to convert it into

1 t1:2, h1:3;
2 h1:1, h2:2;

Thanks.

pamu · March 13, 2013, 4:09am

$ awk '{gsub("\"","");A[$1]=A[$1]?A[$1]", "$3":"$2:$1" "$3":"$2}END{for(i in A){print A[i]";"}}' file

1 t1:2, h1:3;
2 h1:1, h2:2;

eagle_fly · March 13, 2013, 4:29am

pamu.

Thanks for your help.
Somehow the line that should come first always ends up last.
Any idea why that is?

E.g. for the above example

2 h1:1, h2:2;
1 t1:2, h1:3;

pamu · March 13, 2013, 5:12am

Try this:

awk '{gsub("\"","");A[$1]=A[$1]?A[$1]", "$3":"$2:$1" "$3":"$2;if(!B[$1]++){C[++a]=$1}}END{for(i=1;i<=a;i++){print A[C[i]]";"}}' file

RudiC · March 13, 2013, 5:20am

The reversal of lines is due to the fact that (i in A) supplies all array elements, but in undefined order. To keep the right sequence, try:

awk     '       {gsub (/"/,"")
                 if (X != $1) printf "%s%s", X!=""?";\n":"", $1
                 printf "%s %s:%s", X==$1?",":"", $3, $2
                 X = $1
                }
         END    {printf ";\n"
                }
        ' file
1 t1:2, h1:3;
2 h1:1, h2:2;

eagle_fly · March 13, 2013, 6:58am

Thanks for great solutions!

I want to print the largest id, i.e. C[i] for the max i.
I can see how to run another for loop, but is there a better way?

pamu · March 13, 2013, 7:04am

I'm not clear on what you are looking for.

But the max i is stored in a, so you can print C[a].

Hope this helps

pamu

Jotne · March 13, 2013, 10:00am

awk -F'[ "]+' '{printf "%s %s:%s, ",$1,$3,$2;getline;printf "%s:%s;\n",$3,$2}' file
1 t1:2, h1:3;
2 h1:1, h2:2;

eagle_fly · March 13, 2013, 11:16am

pamu, this makes perfect sense.
However, it does not seem to work.

awk '{gsub("\"","");A[$1]=A[$1]?A[$1]", "$3":"$2:$1" "$3":"$2;if(!B[$1]++){C[++a]=$1}}END{print "max ", C[a]; for(i=1;i<=a;i++){print A[C[i]]";"}}' file

RudiC · March 13, 2013, 11:25am

If the input file is sorted in ascending order, the max would be in the last line, so piping the output through tail -1 would give you the desired result.
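
If the input is not sorted, the last line is not necessarily the largest id. One possible alternative is to track the maximum while the rows are being read, so that neither a second loop nor tail is needed. This is a minimal sketch that assumes the ids are numeric; the function name max_id is made up for illustration.

```shell
# max_id is a hypothetical helper name; it reads the rows from stdin.
max_id() {
    awk 'NR == 1 || $1 + 0 > max { max = $1 + 0 }    # $1 + 0 forces a numeric comparison
         END { if (NR) print "max", max }'           # print nothing for empty input
}

# Sample rows from the first post, deliberately out of order.
printf '%s\n' '1 2 "t1"' '2 1 "h1"' '1 3 "h1"' '2 2 "h2"' | max_id
```

The same pattern-action pair could be dropped into the longer one-liner in place of C[a], since it does not rely on the order in which ids first appear.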