awk to sum a column based on duplicate strings in another column and show split totals

Hi,
I have input in a format like this:

A_1 2
B_0 4
A_1 1
B_2 5
A_4 1

and I am looking to print it in the output format below, with headers. Can you suggest a way to do this in awk? I need awk because I am already using it for pattern matching against a parent file to print column 1 of my input. Thanks!
letter  number_of_letters  Total  Split
A       3                  4      2+1+1
B       2                  10     4+5
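A minimal sketch of one reading of this request, assuming the group key is the letter before `_` and that Total is meant to be the plain sum of the second column (under that assumption B comes out as 9, not the 10 shown above). The function name `split_totals` is hypothetical. `-v OFS='\t'` is used instead of a trailing `OFS=` operand so that the header printed in `BEGIN` is tab-separated as well.

```shell
# Hypothetical helper (name is illustrative): group rows by the letter
# before "_", then print a header, the row count, the sum of column 2,
# and the "+"-joined values of column 2 for each letter.
split_totals() {
  awk -v OFS='\t' '
    BEGIN { print "letter", "number_of_letters", "Total", "Split" }
    {
      key = $1
      sub(/_.*/, "", key)              # "A_1" -> "A"
      count[key]++                     # rows seen for this letter
      total[key] += $2                 # assumes Total is a plain sum
      parts[key] = parts[key] (parts[key] == "" ? "" : "+") $2
    }
    END {
      # for-in order is unspecified, so rows may come out in any order
      for (k in count) print k, count[k], total[k], parts[k]
    }
  ' "$@"
}

# Usage (reads file arguments, or standard input when none are given):
printf 'A_1 2\nB_0 4\nA_1 1\nB_2 5\nA_4 1\n' | split_totals
```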

That would require something like this:

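awk -F'[_ ]+' '
  {
    A[$1]++                      # number of rows for this letter
    n = $2 * $3                  # digit after "_" times column 2
    if (n > B[$1]) B[$1] = n     # keep the largest product per letter
    C[$1] = C[$1] (C[$1] == "" ? x : "+") $3   # "2+1+1"; x is unset, i.e. ""
  }
  END {
    for (i in A) print i, A[i], B[i], C[i]
  }
' OFS='\t' file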
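To see where the snippet above gets its numbers, here is a small illustration of how a `_`-or-space separator splits one sample row. It uses `[_ ]+` (one or more) rather than `[_ ]*`, on the assumption that the separator should never be allowed to match an empty string; `show_fields` is a hypothetical helper. For `B_2 5` the fields are `B`, `2` and `5`, so `$2*$3` is 2*5 = 10. That is the largest product for B, which is how taking the maximum product per letter reproduces the 10 in the requested output (and the 4 for A, from `A_4 1`).

```shell
# Hypothetical helper: print how awk splits one line when FS is "_ or space".
show_fields() {
  printf '%s\n' "$1" | awk -F'[_ ]+' '{
    print "NF=" NF                      # number of fields found
    for (i = 1; i <= NF; i++) print "$" i "=" $i
  }'
}

show_fields 'A_1 2'   # prints NF=3, $1=A, $2=1, $3=2 (one per line)
show_fields 'B_2 5'   # prints NF=3, $1=B, $2=2, $3=5 (one per line)
```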

If not, please explain in more detail what you need. Also, next time, please show what you have tried so far.

Thank you. Can you please tell me how to get rid of the _ field separator and count the duplicates in $1?
Say, for the input:

A_1 2
B_0 4
A_1 1
B_0 5
A_1 1

the output should be:

A_1       3        4     2+1+1
B_0       2        10    4+5

How do you arrive at 10 for B_0?

Sorry, typo.
It should read:

A        3        4     2+1+1
B        2        9     4+5


Oops, this is the correct output format:

A_1        3        4     2+1+1
B_0        2        9     4+5

Try:

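awk '
  {
    A[$1]++                      # count of rows for this key (e.g. A_1)
    B[$1] += $2                  # running total of column 2
    C[$1] = C[$1] (C[$1] == "" ? x : "+") $2   # "+"-joined split; x is unset, i.e. ""
  }
  END {
    for (i in A) print i, A[i], B[i], C[i]   # key order is unspecified
  }
' OFS='\t' file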
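A slightly extended sketch of the answer above, assuming the header names from the first post are still wanted and that a fixed row order is preferable. Because `for (i in A)` visits keys in an unspecified order, the data rows are piped through `sort` after the header has been printed. The function name `dup_totals` is hypothetical.

```shell
# Hypothetical wrapper: whole $1 values (e.g. "A_1") are the keys, a header
# is printed first, and the data rows are sorted for a stable order.
dup_totals() {
  printf 'letter\tnumber_of_letters\tTotal\tSplit\n'
  awk -v OFS='\t' '
    {
      count[$1]++                 # how many times this key appears
      total[$1] += $2             # running sum of column 2
      parts[$1] = parts[$1] (parts[$1] == "" ? "" : "+") $2   # e.g. "2+1+1"
    }
    END {
      for (k in count) print k, count[k], total[k], parts[k]
    }
  ' "$@" | LC_ALL=C sort          # byte-wise sort, independent of locale
}
```

For example, `dup_totals file` on the second sample input prints the header line followed by `A_1 3 4 2+1+1` and `B_0 2 9 4+5`, with tabs between the columns.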