Find repeated words and sum their values (the second field after each word) in awk

100bees · July 2, 2013, 8:19am

Hi, below is the input file. I need to find the repeated words and sum up their values, where the value is the second field after each repeated word. I'm trying but getting nowhere close to it. Kindly give me a hint on how to go about it.

Input

fruits,apple,20,fruits,mango,20,veg,carrot,12,veg,raddish,30
fruits,pinapple,10,fruits,orange,5,fruits,apple,20,Grains,wheat,100

Output

fruits,40,veg,42
fruits,35,Grains,100

 
awk -F '{
  for (j=1;j<=NF;j=j+2)
   {
    for(i=3;i<=NF;i=i+3)
    {
     if ($j=$(j+2))
      {
      tot=0 
      tot=$i+$(i+2)+tot
      toprint1=$j "," tot
      }
     if ($i!=$j)
      {
      printf $j "," $i
      }  
     }
    } 
     printf toprint1 
   }' filename

bartus11 · July 2, 2013, 8:37am

I think your input should be:

fruits,apple,20,fruits,mango,20,veg,carrot,12,veg,raddish,30
fruits,pinapple,10,fruits,orange,5,fruits,apple,20,Grains,wheat,100

---------- Post updated at 08:37 AM ---------- Previous update was at 08:24 AM ----------

If you don't mind commas at the line ends:

awk -F"," '{for (i=1;i<=NF;i+=3) a[$i]+=$(i+2);for (i in a) printf i","a[i]",";delete a;printf "\n"}' input
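
For anyone who wants the one-liner unpacked, here is a commented sketch of the same logic. It assumes every line is made of category,item,value triples; the wrapper name `sum_by_category` and the array name `total` are made up for illustration and are not from the post above.

```shell
# Hypothetical wrapper name; pass the input file as an argument or pipe data in.
sum_by_category() {
  awk -F',' '{
    # Step through the line three fields at a time: $i is the category,
    # $(i+2) is its value. The middle field (the item name) is ignored.
    for (i = 1; i + 2 <= NF; i += 3)
      total[$i] += $(i + 2)

    # Print each category and its total, each followed by a comma.
    # The order of "for (k in total)" is not specified by awk.
    for (k in total)
      printf "%s,%s,", k, total[k]
    printf "\n"

    # Empty the array so the next line starts from zero. Whole-array
    # delete is widely supported (gawk, mawk) but not by every older awk.
    delete total
  }' "$@"
}

printf '%s\n' 'fruits,apple,20,fruits,mango,20' | sum_by_category
```

With that single-category sample line the output is `fruits,40,`.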

100bees · July 2, 2013, 8:25am

Yes, thanks for the correction.

100bees · July 3, 2013, 12:01am

@bartus11 - Thank you, a comma at the end is not a problem, but could you please explain the code to me? And should it be delete a[i]?

awk -F"," '{for (i=1;i<=NF;i+=3) a[$i]+=$(i+2);for (i in a) printf i","a[i]",";delete a;printf "\n"}' input

But I am getting an output like this:

fruits,40,
veg,42,

fruits,75,
Grains,100,

fruits,75,

bartus11 · July 3, 2013, 12:41am

Post output of:

cat -ev input

100bees · July 3, 2013, 1:10am

The output of cat -ev input is:

fruits,apple,20,fruits,mango,20,veg,carrot,12,veg,raddish,30$
fruits,pinapple,10,fruits,orange,5,fruits,apple,20,Grains,wheat,100$
$

Scrutinizer · July 3, 2013, 2:01am

Are you sure you aren't using

print i","a[i]","

instead of

printf i","a[i]","

@bartus11: When using printf it is better to specify the printf format string explicitly:

printf "%s,%s,",i,a[i]
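
A small sketch of why the explicit format matters: when the data itself is used as the format, any % in it is treated as the start of a conversion. The helper name `print_pair` and the sample key are invented for illustration.

```shell
# Hypothetical helper: prints key,value, with the data passed as arguments
# to a fixed format, so a percent sign in the data is printed literally.
print_pair() {
  awk -v key="$1" -v val="$2" 'BEGIN { printf "%s,%s,\n", key, val }'
}

print_pair '50%off' 10
```

This prints `50%off,10,`. Using the key directly as the format (printf key ",") would instead try to interpret the `%o`.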

delete a is not standard awk. Instead one could use:

for (i in a) { printf "%s,%s,",i,a[i]; delete a[i]} printf "\n"}'
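
Another portable way to empty the array, as far as I know, is split("", a), which is plain POSIX awk and leaves a with no elements. A minimal sketch, with a made-up function name:

```shell
# Hypothetical wrapper name; same summing logic, array reset via split().
sum_portable() {
  awk -F',' '{
    for (i = 1; i + 2 <= NF; i += 3)
      a[$i] += $(i + 2)
    for (k in a)
      printf "%s,%s,", k, a[k]
    printf "\n"
    split("", a)   # splitting an empty string resets a to zero elements
  }' "$@"
}

printf '%s\n' 'fruits,apple,20,fruits,mango,20' 'fruits,orange,5' | sum_portable
```

The second line prints `fruits,5,` rather than `fruits,45,`, which shows the totals are not carried over.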

100bees · July 3, 2013, 2:38am

@Scrutinizer

Yes, you are right, I used print instead of printf.

Now my output is:

fruits,40,veg,42,
fruits,75,Grains,100,
fruits,75,

but what I need is:

fruits,40,veg,42
fruits,35,Grains,100

The code is also adding the fruits total from the first line to the second line.

Scrutinizer · July 3, 2013, 1:51pm

I cannot reproduce this. What is your OS and version? What happens when you remove the empty last line from the input file?

100bees · July 4, 2013, 12:49am

OS: AIX
oslevel: 6.1.0.0

If I delete the empty line I get this:

fruits,40,veg,42,
fruits,75,Grains,100,

I never realised an empty line could have this impact. Thank you, I am learning a lot.
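
On the blank-line point: with -F"," an empty line has NF equal to 0, so a bare NF pattern is one way to skip such lines before anything is printed. A minimal sketch (the function name is hypothetical):

```shell
show_field_counts() {
  # The pattern NF is false when a line has zero fields, so blank lines are skipped.
  awk -F',' 'NF { print NR ": " NF " fields" }' "$@"
}

printf 'veg,carrot,12\n\nfruits,apple,20,fruits,mango,20\n' | show_field_counts
```

This prints `1: 3 fields` and `3: 6 fields`; the empty second line produces nothing.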

Still, I am not getting the correct value for "fruits"; it is being added to the value of "fruits" from the first line.

Scrutinizer · July 4, 2013, 1:09am

I am getting 35. On AIX, I think the example with delete a should not work, so what is the exact awk code that you are using?

100bees · July 4, 2013, 1:41am

Hi Scrutinizer,
Thanks a lot for helping me understand a lot of things here.

Actually the code is working now. Earlier I had placed the delete a[i] after the second for loop; now I have placed it inside the second for loop, as you suggested in your previous post. I tried setting a[i] to 0, but that did not work. The following one works.

 
awk -F"," '{
for (i=1;i<=NF;i+=3)
{
a[$i]+=$(i+2)
}
for (i in a) 
{
 printf "%s,%s,",i,a[i]
 delete a[i]
} 
printf "\n"
}' inputfile
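
Since awk does not specify the order of for (i in a), the categories may not come out in input order on every awk. A sketch of one way to keep first-appearance order and drop the trailing comma, assuming the same triple layout; the names `sum_in_order`, `sum` and `order` are made up for illustration:

```shell
# Hypothetical wrapper name; pass the input file as an argument or pipe data in.
sum_in_order() {
  awk -F',' '{
    n = 0
    for (i = 1; i + 2 <= NF; i += 3) {
      if (!($i in sum)) order[++n] = $i   # remember first appearance
      sum[$i] += $(i + 2)
    }
    sep = ""
    for (j = 1; j <= n; j++) {
      k = order[j]
      printf "%s%s,%s", sep, k, sum[k]
      sep = ","                           # comma only between pairs
      delete sum[k]                       # reset for the next line
      delete order[j]
    }
    printf "\n"
  }' "$@"
}

printf '%s\n' \
  'fruits,apple,20,fruits,mango,20,veg,carrot,12,veg,raddish,30' \
  'fruits,pinapple,10,fruits,orange,5,fruits,apple,20,Grains,wheat,100' | sum_in_order
```

For the two input lines from the first post this prints `fruits,40,veg,42` and `fruits,35,Grains,100`.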