Sum of a column as new column based on header in a script

Hello,

I am trying to store the sum of a column as a new column inside a file, but I have to find the columns dynamically by their header names.

Input:

c1,c2,c3,c4,c5
10,20,30,40,50
20,30,40,50,60

I want to sum only columns c1 and c3 and append the sums as new columns c6 and c7.
Output:

c1,c2,c3,c4,c5,c6,c7
10,20,30,40,50,30,70
20,30,40,50,60,30,70

Note that I want to do this dynamically: I do not know the position of the columns in advance, and I want to compute the sum for multiple columns.

I can get the total sum of each column like this:

#!/bin/sh
awk -F, '{for(i=1;i<=NF;i++)a[i]+=$i}
        END{for(i=1;i<=NF;i++)printf "%d%s", a[i], (i==NF?"\n":",")}' file

But how can I get the sum of each column as a separate column in the file, especially when I do not know the position of that column?

Thanks.

How about

awk -F, '
NR == 1         {for (i=1; i<=NF; i++) if ("," MCOL "," ~ "," $i ",") COL[++CNT] = i
                 print $0
                }
FNR == 1        {next
                }

NR == FNR       {for (i=1; i<=CNT; i++) SUM[i] += $(COL[i])
                 next
                }
                {for (i=1; i<=CNT; i++) $(NF+1) = SUM[i]
                 print
                }
' OFS="," MCOL="c1,c2" file file

Yes, you read correctly, supply the input file twice - once to determine the values, once to print them.
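In case the two-pass idea is easier to see in isolation, here is a minimal sketch of just that part. It sums only field 1 by position and ignores header names; the temporary file and the variable names (tmp, total, two_pass_out) are made up for the demonstration, and only the sample values come from post #1.

```shell
# Hypothetical demo file holding the sample data from the first post.
tmp=$(mktemp)
printf '%s\n' 'c1,c2,c3,c4,c5' '10,20,30,40,50' '20,30,40,50,60' > "$tmp"

# Pass 1 (NR == FNR): accumulate the sum of field 1, print nothing.
# Pass 2 (NR != FNR): the total is known, so append it to every data line.
two_pass_out=$(awk -F, '
FNR == 1  { next }                 # skip the header line in both passes
NR == FNR { total += $1; next }    # first pass: only collect
          { print $0, total }      # second pass: line plus total
' OFS=, "$tmp" "$tmp")

printf '%s\n' "$two_pass_out"
rm -f "$tmp"
```

NR counts lines across all operands while FNR restarts for each file, so NR == FNR is only true while the first copy is being read.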

Nice, thanks. This is a very interesting way to write code, but here is the output I get when I execute it:

c1,c2,c3,c4,c5
c1,c2,c3,c4,c5,30,50
10,20,30,40,50,30,50
20,30,40,50,60,30,50

I tried this:

awk -F, '
NR == 1         {for (i=1; i<=NF; i++) if ("," MCOL "," ~ "," $i ",") COL[++CNT] = i
                 print $0,MCOL
                }
FNR == 1        {next
                }

NR == FNR       {for (i=1; i<=CNT; i++) SUM[i] += $(COL[i])
                 next
                }
                {for (i=1; i<=CNT; i++) $(NF+1) = SUM[i]
                 print
                }
' OFS="," MCOL="c1,c2" file file

But that prints the whole MCOL value after the header line. Ideally, this is how I want my output:

c1,c2,c3,c4,c5,sum_c1,sum_c3
10,20,30,40,50,30,50
20,30,40,50,60,30,50

Is this possible? Thanks for the input.

------ Post updated at 02:13 AM ------

I am also trying something like the following, but I am pretty sure I am writing the code wrongly, as I am getting a syntax error. I am trying the code out on ideone, so unfortunately it doesn't print the error.

Can you please point out what is wrong with my approach or syntax here?

#!/usr/bin/sh
col="c1,c3"   
idx=   
sum=0         
{
IFS=, read -ra values <<< "$col"
for v in "${values[@]}"
do
  for i in ${v//,/ } do
     while IFS=, read -r -a line; do
     sum=$(( sum + ${line[$i]} 
	 echo $sum
	 done
  done
done
}<testfile

Thanks.

Obviously, if you're getting syntax errors you're doing something wrong. Since you haven't shown us what errors you're getting, haven't given us comments to explain what your code is trying to do, haven't told us what shell you're using, and haven't told us what operating system you're using, it is hard to suggest how to fix your code to do what you expect it to do.

It is clear that you have a do that is being interpreted as a command argument when you probably intended for it to be interpreted as a keyword. It seems likely that you're trying to read your input file twice in your inner loop but you have only given the nested loops one copy of the file to read.
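A small sketch of the keyword problem, assuming nothing beyond a POSIX sh being available as "sh". The variable names (joined, broken, broken_status) are invented for the demo, and the broken loop is kept in a string so that the demonstration itself still parses.

```shell
# With a semicolon (or a newline) before "do", the shell sees the keyword
# and loops over the three words c1, c3 and c5.
joined=""
for name in c1 c3 c5; do
    joined="$joined<$name>"
done
printf '%s\n' "$joined"

# Without it, "do" is just one more word in the list. The shell then still
# expects the keyword and reports a syntax error at whatever comes next.
broken='for name in c1 c3 do
echo "$name"
done'
if sh -c "$broken" 2>/dev/null; then
    broken_status=ok
else
    broken_status=syntax-error
fi
printf '%s\n' "$broken_status"
```

The same applies to "for i in ${v//,/ } do" in your script: the do there becomes part of the word list, so the while on the next line is where the parser gives up.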

And, I don't see anything in your code that would produce the output header you say you want.

If we look at the code RudiC suggested (which would not produce second line of the output you showed us at the top of the post I quoted above), it seems obvious to me that he was giving you sample code that would be easy to modify to produce the code you requested in post #1 in this thread (which is not what you asked for in your last post).

Did you try to read and understand his code? If you had trouble understanding it, why didn't you ask questions about the part(s) you didn't understand? Didn't you see that the MCOL variable specifies the headings of the fields that are to be summed and the order in which the sums of those fields are to appear in the output lines?

So his example code gave you the sums of fields with the heading c1 and c2 instead of c1 and c3 . A one character change in his sample code would have fixed that for you.

And his example just copied the input file heading line to the output instead of adding the two extra field headings. If we go back to his suggested code and make some minor modifications for your new output request:

awk -F, '
NR == 1         {printf("%s", $0)
		 for (i=1; i<=NF; i++)
			if ("," MCOL "," ~ "," $i ",") {
				COL[++CNT] = i
				printf(",sum_%s", $i)
			}
                 print ""
                }
FNR == 1        {next
                }

NR == FNR       {for (i=1; i<=CNT; i++) SUM[i] += $(COL[i])
                 next
                }
                {for (i=1; i<=CNT; i++) $(NF+1) = SUM[i]
                 print
                }
' OFS="," MCOL="c1,c3" file file

If I run this code with your sample input, I get the output:

c1,c2,c3,c4,c5,sum_c1,sum_c3
10,20,30,40,50,30,70
20,30,40,50,60,30,70

which seems to be what you said you wanted, but does not match what you have above where the last field in the last two lines of the output is 50 instead of 70 . (Maybe RudiC's suggestion of calculating and printing the sums of the 1st and 2nd fields instead of the 1st and 3rd fields was correct???)


Yes, I was getting syntax errors, and as you mentioned I should have posted them here. I was running my code on ideone.com, which did not show where the error was; it was not me being lazy. I am using bash, which is the default shell on that website.

#!/usr/bin/sh
col="c1,c3"   
idx=   
sum=0         
{
IFS=, read -ra values <<< "$col" #Here I am trying to read column names from col variable
for v in "${values[@]}"# looping all the column names i passed in the col variable
do
  for i in ${v//,/ } do# Here since the @ in the above for loop has values like c1,c3 comma separated i was trying to read one by one
     while IFS=, read -r -a line; do# now read all the values under the column (first c1 then c3)
     sum=$(( sum + ${line[$i]} $sum the values under columns passed
	 echo $sum
	 done
  done
done
}<testfile

Yes, I know my syntax is wrong; I just was not getting an error message. I am also pretty new to programming (a bad defense, I know).

Yes, I haven't figured that part out yet. Sorry, my bad; I should have been more precise.

I wasn't trying to say RudiC's code is wrong. I was getting the output shown above on ideone.com; now that I am at work I will try it on my own terminal. And yes, I did try to print MCOL, by writing

print $0,MCOL

or

print $0,MCOL_sum

instead of

print $0

The reason I have not asked for clarification on his code yet is that I am still reading through it; once I have done my research I can come back with answers. I was also trying to write the same thing in my own way, which obviously failed miserably.

Finally, thanks for everything. I will keep this feedback in mind in my future posts.

stderr from your original code on ideone.com :

./prog.sh: line 10: syntax error near unexpected token `while'
./prog.sh: line 10: `     while IFS=, read -r -a line; do'

... looks like a semicolon is missing in the preceding line.

You should try to make very clear what variables and arrays you use, what values they are assigned (if any), and where this might deviate from what you expect.
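For what it is worth, here is one way your loop could be untangled. Take it as a rough sketch rather than a replacement for the awk solutions. It sticks to POSIX sh features (no arrays), rereads the file once per requested heading, and only computes the sums instead of appending them to each line. The temporary file and the variable names (data, want, pos, sums) are made up for the demo, and it assumes the fields contain no shell glob characters.

```shell
cols="c1,c3"                       # headings to sum, as in the thread
data=$(mktemp)                     # hypothetical stand-in for "testfile"
printf '%s\n' 'c1,c2,c3,c4,c5' '10,20,30,40,50' '20,30,40,50,60' > "$data"

sums=""
old_ifs=$IFS
IFS=,                              # split unquoted expansions on commas
for want in $cols; do
    sum=0
    {
        # First line: find the position of the wanted heading.
        read -r header
        pos=0 n=0
        for name in $header; do
            n=$((n + 1))
            if [ "$name" = "$want" ]; then pos=$n; fi
        done
        # Remaining lines: add up the field at that position.
        while read -r line; do
            n=0
            for field in $line; do
                n=$((n + 1))
                if [ "$n" -eq "$pos" ]; then sum=$((sum + field)); fi
            done
        done
    } < "$data"                    # reopened for every heading
    sums="$sums${sums:+,}$sum"
done
IFS=$old_ifs
rm -f "$data"
printf '%s\n' "$sums"
```

Each variable has one job here: want is the heading being looked for, pos is its field number (0 if the heading is absent, in which case the sum stays 0), and sum is the running total for that heading.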

Thanks, RudiC and Don Cragun. Your solutions work elegantly. I am still working out how I could write this code differently, for learning purposes, and will update this thread once I am able to figure it out.