Column sum

ctrld · July 25, 2016, 4:19pm

I have a text file in below format

First Column Header |Second Column Header| 	Third Column Header|	Fourth Column Header| Fifth Column Header
DATA1|	1.4|22.80|	6.6|6.55
DATA2|	1.4|NA|	6.8|6.83
DATA3|	2.4|22.80|	4.5|	4.45
DATA4|	4.4|NA|	8.8|	8.77
DATA5|	5.4|22.80|	NA|	10.93

I need to sum 2nd,3rd,4th and 5th columns, but numbers only. Should ignore all strings like "NA" and space. Output like:

First Column Header |Second Column Header| 	Third Column Header|	Fourth Column Header| Fifth Column Header
DATA1|	1.4|22.80|	6.6|6.55
DATA2|	1.4|NA|	6.8|6.83
DATA3|	2.4|22.80|	4.5|	4.45
DATA4|	4.4|NA|	8.8|	8.77
DATA5|	5.4|22.80|	NA|	10.93
|15.00|68.40|26.7|37.53

RavinderSingh13 · July 26, 2016, 12:21am

Hello ctrld,

Could you please try following and let me know if this helps.

awk -F"|" 'NR==1{print;next} {print;Q=Q?Q FS ($2+$3+$4+$5):($2+$3+$4+$5)} END{print Q}'  Input_file

Output will be as follows.

First Column Header |Second Column Header| 	Third Column Header|	Fourth Column Header| Fifth Column Header
DATA1|	1.4|22.80|	6.6|6.55
DATA2|	1.4|NA|	6.8|6.83
DATA3|	2.4|22.80|	4.5|	4.45
DATA4|	4.4|NA|	8.8|	8.77
DATA5|	5.4|22.80|	NA|	10.93
37.35|15.03|34.15|21.97|39.13

Although I am not at all sure how and in which sequence you have shown the sum of fields for each line in Input_file at very bottom of your shown expected output. My above code will give SUM of fields in sequence of lines from 2nd line to till last line, if above doesn't meet your requirements then please get back to us with complete requirement details and expected sample output(with details).

Thanks,
R. Singh

Akshay_Hegde · July 26, 2016, 12:56am

This may help you

Input

[akshay@localhost tmp]$ cat file
First Column Header |Second Column Header| 	Third Column Header|	Fourth Column Header| Fifth Column Header
DATA1|	1.4|22.80|	6.6|6.55
DATA2|	1.4|NA|	6.8|6.83
DATA3|	2.4|22.80|	4.5|	4.45
DATA4|	4.4|NA|	8.8|	8.77
DATA5|	5.4|22.80|	NA|	10.93

Script

[akshay@localhost tmp]$ cat test.awk
BEGIN{
        # if variable cols not defined exit with below message to user
	if(cols==""){ print "Please mention column number separated by comma" ; exit }

        # split string cols separated by comma and save columns to be added in array c
	split(cols,c,/,/)
}
FNR>1{
        # Skip header and loop through array c and sum up fields, save result in array s
	for(i in c)
		s[c]+= $(c)
}1
END{
        # Loop through fields and print result
	for(i=1; i<=NF; i++){		
		printf("%s%s",(i==1?"":OFS),(i in s ? s : "NA"))
	}
	print ""
}

How to Execute

[akshay@localhost tmp]$ awk -vFS='|' -vOFS='|' -vcols="2,3,4,5" -f test.awk file

Output

First Column Header |Second Column Header| 	Third Column Header|	Fourth Column Header| Fifth Column Header
DATA1|	1.4|22.80|	6.6|6.55
DATA2|	1.4|NA|	6.8|6.83
DATA3|	2.4|22.80|	4.5|	4.45
DATA4|	4.4|NA|	8.8|	8.77
DATA5|	5.4|22.80|	NA|	10.93
NA|15|68.4|26.7|37.53

Don_Cragun · July 26, 2016, 1:15am

The following produces the output you said you want. (I don't understand why you want two digits after the decimal in the sum of field 2 since all of the input lines only have one digit after the decimal point in field 2; but the output matches what you requested.)

awk '
BEGIN {	FS = OFS = "|"
}
NR > 1 {for(i = 2; i <= 5; i++)
		s += $i
}
1
END {	printf("%s%.2f%s%.2f%s%.1f%s%.2f\n", OFS, s[2], OFS, s[3], OFS, s[4],
	   OFS, s[5])
}' file

producing the output:

First Column Header |Second Column Header| 	Third Column Header|	Fourth Column Header| Fifth Column Header
DATA1|	1.4|22.80|	6.6|6.55
DATA2|	1.4|NA|	6.8|6.83
DATA3|	2.4|22.80|	4.5|	4.45
DATA4|	4.4|NA|	8.8|	8.77
DATA5|	5.4|22.80|	NA|	10.93
|15.00|68.40|26.7|37.53

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk .

ctrld · July 26, 2016, 1:23am

Thanks all. The first solution probably was for row sum. Second and Third perfectly worked. Third solution is more crisp and clear, I liked it the most. Thanks a ton guys.