Add 8 columns at the end of .csv file using awk

Zam_1234 · March 23, 2015, 2:15am

Hello all,
I have a .csv file of 16 columns consists of bunch of numbers.

       6.45E+01   1.17E+01   8.10E+04   8.21E+01   8.50E+00   1.20E+01   1.02E+01   1.88E+01   1.86E+04   3.53E+03   1.09E+07   3.82E+04   2.09E+03   3.57E+03   2.98E+03   3.93E+03
       6.34E+01   3.23E+01   9.24E+04   3.25E+01   3.18E+01   3.35E+01   3.26E+01   4.19E+01   2.96E+04   1.59E+04   2.86E+07   1.61E+04   1.58E+04   1.59E+04   1.59E+04   1.76E+04
       3.82E+01   3.29E+01   1.04E+05   3.25E+01   3.22E+01   3.23E+01   3.23E+01   3.73E+01   1.75E+04   1.63E+04   5.78E+07   1.62E+04   1.62E+04   1.62E+04   1.62E+04   1.71E+04
       4.16E+01   3.21E+01   4.16E+04   3.26E+01   3.25E+01   3.25E+01   3.27E+01   4.24E+01   1.78E+04   1.62E+04   3.85E+07   1.62E+04   1.62E+04   1.62E+04   1.62E+04   1.79E+04
       4.29E+01   3.28E+01   1.76E+04   3.30E+01   3.31E+01   3.29E+01   3.31E+01   4.08E+01   1.84E+04   1.62E+04   2.48E+07   1.62E+04   1.62E+04   1.62E+04   1.62E+04   1.79E+04
       6.02E+03   2.88E+02   1.78E+04   4.71E+02   3.23E+01   9.60E+03   3.22E+01   3.74E+01   4.26E+06   1.34E+06   2.21E+07   5.42E+05   1.62E+04   1.65E+07   1.62E+04   1.71E+04
       4.09E+01   3.21E+01   4.85E+04   7.88E+04   3.25E+01   1.45E+03   3.30E+01   4.20E+01   1.80E+04   1.64E+04   2.93E+07   3.56E+07   1.62E+04   2.66E+06   1.62E+04   1.79E+04

I want to take the last 8 columns in awk and multiply them with 100 and append the results in new column (column 17 to 24).

This is the code I have tried so far:

awk 'BEGIN{FS=",";RS="\n";OFS=",";ORS="\n";} {for (i=1;i<=8;++i){$(NF+i)=$(i+8)*100;}}1' file.csv > newfile.csv

I am getting the results, but problem is, it is appending extra "," in the csv before every new columns. So after 1 extra "," after the 16th column with new data from column 9, then 1 extra "," columns after 17th with the new data from column 10 and so on..... Basically it is adding extra "," after every loop for the value of i.
Output Observing:

>awk 'NR==1 {print $0}' newfile.csv 
2.668750000000000000e+01,1.687500000000000000e+00,1.500000000000000000e+00,6.712500000000000000e+01,1.625000000000000000e+00,4.187500000000000000e+00,7.875000000000000000e+00,1.375000000000000000e+00,4.298700000000000000e+04,9.099000000000000000e+03,3.378000000000000000e+03,1.111399900000000000e+07,6.906000000000000000e+03,1.057600000000000000e+04,5.911000000000000000e+03,2.280000000000000000e+03,0.030705,,0.00649929,,,0.00241286,,,,7.93857,,,,,0.00493286,,,,,,0.00755429,,,,,,,0.00422214,,,,,,,,0.00162857

Confused on how to fix this...

Also, how to add heading to the new columns that I just created?

Thanks in advance

MasWag · March 23, 2015, 2:49am

Hello Zam_1234,
How about this?

awk 'BEGIN{FS=",";RS="\n";OFS=",";ORS="\n";} {for (i=1;i<=8;++i){$(16+i)=$(i+8)*100;}}1' file.csv > newfile.csv

Zam_1234 · March 23, 2015, 2:59am

This worked. But why didn't the one I had worked? Seems like only change is I was using "NF" and you use the hardcoded number "16".

Also, if the number of columns change in the future, what is way to do this robustly.

Don_Cragun · March 23, 2015, 3:02am

I am confused.

The sample input data you provided was a single line with 112 fields; not the 16 field lines you were talking about. (I tried guessing at the intended format, but I have no idea whether I guessed correctly.)

Your sample awk code specifies that the input and output field separators are commas, but your sample input data contains no commas.

You say you want to add headings to the new fields you create, but you don't specify what the headings should be and there aren't any headings on the other fields (to use as examples).

And, the sample output you showed us can't possibly have been produced from the sample input you showed us.

Please clearly specify what format your input takes, and show us a sample of the output you're trying to produce. (And PLEASE use CODE tags (not ICODE tags) for sample input and output as well as for code samples.)

Zam_1234 · March 23, 2015, 3:20am

don cragun:

I am confused.

The sample input data you provided was a single line with 112 fields; not the 16 field lines you were talking about. (I tried guessing at the intended format, but I have no idea whether I guessed correctly.)

Your sample awk code specifies that the input and output field separators are commas, but your sample input data contains no commas.

You say you want to add headings to the new fields you create, but you don't specify what the headings should be and there aren't any headings on the other fields (to use as examples).

And, the sample output you showed us can't possibly have been produced from the sample input you showed us.

Please clearly specify what format your input takes, and show us a sample of the output you're trying to produce. (And PLEASE use CODE tags (not ICODE tags) for sample input and output as well as for code samples.)

I understand your confusion. Here is hopefully more clarified input file format that I have:

awk 'NR==1{print $0}' input.csv 
6.450000000000000000e+01,1.168750000000000000e+01,8.102425000000000000e+04,8.206250000000000000e+01,8.500000000000000000e+00,1.200000000000000000e+01,1.018750000000000000e+01,1.875000000000000000e+01,1.864400000000000000e+04,3.527000000000000000e+03,1.093483900000000000e+07,3.818700000000000000e+04,2.089000000000000000e+03,3.574000000000000000e+03,2.979000000000000000e+03,3.933000000000000000e+03

(I copy/pasted from MS Excel in my original post, thus the confusion of not seeing any commas in the .csv)

For the heading, I want to name each column as following:
x1,x2 ...x8, test_x1, testx2, ...test_x8, mul_x1,mulx2,...mulx8
(total 24 names)

Thanks in advance.

Don_Cragun · March 23, 2015, 4:38am

With your new input format, and assuming you want the added fields in the same numeric format, and adding the headers you requested; try:

awk '
BEGIN {	for(i = 1; i <= 8; i++) {
		$i = sprintf("x%d", i)
		$(8 + i) = sprintf("test_x%d", i)
		$(16 + i) = sprintf("mul_x%d", i)
	}
	FS = OFS = ","
	print
}
{	for(i = 9; i <= 16; i++)
		$(i + 8) = sprintf("%.18e", $i * 100)
}
1' input.csv > newfile.csv

which, with your latest single line sample input produces the output:

x1,x2,x3,x4,x5,x6,x7,x8,test_x1,test_x2,test_x3,test_x4,test_x5,test_x6,test_x7,test_x8,mul_x1,mul_x2,mul_x3,mul_x4,mul_x5,mul_x6,mul_x7,mul_x8
6.450000000000000000e+01,1.168750000000000000e+01,8.102425000000000000e+04,8.206250000000000000e+01,8.500000000000000000e+00,1.200000000000000000e+01,1.018750000000000000e+01,1.875000000000000000e+01,1.864400000000000000e+04,3.527000000000000000e+03,1.093483900000000000e+07,3.818700000000000000e+04,2.089000000000000000e+03,3.574000000000000000e+03,2.979000000000000000e+03,3.933000000000000000e+03,1.864400000000000000e+06,3.527000000000000000e+05,1.093483900000000000e+09,3.818700000000000000e+06,2.089000000000000000e+05,3.574000000000000000e+05,2.979000000000000000e+05,3.933000000000000000e+05

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk .

As for why your code was adding empty fields:

awk '...{for (i=1;i<=8;++i){$(NF+i)=$(i+8)*100;}}...'

Assuming that you start with an input record containing 16 fields, NF will initially be 16.
After the 1st time through the loop, NF will be 17 (16 + 1) with $17 set to $9 * 100.
After the 2nd time through the loop, NF will be 19 (17 + 2) with $19 set to $10 * 100.
After the 3rd time through the loop, NF will be 22 (19 + 3) with $22 set to $11 * 100.
...
After the 8th time through the loop, NF will be 52 (44 + 8) with $52 set to $16 * 100.
And all of the unset fields (18, 20, 21, 23, 24, 25, 27, ... 51) will default to the empty string.