Sum of Columns

RGD · November 27, 2010, 8:52am

Hi,

I have recently started learning shell scripting (Bash) at my school, and now we must write a shell script that does the following:

The problem statement:
It takes a directory name as an argument and handles the files in that directory according to the following rule:

• The files that end with extension .csv (comma-separated values) are moved to subfolder CSV. However, before doing the move, the columns of each file have to be summed and a new row containing the totals should be appended to each .csv file. Assume that the .csv files have 4 columns each with the following column formatting:
string, value, value, value.

The attempts:

I already did the part that takes the file and moves it to the new subfolder. I also managed to sum all the columns using awk; it may not be the best way, but I'm still learning. The problem is in printing the result to the file, that is, adding the new row that contains the sums. I thought that I could pass variables to awk to store the total of each column, but I couldn't understand how to pass variables to awk! Or maybe there is a better way to do that.

Of course I'm working on each task separately, then I'll combine them into one script, so this is the part of the script I wrote so far:

#!/bin/sh

 
#string, value, value, value.
 
 
#gets the sums of each column 
 
 
clear 
myfile=$1
 
awk '{total1 += $3 } END {print total1}' $myfile
 
awk '{total2 += $5 } END {print total2}' $myfile 
 
awk '{total3 += $7 } END {print total3}' $myfile

Could you please help?

Birziet University ,West Bank ,Palestinian Territory
Linux Laboratory
ENCS313
Dr. Hanna Bullata

Thanks in advance

bakunin · November 29, 2010, 5:08am

OK, let's start with a plan: your program layout will have to look something like

main ()
{
while (every file in the dir named *csv)
     process_file( file )
done
}


function process_file( filename )
{
insert_column_with_sums( filename )
move_to_subfolder( filename )
}
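In plain sh the layout above could be sketched roughly as follows. The function names process_dir, process_file and append_totals are made up for illustration, and append_totals is only a placeholder here for the awk step discussed further down:

```shell
#!/bin/sh
# Sketch of the layout above. All function names are hypothetical.

append_totals() {
    # Placeholder: the awk step that sums the columns goes here.
    :
}

process_file() {
    csv_file=$1
    target_dir=$(dirname "$csv_file")/CSV
    append_totals "$csv_file"
    mkdir -p "$target_dir"          # create the CSV subfolder if it is missing
    mv "$csv_file" "$target_dir"/
}

process_dir() {
    src_dir=$1
    for f in "$src_dir"/*.csv; do
        [ -f "$f" ] || continue     # the glob stays literal when nothing matches
        process_file "$f"
    done
}

# Typical call from the main part of the script:
#   process_dir "$1"
```

The call at the bottom is left as a comment on purpose, because "$1" should be validated before it is used.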

Good. Let's go over your solution:

First off, some explanation of how awk works: basically it is a rule-based language. A file is read line by line, and one rule after the other is applied to each line (which may or may not alter the line's contents). After all the rules have been applied, the next line is read in and the process starts anew. A "rule" usually consists of the following: a pattern (most often a regexp) and some commands. If the line matches the pattern, the commands accompanying it are executed, otherwise they aren't. No pattern means the commands are executed for all lines.

There are three special rules, named "BEGIN", "END" and one with no name at all. "BEGIN" is executed before any lines are read from input, "END" is executed after all the lines have been read. The rule with no name is executed for every line of the input file.
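To see the different kinds of rules side by side, here is a small self-contained sketch. The function name awk_rules_demo and the sample data are invented for the demonstration only:

```shell
#!/bin/sh
# Demonstration only: function name and sample input are made up.
awk_rules_demo() {
    printf 'apple 3\nbanana 5\ncherry 7\n' |
    awk '
    BEGIN { print "start" }          # runs once, before any input is read
    /an/  { print "match: " $1 }     # runs only for lines matching the regexp
          { sum += $2 }              # no pattern: runs for every line
    END   { print "sum: " sum }      # runs once, after the last line
    '
}
```

Calling awk_rules_demo prints "start", then "match: banana", then "sum: 15".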

What does that mean for your script?

#!/bin/sh

#string, value, value, value.
#gets the sums of each column

clear
myfile=$1

awk '{total1 += $3 } END {print total1}' $myfile
awk '{total2 += $5 } END {print total2}' $myfile
awk '{total3 += $7 } END {print total3}' $myfile

First off, you could do it all in one pass. Btw., it is good style to initialize variables you are going to use, instead of taking them for granted:

awk '
BEGIN {
     total1=0;
     total2=0;
     total3=0;
}

{
     total1 += $3;
     total2 += $5;
     total3 += $7;
}

END {
     print total1;
     print total2;
     print total3;
}' "$myfile"

It should be easy for you now to modify the script according to your requirement.
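As for passing shell variables to awk and getting the result into the file: one possible approach, as a sketch only, is awk's -v option together with the shell's >> redirection. The function name append_totals, the "TOTAL" label and the layout of the totals row are assumptions, and the field numbers $3, $5 and $7 are simply taken over from your script (they fit lines like "a , 1 , 2 , 3" split on whitespace):

```shell
#!/bin/sh
# Sketch: append a totals row to a file. Name, label and row layout are assumptions.
append_totals() {
    file=$1
    label=${2:-TOTAL}
    # -v hands a shell value to awk as an awk variable.
    # The output is captured first, so the file is not read and written at once.
    totals=$(awk -v label="$label" -v OFS=' , ' '
        { t1 += $3; t2 += $5; t3 += $7 }
        END { print label, t1 + 0, t2 + 0, t3 + 0 }
    ' "$file") || return 1
    printf '%s\n' "$totals" >> "$file"
}
```

For a file containing "a , 1 , 2 , 3" and "b , 4 , 5 , 6", calling append_totals on it adds the line "TOTAL , 5 , 7 , 9" at the end.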

Another point: might there be lines which don't need processing? Let's consider the following:

# header line with no meaning
string, value, value, value
string, value, value, value
# another line with no meaning
string, value, value, value

You don't want to process lines 1 and 4 in this case. Do you have an idea how to achieve this, from what I told you?
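One possible answer, as a sketch: put a rule with a regexp in front of the summing rule and let it skip the unwanted lines. This assumes such lines always start with "#". The function name sum_skipping_comments is invented, and the field numbers are again the ones from your script:

```shell
#!/bin/sh
# Sketch: sum the value columns, ignoring lines that start with "#".
sum_skipping_comments() {
    awk '
        /^#/ { next }                         # comment line: go on to the next line
        { t1 += $3; t2 += $5; t3 += $7 }      # every other line is summed
        END { print t1 + 0, t2 + 0, t3 + 0 }
    ' "$1"
}
```

Without the first rule, a comment line that happens to contain numbers in those fields would silently end up in the totals.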

Still, there is a more subtle point I'd like to raise, one which isn't explicitly covered by your requirement, but is best learned from the very beginning of one's programming career: if you get user input, you should validate it! You write:

myfile=$1

How do you know that "$1" is a legitimate directory (or a legitimate file, for that matter)? Let's say I enter "yourscript /foo/bar/gnarble/furble/isnodirorfileatall". What would your script do?

You might want to read the man page for "test" (which is a handy Unix utility and comes under two names: "test" and "[") to understand the following:

if [ -d "$1" -a -r "$1" ] ; then
     mydir="$1"
else
     echo "ERROR: no reasonable directory name given" >&2
     exit 1     # stop here instead of carrying on without a directory
fi
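If you want the error message to say what exactly went wrong, the checks can also be done one at a time. This is only a sketch: the function name check_dir, the wording of the messages and the return codes are my own choices, not anything your assignment prescribes:

```shell
#!/bin/sh
# Sketch: validate a directory argument step by step. Names and codes are assumptions.
check_dir() {
    if [ $# -ne 1 ]; then
        echo "usage: check_dir DIRECTORY" >&2
        return 2
    fi
    if [ ! -d "$1" ]; then
        echo "ERROR: '$1' is not a directory" >&2
        return 1
    fi
    # Listing the files needs read permission, entering the directory needs execute.
    if [ ! -r "$1" ] || [ ! -x "$1" ]; then
        echo "ERROR: '$1' is not accessible" >&2
        return 1
    fi
    return 0
}
```

In the script this could be used as: check_dir "$1" || exit 1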

I hope this helps.

bakunin

RGD · November 30, 2010, 3:53am

Bakunin,

First off, I would like to thank you. Your post was more than helpful.

I think I missed this point; I'll do something about it.

Yes, I know it's an important point, and I'm going to make sure to add this to my script. It's just that my problem was with awk; now I understand what to do.

Thanks again.

RGD