Extract numbers from text file work out average

rich1 · April 22, 2010, 3:35pm

Just wondering if someone could help me with a shell script I'm trying to write. I need to read the final column of a text file (shown below) and work out the average. The text file will have a variable number of lines; I just want the script to pull out the values in the final field (which I've done using cut), add them all together, then divide by the total number of rows. Really struggling with this! Help much appreciated.

abcd e544f222e0jk55 1503
abcd e544f222e0jk56 1504
abcd e544f222e0jk57 1505
abcd e544f222e0jk58 1506
abcd e544f222e0jk59 1507
abcd e544f222e0jk60 1508
abcd e544f222e0jk61 1509
abcd e544f222e0jk62 1510
abcd e544f222e0jk63 1511
abcd e544f222e0jk64 1512
abcd e544f222e0jk65 1513
abcd e544f222e0jk66 1514
abcd e544f222e0jk67 1515
abcd e544f222e0jk68 1516
abcd e544f222e0jk69 1517
abcd e544f222e0jk70 1518
abcd e544f222e0jk71 1519
abcd e544f222e0jk72 1520
abcd e544f222e0jk73 1521
abcd e544f222e0jk74 1522
abcd e544f222e0jk75 1523
abcd e544f222e0jk76 1524
abcd e544f222e0jk77 1525
abcd e544f222e0jk78 1526
abcd e544f222e0jk79 1527
abcd e544f222e0jk80 1528
abcd e544f222e0jk81 1529
abcd e544f222e0jk82 1530
abcd e544f222e0jk83 1531
abcd e544f222e0jk84 1532
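For reference, the cut-based route described in the question can be finished in plain sh. This is a minimal sketch, assuming single spaces between fields and integer values in the third field; avg_last_field is a hypothetical helper name, and awk is used only for the final division because sh arithmetic is integer-only.

```shell
#!/bin/sh
# Sketch: average the 3rd space-separated field using cut plus a shell loop.
# Assumes single spaces between fields and integer values (hypothetical helper).
avg_last_field() {
    cut -d' ' -f3 "$1" | {
        sum=0
        count=0
        while read -r value; do
            [ -n "$value" ] || continue      # skip blank lines
            sum=$((sum + value))
            count=$((count + 1))
        done
        # sh arithmetic is integer-only, so awk does the fractional division.
        awk -v s="$sum" -v n="$count" 'BEGIN { if (n > 0) print s / n }'
    }
}

# Rebuild the sample data from the question (values 1503..1532) in a temp file.
sample=$(mktemp)
i=1503
while [ "$i" -le 1532 ]; do
    echo "abcd e544f222e0jk$((i - 1448)) $i" >> "$sample"
    i=$((i + 1))
done

avg_last_field "$sample"   # prints 1517.5
rm -f "$sample"
```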

Scott · April 22, 2010, 3:42pm

Hi.

One with awk...

$ cat file1
abcd e544f222e0jk55 1503
abcd e544f222e0jk56 1504
abcd e544f222e0jk57 1505
abcd e544f222e0jk58 1506
abcd e544f222e0jk59 1507
abcd e544f222e0jk60 1508
abcd e544f222e0jk61 1509
abcd e544f222e0jk62 1510
abcd e544f222e0jk63 1511
abcd e544f222e0jk64 1512
abcd e544f222e0jk65 1513
abcd e544f222e0jk66 1514
abcd e544f222e0jk67 1515
abcd e544f222e0jk68 1516
abcd e544f222e0jk69 1517
abcd e544f222e0jk70 1518
abcd e544f222e0jk71 1519
abcd e544f222e0jk72 1520
abcd e544f222e0jk73 1521
abcd e544f222e0jk74 1522
abcd e544f222e0jk75 1523
abcd e544f222e0jk76 1524
abcd e544f222e0jk77 1525
abcd e544f222e0jk78 1526
abcd e544f222e0jk79 1527
abcd e544f222e0jk80 1528
abcd e544f222e0jk81 1529
abcd e544f222e0jk82 1530
abcd e544f222e0jk83 1531
abcd e544f222e0jk84 1532

$ awk '{T+= $NF} END { print T/NR }' file1
1517.5

rich1 · April 22, 2010, 3:51pm

Many thanks for the reply, scottn. I get the following:

awk: line 1: syntax error at or near END

script is:

#!/bin/sh
ls -l logs/db
#echo "please enter the name of the file you wish to analyse:"
#read filetoint
cat stats.txt
awk '!(T+= $NF) END { print T/NR }' stats.txt

Scott · April 22, 2010, 4:01pm

Hi.

I changed my awk slightly. Try that one.

It doesn't look like the error from Solaris, but if it is, use /usr/xpg4/bin/awk or nawk.

pseudocoder · April 22, 2010, 4:06pm

change

awk '!(T+= $NF) END { print T/NR }' stats.txt

to

awk '!{T+= $NF} END { print T/NR }' stats.txt

---------- Post updated at 22:06 ---------- Previous update was at 22:02 ----------

@scottn

I've found a 6 yr. old posting of Ygor here.
Now I'm almost able to "read" your code.
So "T" means Total and "$NF" means 3rd column, right?
I really can't understand how "$NF" stands for 3rd column....

rich1 · April 22, 2010, 4:09pm

Many thanks for this. I'm actually scripting it on Ubuntu 9.10, and it now returns the following error:

awk: line 1: syntax error at or near {

Scott · April 22, 2010, 4:13pm

Quoting pseudocoder:

change
awk '!(T+= $NF) END { print T/NR }' stats.txt
to
awk '!{T+= $NF} END { print T/NR }' stats.txt

@scottn

I've found a 6 yr. old posting of Ygor here.
Now I'm almost able to "read" your code.
So "T" means Total and "$NF" means 3rd column, right?
I really can't understand how "$NF" stands for 3rd column....

Hi pseudocoder.

Yes, I'm sure this question has been asked and answered many, many times.

The first awk you quote is my original one, which I changed, but which rich@ardz quoted.

It works fine (at least for me :))

$ awk '!(T+= $NF) END { print T/NR }' file1
1517.5

T is a running total, $NF is the value of the last field and NR is the number of rows.
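To make the $NF question concrete: NF holds the number of fields on the current line, so $NF is always the last field, which on these three-field lines is the same field as $3. A minimal sketch (average_last_field is a hypothetical name; the NR > 0 guard is an addition so an empty file prints nothing instead of attempting a division by zero):

```shell
#!/bin/sh
# T accumulates the last field of every line; in the END block NR equals
# the total number of lines read.
average_last_field() {
    awk '{ T += $NF } END { if (NR > 0) print T / NR }' "$1"
}

# NF, $NF and $3 on one of the sample lines: $NF and $3 name the same field.
echo "abcd e544f222e0jk55 1503" | awk '{ print NF, $NF, $3 }'   # prints: 3 1503 1503
```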

---------- Post updated at 10:13 PM ---------- Previous update was at 10:10 PM ----------

Can you paste (i.e. not type) exactly the command you are running? This awk is not by any means complicated.

Thanks.

pseudocoder · April 22, 2010, 4:13pm

rich1 · April 22, 2010, 4:15pm

Got it, the following did it:

#!/bin/sh
#echo "please enter the name of the file you wish to analyse:"
#read filetoint
cd logs/db || exit 1
ls -l
cat stats.txt
awk '{sum+=$3}END{print sum/NR}' stats.txt

amazing! cheers

Scrutinizer · April 22, 2010, 4:16pm

FWIW using Scottn's solutions on Ubuntu (using mawk) I get:

$ awk '!(T+= $NF) END { print T/NR }' infile
awk: line 1: syntax error at or near END

$ awk '{T+= $NF} END { print T/NR }' infile
1517.5
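The two commands differ in where the running total is added. A hedged sketch contrasting them (action_form and pattern_form are hypothetical names): in the first, the braces make T += $NF an action, and END can follow the closing brace directly. In the second, !(T += $NF) is a pattern with no action; mawk rejects it when END follows directly, as Scrutinizer's output shows, but a ';' after the pattern is accepted. Note that the pattern form prints any line for which the running total is 0, so it only stays silent for data like this.

```shell
#!/bin/sh
# Action form: "T += $NF" runs as an action on every line.
action_form() {
    awk '{ T += $NF } END { print T / NR }' "$1"
}

# Pattern form: "!(T += $NF)" is a pattern whose side effect updates T.
# The ';' separates it from END; the line is printed only when T is 0.
pattern_form() {
    awk '!(T += $NF); END { print T / NR }' "$1"
}

sample=$(mktemp)
printf 'abcd x 1503\nabcd y 1504\n' > "$sample"
action_form "$sample"    # prints 1503.5
pattern_form "$sample"   # prints 1503.5
rm -f "$sample"
```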

Scott · April 22, 2010, 4:19pm

Glad I changed it then

rich1 · April 23, 2010, 5:13pm

#!/bin/sh
#echo "please enter the name of the file you wish to analyse:"
#read filetoint
cd logs/db || exit 1
ls -l
cat stats.txt
awk '{sum+=$3}END{print sum/NR}' stats.txt

I'd like to expand this to look at all the text files in the directory (they all have the same format) and run the awk command against each of them. I've tried a for loop, but what do I replace the stats.txt part of the awk command with?

---------- Post updated at 10:13 PM ---------- Previous update was at 09:56 AM ----------

need some advice on this script, hope someone can help:

#!/bin/sh

directory=test/logs/etc

cd "$directory" || exit 1

for f in *.txt

do

    echo "stats at $(date):"
    echo "slowest response time from db was $(cut -f9 -d" " "$f" | sort -n | tail -1) ms"
    echo "fastest response time from db was $(cut -f9 -d" " "$f" | sort -n | head -1) ms"
    echo "average response time from db was $(awk '{sum+=$9}END{print sum/NR}' "$f") ms"

#every hour, output to file in format:
#DATE TIME - MIN - MAX - AVG

done
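A glob stored in a variable is only expanded when the variable is used unquoted. With `for f in "$FILES"` the loop body runs once with f set to the literal string *.txt; an unquoted `$f` later in the body then expands to every matching file at once, so the statistics would cover all files combined. A small sketch of the three loop forms (show_loops is a hypothetical name; it works in a throwaway temp directory):

```shell
#!/bin/sh
# Runs in a subshell inside a temporary directory holding a.txt and b.txt.
show_loops() (
    tmp=$(mktemp -d) || exit 1
    cd "$tmp" || exit 1
    touch a.txt b.txt

    FILES="*.txt"
    for f in "$FILES"; do echo "quoted: $f"; done   # one pass; f is the literal string *.txt
    for f in $FILES; do echo "unquoted: $f"; done   # glob expands: a.txt, then b.txt
    for f in *.txt; do echo "direct: $f"; done      # same result without the variable

    cd / && rm -rf "$tmp"
)

show_loops
```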

What I now need it to do is write to a file every hour in the following format:

DATE TIME - MIN - MAX - AVG

I also want it to write to a new file each day. Is that possible?
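One possible approach, sketched under assumptions that are not in the thread: cron handles the hourly schedule, the log file name contains the date (stats-YYYY-MM-DD.log, a hypothetical pattern) so each day starts a new file, and log_stats is a hypothetical helper taking the directory and field number as arguments. One awk pass computes min, max and average over all *.txt files.

```shell
#!/bin/sh
# Example crontab entry (placeholder path) to run at minute 0 of every hour:
#   0 * * * * /path/to/db_stats.sh test/logs/etc 9

log_stats() {
    dir=$1      # directory holding the *.txt response-time logs
    field=$2    # field number holding the response time (9 in the script above)
    out="$dir/stats-$(date +%Y-%m-%d).log"   # .log, so it never matches *.txt

    # Append one "DATE TIME - MIN - MAX - AVG" line for all *.txt files.
    awk -v f="$field" -v stamp="$(date '+%Y-%m-%d %H:%M')" '
        { v = $f + 0; sum += v }
        NR == 1 || v < min { min = v }
        NR == 1 || v > max { max = v }
        END { if (NR > 0) print stamp " - " min " - " max " - " sum / NR }
    ' "$dir"/*.txt >> "$out"
}

# Only log when a directory is given, e.g. by the crontab entry above.
if [ -n "$1" ]; then
    log_stats "$1" "${2:-9}"
fi
```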

durden_tyler · April 23, 2010, 6:44pm

Replace it with the value of your for loop's iterator variable (e.g. $f):

$ 
$ # list all text files in the current directory
$ ls -1 *.txt
file1.txt
file2.txt
file3.txt
file4.txt
$ 
$ # show their contents
$ perl -lne 'print $.==1 ? "\n== $ARGV ==\n$_" : $_; close ARGV if eof' *.txt

== file1.txt ==
abcd e544f222e0jk55 1503
abcd e544f222e0jk56 1504
abcd e544f222e0jk57 1505

== file2.txt ==
abcd e544f222e0jk58 1506
abcd e544f222e0jk59 1507
abcd e544f222e0jk60 1508

== file3.txt ==
abcd e544f222e0jk61 1509
abcd e544f222e0jk62 1510
abcd e544f222e0jk63 1511

== file4.txt ==
abcd e544f222e0jk64 1512
abcd e544f222e0jk65 1513
abcd e544f222e0jk66 1514
abcd e544f222e0jk67 1515
$ 
$ # display the contents of the shell script that loops through these
$ # text files and calculates the average of the 3rd field for each
$ # of them
$ 
$ cat -n script1.sh
     1    #!/bin/bash
     2    for f in *.txt; do
     3      avg=$(awk '{sum += $3} END {print sum/NR}' $f)
     4      echo "Average for file: $f = $avg"
     5    done
$ 
$ # run the shell script
$ . script1.sh
Average for file: file1.txt = 1504
Average for file: file2.txt = 1507
Average for file: file3.txt = 1510
Average for file: file4.txt = 1513.5
$ 
$

tyler_durden
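As an alternative to starting one awk per file, a single awk can report every file in one pass. This is a sketch, not from the thread: per_file_averages is a hypothetical name, and it relies on FNR (the line number within the current file, reset to 1 for each new file) and FILENAME. It averages the last field, $NF, which here is the same as $3.

```shell
#!/bin/sh
# One awk process for all files; empty files produce no records and are skipped.
per_file_averages() {
    awk '
        function report() { print "Average for file: " file " = " sum / n }
        FNR == 1 && NR > 1 { report() }      # the previous file is finished
        FNR == 1 { file = FILENAME; sum = 0; n = 0 }
        { sum += $NF; n++ }
        END { if (n > 0) report() }
    ' "$@"
}

# Usage: per_file_averages *.txt
```

Run as `per_file_averages *.txt` against the four files above, it should print the same four lines as script1.sh.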

aky_26 · August 4, 2010, 1:56am

awk '{T+= $NF} END { print T/NR }' text2

Remove the ! and try it out.

ygemici · August 4, 2010, 7:20am

# echo "scale=1 ; $(echo {$(sed -e 's/[[:alnum:]][[:alnum:]]* [[:alnum:]][[:alnum:]]* //' -e '$! s/$/+/' infile)} | bc) /`sed -n '$=' infile`" | bc
1517.5
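The one-liner above builds a sum expression of the form 1503+1504+... and divides it by the line count from sed -n '$='. A step-by-step sketch of the same sum-then-divide idea, with swaps that are not from the post: awk prints the last field, paste joins the values with '+', shell arithmetic evaluates the integer total instead of bc, and awk does the fractional division. avg_sum_then_divide is a hypothetical name.

```shell
#!/bin/sh
avg_sum_then_divide() {
    expr=$(awk '{ print $NF }' "$1" | paste -s -d '+' -)   # e.g. 1503+1504+...
    count=$(sed -n '$=' "$1")                               # line count, as in the post
    [ -n "$count" ] || return 1                             # empty file: nothing to average
    total=$(( $expr ))                                      # evaluate the sum expression
    awk -v t="$total" -v n="$count" 'BEGIN { print t / n }'
}
```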