Average of elements across multiple files

Hi,

I have a lot of files that look like this:

1
0.5
6

Altogether there are around 1'000'000 lines in each of the roughly 100 files.
I want to compute the average for every line and write the result to a new file.
The averaging should start at a specific line, here for example at line 380'000.

Help is much appreciated...

chillmaster

PS: This is for data files from CFD calculations, specifically to average a transient calculation and read the result back in with the postprocessor.
So the first (here around 380'000) lines are the coordinates of the field points, with the field data following after that...

Do you mean averaging across hundreds of files - i.e., line 474000 averaged over files 1...200? This would mean each file has the same number of lines, which, based on your explanation, does not seem to be the case.

Hi jim,

yep, you are right.
Maybe I wrote it a little confusingly; English isn't my mother tongue...

Anyway, right now I have 100 files, each with 1'050'000 rows and just one column.
Every file is a specific point in time.
I want an average for each line over all 100 files (line 1 over all 100 files, line 2, ...).
However, I need to start averaging at a specific line, somewhere around 380'000.
The first 380'000 rows are coordinate points.

I think first I need to join the files so that I get one file with 1'050'000 rows and 100 columns. I tried:

 cut -f 1 data0050_1.ip data0055_1.ip | pr -2 -t > test.ip 

just for two files, but it isn't working.
It seems that there is a limit on the length of the files...

pressure 2
velocity 23
. 345
. .
. .
2 .
3 pressure
6 .
5 .
64 .
64
3
6

Pressure is supposed to be the first row...
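
A likely explanation, though this is a guess: cut simply concatenates the two files into one stream, and pr -2 -t then balances that stream into columns one page at a time, which scrambles long inputs like these. Merging files line by line is what paste is for, e.g.:

paste data0050_1.ip data0055_1.ip > test.ip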

Thank you for your help

Could you do something like this (a rough sketch):

while read file
do
	cat -n "$file" >> bigfile
done < filelisting

sort -n bigfile > sortedfile

you would then have sortedfile looking like
1 123.1222
.. ... (98 more)
1 1.509 (last)
2 56.789

Perhaps awk could then sum up all lines where field 1 is the same.
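
For instance (an untested sketch, assuming sortedfile holds "linenumber value" pairs):

awk '{ sum[$1] += $2; cnt[$1]++ }
     END { for (i = 1; (i in sum); i++) printf("%.3f\n", sum[i]/cnt[i]) }' sortedfile > avgfile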

Two concerns:
(a) bigfile will be a REALLY long file
(b) will the next sort command die with the REALLY long file?

Great. I got it. Try something like this.
Assume your files have the same base name with numbers after them, 1 - 100 or so, and the files are called filename1 ... filename<nnn>

#!/bin/ksh

> outputfile
filecount=0
for file in filename*
do
	sed -n '380000, $p' $file >> outputfile
	filecount=$(( filecount + 1))
done
awk -v filecount=$filecount '
            BEGIN { i=1}
            { avg[ i % 670000]+=$1 }
     	END{for (i=1; i<= 670000; i++) { printf("%.3f\n", avg/filecount) }) ' outputfile > avgfile
    

The %.3f format specifier controls the precision of the output; this rounds to thousandths.

Hello jim,

thank you very much for your help.
The first part is working; the second part gives a syntax error at line 3 of the awk program.
Unfortunately, I don't know anything about awk.
I will check the syntax in the next few days, since it seems very powerful to me.
I would be very happy if you could help me again anyway, because I think I will need some time to get the hang of awk...

One more question about sorting the files:
The first part makes a really big file, and I can't see where one file ends and the next one starts. Actually, you set up the awk command with the number 670'000; I think that's the length of each separate file within the big outputfile, is that right?
Anyway, do you know how I can join multiple one-column files into one multiple-column file? It doesn't seem like a really difficult task, but I can't find any helpful information.
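
A sketch of that join with paste, which merges files line by line into tab-separated columns (data_all.ip is a placeholder name; a later post below uses the same approach):

paste data*.ip > data_all.ip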

Thank you very much
chillmaster

Addition:

I found the syntax error; the last bracket was the wrong one... ) instead of }
Anyway, it is not working right. avgfile:

70124752,000
0,000
0,000
0,000
0,000
0,000
0,000
0,000

Tried with just 5 files.
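
For reference, a corrected sketch of the awk step: with BEGIN { i=1 } the counter i is never incremented, so every value lands in avg[1] (hence the one huge value followed by zeros), and the END loop needs avg[i], not avg. Indexing by the line number avoids both problems (assuming each file contributes exactly 670'000 lines to outputfile):

awk -v filecount=$filecount '
    { avg[((NR - 1) % 670000) + 1] += $1 }   # NR advances with every line read
    END { for (i = 1; i <= 670000; i++) printf("%.3f\n", avg[i] / filecount) }' outputfile > avgfile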

 perl -MIO::File -MList::Util=sum -e '@_=map{new IO::File$_,"r"}@ARGV;for(;;){print sum(map{exit unless <$_>}@_)/@_."\n"}' *

Not mine... from a trusted guru sitting near me who was interested...

As for the "from a certain line" part, his answer was "just tail the output" - which is fair :)

a nawk solution:

 nawk '{x[FNR]+=$0;y[FNR]++}END{while(y[++i]){printf("%.3f\n",x[i]/y[i])}}' * 

again - tail the output, i.e. | tail +380000
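
Spelled out as a full pipeline (a sketch; tail +380000 is the old syntax, GNU tail wants -n +380000):

nawk '{x[FNR]+=$0;y[FNR]++}END{while(y[++i]){printf("%.3f\n",x[i]/y[i])}}' data*.ip | tail -n +380000 > avgfile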

Hello,

I got the following solution:

#!/bin/bash

filecount=0
for file in data*.ip
do
sed -i -n '382976, $p' $file
sed -i 's/[.]/,/g' $file
filecount=$(( filecount + 1))
echo $filecount
done

paste data*.ip > data.ip

cat data.ip | awk '{ printf("%.12f\n", ($1+$2+$3+$4+$5+$6+$7+$8+$9+$10+$11+$12+$13+$14+$15+$16+$17+$18+$19+$20+$21+$22+$23+$24+$25+$26+$27+$28+$29+$30+$31+$32+$33+$34+$35+$36+$37+$38+$39+$40+$41+$42+$43+$44+$45+$46+$47+$48+$49+$50+$51+$52+$53+$54+$55+$56+$57+$58+$59+$60+$61+$62+$63+$64+$65+$66+$67+$68+$69+$70+$71+$72+$73+$74+$75+$76+$77+$78+$79+$80+$81+$82+$83+$84+$85+$86+$87+$88+$89+$90+$91+$92+$93+$94+$95+$96+$97+$98+$99+$100)/100) }' > data.out

sed -i 's/[,]/./g' data.out

It's not really elegant, but it is working so far.
How can I do the average more easily?
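
One shorter option (a sketch): let awk loop over the fields itself, so the 100 columns never have to be written out; NF is the number of fields on the line:

awk '{ s = 0; for (i = 1; i <= NF; i++) s += $i; printf("%.12f\n", s / NF) }' data.ip > data.out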

This is not really good either:

sed -i -n '382976, $p' $file
sed -i 's/[.]/,/g' $file

How can i combine both? Tried it with the -e parameter, didnt work...
And how can I write to multiple seperate files, without the -i?
(data01.txt to data01_new.txt for example)
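
One way to do both steps at once and write to a new file instead of editing in place (a sketch; the _new naming is just a placeholder choice):

sed -n '382976,$ { s/[.]/,/g; p }' "$file" > "${file%.ip}_new.ip"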

Isn't it possible to use parameters in sed?
Example:

#!/bin/bash

filecount=0
for file in data*.ip
do
sed -i -n '${1}, $p' $file
sed -i 's/[.]/,/g' $file
filecount=$(( filecount + 1))
echo $filecount
done

or:

#!/bin/bash

row=${1}
filecount=0
for file in data*.ip
do
sed -i -n 'row, $p' $file
sed -i 's/[.]/,/g' $file
filecount=$(( filecount + 1))
echo $filecount
done
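
The problem, most likely, is the single quotes: the shell never expands ${1} or row inside them, so sed sees the literal text. Double quotes work; the end-of-input address $ then needs a backslash so the shell leaves it for sed (a sketch):

row=${1:-382976}
sed -i -n "${row},\$p" "$file"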

How can I cut rows with sed from a defined row downwards?
And how can I paste rows in after the cutting position?
(File1 has 1'000'000 rows; I just want to keep the first 300'000. File2 has 700'000 rows, and I want to paste these into File1 starting at position 300'001.)
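
A sketch for that splice, with head doing the cut (File1_new is a placeholder name; sed -n '1,300000p' File1 would work too):

head -n 300000 File1 > File1_new    # keep the first 300'000 rows of File1
cat File2 >> File1_new              # append File2, landing at position 300'001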

Thank you very much, and have a very nice weekend...
chillmaster

Hi Tytalus,

thanks for the answer.
You posted while I was writing.

Both look really short; I will check them asap and post a solution...

So long
chillmaster