The first field (the timestamp) never decreases: each value is greater than or equal to the previous one.
1) Sum the second fields of all lines for which int(first_field/500) is equal.
2) Sum the second fields of lines whose first fields differ by less than 500
(sliding window).
In the example presented:
1) Because 12345678/500 and 12345989/500 both give 24691 (integer division), sum=4+13.
We cannot group the 3rd line with any other, so sum=205.
And we group the 4th and 5th lines, so sum=74+22.
2) We group the 1st and 2nd lines because 12345989 - 12345678 < 500.
By analogy we group the 2nd and 3rd, the 3rd and 4th,
and the 3rd, 4th and 5th because 12346819 (the 5th line) - 12346356 (the 3rd line) < 500.
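To make the two groupings concrete, here is a small sketch that prints, for every line, its bucket key int(first_field/500) and the sum over the sliding window ending at that line. The timestamps and values are the ones quoted above, except the timestamp of the 4th line (12346600), which is an assumption chosen only so that it falls in the same bucket as the 5th line; the variable names are made up for illustration.

```shell
# Sample rows: "timestamp value". The 4th timestamp is hypothetical.
data='12345678 4
12345989 13
12346356 205
12346600 74
12346819 22'

result=$(printf '%s\n' "$data" | awk '
{
    t[NR] = $1; v[NR] = $2
    bucket = int($1 / 500)            # key used by rule 1
    w = 0                             # rule 2: sum of the window ending here
    # walk backwards while the earlier row is less than 500 behind this one
    for (i = NR; i >= 1 && $1 - t[i] < 500; i--)
        w += v[i]
    print $1, "bucket=" bucket, "window_sum=" w
}')
printf '%s\n' "$result"
```

Lines that share a bucket value are the ones rule 1 adds together (4+13, 205 alone, 74+22), while window_sum for the 5th line is 22+74+205, i.e. the 3rd, 4th and 5th lines grouped.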
Is it just 2 (1st & 2nd, 2nd & 3rd, 3rd & 4th, ...) ?
Or is it 3 (1st, 2nd & 3rd; 2nd, 3rd & 4th; ...) ?
Hopefully, it's not a cartesian product, i.e.
1st vs. (2nd, 3rd, 4th, ... , last_row)
2nd vs. (1st, 3rd, 4th, ... , last_row)
3rd vs. (1st, 2nd, 4th, ... , last_row)
...
last_row vs. (1st, 2nd, 3rd, ..., last-1_row)
OK, and what do you want to do with the sum?
Do you want to display it? Or do nothing with it (highly unlikely)?
If you want to display it, then how? The total against each row? Or the total against the first row only? Or against the second row only?
This raises my first counter-question: why compare the 3rd, 4th and 5th (considering that you have been comparing two at a time all this while)?
So again, what's the length of the sliding window?
I guess a very simple example of your input file should help here. So, let's say your input file is as follows:
Case1:
awk '{ sum1[int($1/500)]+=$2 } END { for (i in sum1) print "Sum1 " sum1[i] }' infile
Case2:
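awk 'BEGIN{
  min=1                            # oldest row still inside the window
}
{ time[NR]=$1
  val[NR]=sum2[NR]=$2
  i=min
  while (time[NR]-time[i]>=500)    # skip rows 500 or more behind the current one
    i++
  min=i
  for (i=min;i<NR;i++)             # add the earlier rows that remain in the window
    sum2[NR]+=val[i]
}
END {
  for (i in sum2)
    print "Sum2: "sum2[i]
}' infile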
Case1+2 combined:
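awk 'BEGIN{
  min=1                            # oldest row still inside the window
}
{ sum1[int($1/500)]+=$2            # Case1: one total per 500-unit bucket
  time[NR]=$1
  val[NR]=sum2[NR]=$2
  i=min
  while (time[NR]-time[i]>=500)    # skip rows 500 or more behind the current one
    i++
  min=i
  for (i=min;i<NR;i++)             # Case2: add the earlier rows in the window
    sum2[NR]+=val[i]
}
END {
  for (i in sum1)
    print "Sum1 "sum1[i]
  print ""
  for (i in sum2)
    print "Sum2: "sum2[i]
}' infile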
First of all, THANK YOU VERY MUCH to durden_tyler and to Scrutinizer.
I need some time to work through (and study) your examples, and in a few
days (if you want) I'll come back with some other questions (though I realise
I REALLY need to study some awk/Perl manuals) to build the
data filters I'm working on now.
Thanks again.
---------- Post updated 10-12-09 at 04:41 PM ---------- Previous update was 10-11-09 at 09:01 PM ----------
---------- Post updated at 08:04 PM ---------- Previous update was at 08:02 PM ----------
$ gunzip -c ES_05_10_Oct2009.gz |sed -e 's/BEST /BEST/g' -e 's/[:.]//g'|awk '{ sum1[int($1/500)]+=$4 } END { for (i in sum1) print "i="i"Sum1 "sum1 [i]}'|tee outfile.txt
awk: (FILENAME=- FNR=4094608) fatal: format_tree: obuf: can't allocate 512 bytes of memory (Cannot allocate memory)
Is there a way to compute the sum step by step, without using an array?
(The problem shows up with very big data sets.)
Thanks
Paolo
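One possible array-free approach for the first rule, as a sketch only: since the first field never decreases, all lines of one bucket are adjacent, so a single running total can be printed and reset whenever int($1/500) changes. This assumes the growing array is what exhausts memory, which is not confirmed here. The function name bucket_sums and its column argument are made up for illustration, and the sample rows reuse the values from the example, with a hypothetical timestamp (12346600) for the 4th line.

```shell
# bucket_sums COL: read rows on stdin, print "bucket total" per 500-unit bucket.
# Relies on the first field being non-decreasing (as stated in the thread).
bucket_sums() {
    awk -v col="${1:-2}" '
    {
        b = int($1 / 500)
        if (NR > 1 && b != prev) {   # bucket changed: flush the finished total
            print prev, sum
            sum = 0
        }
        sum += $col                  # only one running total is kept in memory
        prev = b
    }
    END { if (NR > 0) print prev, sum }   # flush the last bucket
    '
}

# Small demonstration; the 4th timestamp is hypothetical.
result=$(printf '%s\n' '12345678 4' '12345989 13' '12346356 205' \
                       '12346600 74' '12346819 22' | bucket_sums 2)
printf '%s\n' "$result"
```

In the pipeline above, the awk stage would then become something like `... | bucket_sums 4 | tee outfile.txt`, since that command sums the fourth field. Totals come out in input order, because each one is printed as soon as its bucket ends.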
---------- Post updated at 08:01 PM ---------- Previous update was at 04:41 PM ----------