AWK aggregate records

anaconga · August 26, 2008, 7:14am

Hi all,

I have a problem. Can someone help me?

I have a file of records like these:
30|239|ORD|447702936929 |blackberry.net |20080728|141304|00000900|2|0000000000000536|28181|0000000006|0000000001|10|1

30|239|ORD|447702936929 |blackberry.net |20080728|141304|00000340|2|0000000000007076|29211|0000000024|0000000007|10|1

30|239|ORD|447702936929 |blackberry.net |20080728|141305|00000900|3|0000000000001568|28181|0000000002|0000000002|10|1

What I want to do is aggregate these records under the following conditions.

When records share the same key ($1, $4, $5) and field $9 is 2, I compute:

VAR = $7 + $8 + 2

Then I go to the next record:
if its field $7 <= VAR, I aggregate the two records;

otherwise I move on to the next record.

The last record must have $9 = 3, because that indicates that my leg is closed.

I can only aggregate when I have multiple records with $9 = 2 followed by a last record with $9 = 3.

The result for the example should be something like this:

30|239|ORD|447702936929 |blackberry.net |20080728|141304|00002140|1|0000000000000536|28181|0000000006|0000000001|10|1

When I aggregate the records, I sum field $8 and set $9 to 1.
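To make the rule concrete, here is the arithmetic for the three sample records as a small sketch. It treats $7 and $8 as plain numbers, exactly as the formula VAR = $7 + $8 + 2 does. The function name window_demo is made up for illustration, and the "%08d" padding is an assumption based on the width of $8 in the expected result.

```shell
# Sketch only: for each record print $7, the limit VAR, and the running sum of $8.
window_demo() {
  awk -F'|' '{
    var = $7 + $8 + 2     # limit that $7 of the next record must not exceed
    sum += $8             # running total of field 8
    printf "%d %d %08d\n", $7, var, sum
  }'
}

printf '%s\n' \
  '30|239|ORD|447702936929 |blackberry.net |20080728|141304|00000900|2|0000000000000536|28181|0000000006|0000000001|10|1' \
  '30|239|ORD|447702936929 |blackberry.net |20080728|141304|00000340|2|0000000000007076|29211|0000000024|0000000007|10|1' \
  '30|239|ORD|447702936929 |blackberry.net |20080728|141305|00000900|3|0000000000001568|28181|0000000002|0000000002|10|1' |
  window_demo
```

Each record's $7 stays within the limit computed from the record before it, and the last running sum is 00002140, which matches the $8 of the expected result.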

otheus · August 27, 2008, 9:39am

I need a little more info. After aggregating, what do you want to print out? When there's no aggregation, do you want to print out anything?

Here's a start:

awk -F\| '
  { key = $1 FS $4 FS $5 }
  VAR && key == prev && $7 <= VAR {
    # aggregate here
    VAR = 0; next
  }
  $9 == 2 { prev = key; VAR = $7 + $8 + 2; next }
  {
    # print other lines here
  }
'
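One way the placeholders above could be filled in is sketched below. It is not a definitive answer, because the thread leaves open what to do with groups that never see a closing record. The sketch assumes such groups are printed unchanged, that records which neither open nor continue a group are also printed unchanged, and that the summed $8 is zero-padded to 8 digits. The names aggregate, dump_group, buf, open_key and limit are invented for the example.

```shell
# Sketch only: aggregate runs of records that share ($1, $4, $5), start with
# $9 == 2, stay inside the window $7 <= previous $7 + $8 + 2, and end with $9 == 3.
aggregate() {
  awk 'BEGIN { FS = OFS = "|" }

  # Print the buffered records unchanged (assumption: unclosed groups pass through).
  function dump_group(   i) {
    for (i = 1; i <= n; i++) print buf[i]
    n = 0
  }

  { key = $1 FS $4 FS $5 }

  # Continuation: same key as the open group and $7 inside the window.
  n && key == open_key && $7 <= limit {
    buf[++n] = $0
    sum += $8
    limit = $7 + $8 + 2
    if ($9 == 3) {                # closing record: emit one aggregated line
      $0 = buf[1]                 # start from the first record of the group
      $8 = sprintf("%08d", sum)   # summed field 8, zero-padded
      $9 = 1
      print
      n = 0
    }
    next
  }

  # Any other record ends the open group without a close.
  { dump_group() }

  # A record with $9 == 2 opens a new group.
  $9 == 2 {
    open_key = key
    n = 1
    buf[1] = $0
    sum = $8
    limit = $7 + $8 + 2
    next
  }

  # Records that neither continue nor open a group are printed as-is.
  { print }

  END { dump_group() }'
}

printf '%s\n' \
  '30|239|ORD|447702936929 |blackberry.net |20080728|141304|00000900|2|0000000000000536|28181|0000000006|0000000001|10|1' \
  '30|239|ORD|447702936929 |blackberry.net |20080728|141304|00000340|2|0000000000007076|29211|0000000024|0000000007|10|1' \
  '30|239|ORD|447702936929 |blackberry.net |20080728|141305|00000900|3|0000000000001568|28181|0000000002|0000000002|10|1' |
  aggregate
```

For the three sample records this prints the single aggregated line from the question, with $8 = 00002140 and $9 = 1. Buffering the raw lines costs a little memory per group, but it lets an unclosed group be written out untouched.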

anaconga · August 27, 2008, 10:22am

Hello,

This is what I did.

I don't know if it's the best idea, but it works.

Does your suggestion do something like this?

BEGIN {
    FS="|";
    c1 = -1;
    c4 = -1;
    c5 = -1;
    c8 = 0;
    time = 0;
}
{
    if (($1 == c1) && ($4 == c4) && ($5 == c5))
    {
        # accumulate the value of $8
        if ($7 <= time)
        {
            c8 = c8 + $8
            # update the time limit
            time = $7 + $8 + 2
        }
        # write to output
        if ($9 == 4)
        {
            print c1"|"c2"|"c3"|"c4"|"c5"|"c6"|"c7"|"c8"|1|"c10"|"c11"|"c12"|"c13"|"c14"|"c15 >> file_complete
            close(file_complete)
            # reset the variables
            time = 0
            c8 = 0
            next
        }
    }
    else
    { # the record does not have the same key as the previous record
        if (c1 == -1) # on the first line of the file, only keep the relevant fields of the first record
        {
            if ($9 == 2) # save the fields from the beginning of the call, to put in the output
            {
                c1 = $1
                c2 = $2
                c3 = $3
                c4 = $4
                c5 = $5
                c6 = $6
                c7 = $7
                c8 = $8
                c10 = $10
                c11 = $11
                c12 = $12
                c13 = $13
                c14 = $14
                c15 = $15
                time = $7 + $8 + 2
                next
            }
        }
        if (c1 != -1) # print the output for the previous key
        {
            print c1"|"c2"|"c3"|"c4"|"c5"|"c6"|"c7"|"c8"|1|"c10"|"c11"|"c12"|"c13"|"c14"|"c15 >> file_complete
            close(file_complete)
            # save the current record if it is the beginning of a call
            if ($9 == 2)
            { # save the fields from the beginning of the call, to put in the output
                c1 = $1
                c2 = $2
                c3 = $3
                c4 = $4
                c5 = $5
                c6 = $6
                c7 = $7
                c8 = $8
                c10 = $10
                c11 = $11
                c12 = $12
                c13 = $13
                c14 = $14
                c15 = $15
                time = $7 + $8 + 2
            }
        }
    }
}
END {
    # print the output for the case where the last record was not caught as a closing record ($9 = 4) but was caught with $9 = 3
    if ($9 == 3)
    {
        print c1 "|" c2 "|" c3 "|" c4 "|" c5 "|" c6 "|" c7 "|" c8 "|1|" c10 "|" c11 "|" c12 "|" c13 "|" c14 "|" c15 >> file_complete
        close(file_complete)
    }
}

otheus · August 27, 2008, 10:38am

It looks like you understand what you're doing.

You can save yourself a lot of typing by setting OFS to "|", assigning to the fields you want to change, and just doing print $0 instead of printing every variable independently. Also, while you can do it in one big awk program, it's easier, syntactically, to break it up into multiple awk programs (pattern-action rules). Now, I don't mean multiple instances of awk. Every awk invocation can contain multiple programs, like this:

awk 'BEGIN { FS="/" }  <condition1> { program1... } <condition2> { program2}'

If condition1 matches, program1 will run. If condition2 matches, program2 will run (regardless of whether program1 ran or not). If you want program1 to stop processing and go to the next line, you use "next;". If you want program2 to get the next line now, you can do "getline;", which reads the next record into $0 and carries on inside the same program. (Plain getline is part of POSIX awk, so it is not GNU-specific.)
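A small sketch of both points, using made-up three-field records rather than the real data. The function name rules_demo is hypothetical. The first rule assigns to a field, which makes awk rebuild $0 with OFS, and then uses next so the second rule is skipped for that record.

```shell
# Sketch only: two pattern-action rules, with next and OFS-based rebuilding of $0.
rules_demo() {
  awk 'BEGIN { FS = OFS = "|" }
       # Rule 1: rewrite field 3, print the rebuilt record, skip the remaining rules.
       $3 == 2 { $3 = 1; print; next }
       # Rule 2: reached only by records that did not match rule 1.
       { print "other: " $0 }'
}

printf '%s\n' 'a|10|2' 'b|20|3' | rules_demo
```

This prints a|10|1 for the first record and other: b|20|3 for the second. Without OFS set to "|", the rebuilt first record would be joined with spaces instead.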

Also, when you post on this forum, it helps to embed your code in [code] tags. It keeps the spacing.

anaconga · August 27, 2008, 10:42am

Sorry, I'm new to these things.

About your suggestion, it seems very good. I will try splitting my AWK script into several AWK programs.

Thanks.