Text Manipulation.

Icepick · February 22, 2008, 5:54am

Hi

I have only ever used awk and sed for basic requirements up until now.
I have had to break a log down for multiple purposes.
Using awk, sed and a date script, I am left with this:

(message id, time of msg attempt, message id, domain name [from sender's address], time of msg completion)

1JRkPs-0008m8-Fd 7230901
1JRkPs-0008m8-Fd domain.com 7230902
1JRkPs-0008m8-Fd abc.com 7230961
1JRkaH-0009E0-VZ 7231546
1JRkaH-0009E0-VZ domain.co.uk 7231547
1JRl5D-000AMD-22 7229863
1JRl5D-000AMD-22 123.com 7229864
1JRl66-000AOR-AZ 7229918
1JRl66-000AOR-AZ xyz.co.za 7229919

What this represents is email logs for sending/receiving.
The first entry is the MSG id and the time of the attempt in seconds.
The second entry is the MSG id, the recipient and the time of msg completion in seconds.

I am attempting to subtract the first entry's time from the second entry's time, to do the same for a 3rd entry, a 4th and so on if they exist, and then display the output like:

1JRkPs-0008m8-Fd
1JRkPs-0008m8-Fd domain.com (1s) abc.com (60s)
1JRkaH-0009E0-VZ
1JRkaH-0009E0-VZ domain.co.uk (1s)

My aim is to work out which domains have longer delays than others.
So in my example above, the domain.com message was queued for 1 second, whereas the abc.com message was queued for 60 seconds.

I wrote the script below, which can perform the re-arranging but not the calculations. Could someone please be of some assistance?

Many thanks in advance.

#!/usr/bin/awk -f
BEGIN {
       KEY=""
       DATA=""
}
{
       if($1 != KEY){
               if(KEY!=""){
                       printf(" %s %s\n", KEY, DATA)
               }
               DATA=$2
       } else {
               DATA=DATA" "$2
       }
       KEY=$1
}
END {
       printf(" %s %s\n", KEY, DATA)
}

Icepick · February 22, 2008, 6:36am

So far I have managed to get the output looking like this:

1JRk2I-0007wT-Dy 7229440 domain.com 7229440 abc.com 7230019
1JRkPs-0008m8-Fd 7230901 xyz.com 7230902 domain.co.uk 7230961
1JRkaH-0009E0-VZ 7231546 test.com 7231547
1JRl5D-000AMD-22 7229863 test.co.uk 7229864

By updating the script to look like this:

#!/usr/bin/awk -f
BEGIN {
       KEY=""
       DATA=""
}
{
       if($1 != KEY){
               if(KEY!=""){
                       printf(" %s %s\n", KEY, DATA)
               }
               DATA=$2" "$3
       } else {
               DATA=DATA" "$2" "$3
       }
       KEY=$1
}
END {
       printf(" %s %s\n", KEY, DATA)
}

I'm still stuck on the calculations.

Any advice would be awesome.

Thanks.

fpmurphy · February 22, 2008, 8:56am

Try the following:

#!/usr/bin/awk -f

BEGIN {
   msg_id=""
}

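{
   if ($1 != msg_id) {
      # new message id: print the finished line, then start a new one
      if (msg_id != "")
          print outstr
      msg_id=$1; stime=$2      # stime = time of the first (attempt) entry
      outstr=" "$1" "$2" "$3
   } else
      # same id: append the domain and its delay relative to stime
      outstr=outstr" "$2" ("$3-stime"s) "
}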

END {
   print outstr
}

which gives the following output from the sample data you supplied

 1JRkPs-0008m8-Fd 7230901  domain.com (1s)  abc.com (60s)
 1JRkaH-0009E0-VZ 7231546  domain.co.uk (1s)
 1JRl5D-000AMD-22 7229863  123.com (1s)
 1JRl66-000AOR-AZ 7229918  xyz.co.za (1s)
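
Since the stated goal is to see which domains have longer delays than others, the same single-pass idea can be extended to summarise per domain. The following is only a sketch, not something posted in the thread: the function name domain_delays is made up, and it assumes every two-field line is an attempt and every three-field line is a completion for an id that has already been seen.

```sh
#!/bin/sh
# domain_delays: hypothetical helper name, not from the thread.
# Reads "id time" and "id domain time" lines and prints, per domain:
#   domain  deliveries  max_delay_seconds  mean_delay_seconds
# sorted with the largest maximum delay first.
domain_delays() {
    awk '
    NF == 2 { start[$1] = $2; next }     # attempt line: remember start time per id
    NF == 3 && ($1 in start) {           # completion line for a known id
        d = $3 - start[$1]               # delay in seconds
        n[$2]++
        sum[$2] += d
        if (!($2 in max) || d > max[$2])
            max[$2] = d
    }
    END {
        for (dom in n)
            printf "%s %d %d %.1f\n", dom, n[dom], max[dom], sum[dom] / n[dom]
    }' "$@" | LC_ALL=C sort -k3,3nr -k1,1
}
```

Called as `domain_delays log_file`, the sample data above should list abc.com first with a maximum delay of 60 seconds, followed by the domains that took 1 second. The sort is forced to the C locale so that ties are ordered the same way on every system.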

Klashxx · February 22, 2008, 9:03am

Another awk solution:

awk '{
    keys[$1]++                      # number of lines seen for this id
    time[$1,keys[$1]]=$NF           # last field is always the time
    if ( NF > 2 )
        dat[$1,keys[$1]]=$2         # domain, present on completion lines
}
END {
    for ( it in keys ) {
        for (i=1;i<=keys[it];i++)
            if ( i == 1 )
                printf("%s\n%s",it,it)
            else
                printf(" %s (%ds) ",dat[it,i],time[it,i]-time[it,1])
        printf("\n")
    }
}' log_file

Output:

1JRkPs-0008m8-Fd
1JRkPs-0008m8-Fd domain.com (1s) abc.com (60s) 
1JRkaH-0009E0-VZ
1JRkaH-0009E0-VZ domain.co.uk (1s) 
1JRl66-000AOR-AZ
1JRl66-000AOR-AZ xyz.co.za (1s) 
1JRl5D-000AMD-22
1JRl5D-000AMD-22 123.com (1s)
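
Note that the ids in the output above are not in the same order as in the log. That is expected: awk does not define the order in which `for (it in keys)` visits array elements. If the original order matters, one option is to record each id the first time it is seen. The following is a sketch only, with the made-up function name delays_in_order, and it assumes the first line for each id is the attempt line.

```sh
#!/bin/sh
# delays_in_order: hypothetical helper name, not from the thread.
# Prints each id on its own line, then the id followed by
# "domain (Ns)" pairs, keeping ids in first-seen order.
delays_in_order() {
    awk '
    !($1 in start) {                 # first line seen for this id
        order[++count] = $1          # remember ids in first-seen order
        start[$1] = $NF              # last field is the attempt time
        next
    }
    NF > 2 {                         # later line: id, domain, completion time
        line[$1] = line[$1] sprintf(" %s (%ds)", $2, $NF - start[$1])
    }
    END {
        for (i = 1; i <= count; i++) {
            id = order[i]
            printf "%s\n%s%s\n", id, id, line[id]
        }
    }' "$@"
}
```

Usage is `delays_in_order log_file`. Unlike the single-pass script earlier in the thread, this does not require the lines for one id to be adjacent, at the cost of holding one output line per id in memory.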

Icepick · February 25, 2008, 3:18am

Thanks to the both of ya.