awk script to find time difference between HTTP PUT and HTTP DELETE requests in access.log

Hi,

I'm trying to write a script to determine the time gap between HTTP PUT and HTTP DELETE requests in an HTTP server's access log.

Normally the client does an HTTP PUT to push content, e.g. file_1.txt, and 21 seconds later it does an HTTP DELETE for the same file. Sometimes the gap varies, which causes issues on the server side.

The format of the log is as below:

2016-07-06 11:09:04 [127.0.0.2] [PUT http://127.0.0.1:80/abc/bce/cde/file_30.txt HTTP/1.1] [201]
2016-07-06 11:09:04 [127.0.0.2] [DELETE http://127.0.0.1:80/abc/bce/cde/file_9.txt HTTP/1.1] [404]
2016-07-06 11:09:05 [127.0.0.2] [PUT http://127.0.0.1:80/abc/bce/cde/file_31.txt HTTP/1.1] [201]
2016-07-06 11:09:05 [127.0.0.2] [DELETE http://127.0.0.1:80/abc/bce/cde/file_10.txt HTTP/1.1] [404]
...
...
2016-07-06 11:09:25 [127.0.0.2] [PUT http://127.0.0.1:80/abc/bce/cde/file_51.txt HTTP/1.1] [201]
2016-07-06 11:09:25 [127.0.0.2] [DELETE http://127.0.0.1:80/abc/bce/cde/file_30.txt HTTP/1.1] [404]
2016-07-06 11:09:26 [127.0.0.2] [PUT http://127.0.0.1:80/abc/bce/cde/file_52.txt HTTP/1.1] [201]
2016-07-06 11:09:26 [127.0.0.2] [DELETE http://127.0.0.1:80/abc/bce/cde/file_31.txt HTTP/1.1] [404]

So from the above I'd first need to find the timestamp of the PUT, e.g. when file_30.txt was pushed in, then find the HTTP DELETE for file_30.txt, and then determine the time difference. (file_30.txt is just an example; this would need to be done for every file pushed in.)

So it would be the following 2 lines:
2016-07-06 11:09:04 [127.0.0.2] [PUT http://127.0.0.1:80/abc/bce/cde/file_30.txt HTTP/1.1] [201]
2016-07-06 11:09:25 [127.0.0.2] [DELETE http://127.0.0.1:80/abc/bce/cde/file_30.txt HTTP/1.1] [404]

So the time difference would be: 11:09:25 - 11:09:04 = 21 seconds

And output would be something like:
File: file_30.txt, difference: 21sec

I'm trying to use an awk script to do this, but I'm not very familiar with awk, so I've only started with the following (which might be totally stupid). If anyone has ideas on how to write the script to achieve my goal, that would be great:

BEGIN {
    FS=" ";
}

{
    if (/file_/)
    {
      time = $2
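      httpmethod = $4           # "[PUT" or "[DELETE"
      sub(/^\[/,"",httpmethod)  # strip the leading bracket
      file = $5
      sub(/.*\//,"",file)
      httpresult = $7           # $6 is "HTTP/1.1]", the status is in $7
    }
    # Need to find the line with HTTP DELETE with the file from above
    # and then determine the time difference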
}
[user@host ~]$ cat test.sh
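#! /usr/bin/bash

file=$1          # file name to look up, e.g. file_30
logfile="file"   # path to the access log

# Date and time (fields 1-2) of the first PUT and first DELETE for the file
in_dt=$(grep "PUT.*$file" "$logfile" | head -n 1 | cut -d' ' -f1,2)
out_dt=$(grep "DELETE.*$file" "$logfile" | head -n 1 | cut -d' ' -f1,2)

# Convert to epoch seconds (date -d is GNU date)
in_dt_s=$(date -d"$in_dt" +%s)
out_dt_s=$(date -d"$out_dt" +%s)

echo "File: $file; Difference: $(( out_dt_s - in_dt_s ))"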
[user@host ~]$ ./test.sh file_30
File: file_30; Difference: 21
[user@host ~]$

Parameterize accordingly.
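
One possible way to parameterize it, as a rough sketch rather than a finished tool: wrap the same steps in a function that takes the log path and the file name. The function name put_delete_gap is made up for illustration, and the sketch assumes GNU date (for -d) and that only the first PUT and first DELETE of a file matter. It matches "/name " literally so that file_3.txt does not also pick up file_30.txt.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name not from the thread): put_delete_gap LOGFILE FILENAME
# Prints the seconds between the first PUT and the first DELETE of FILENAME.
put_delete_gap() {
    local logfile=$1 name=$2 in_dt out_dt
    # grep -F treats "[" and "." literally; "/$name " anchors on the full file name
    in_dt=$(grep -F "[PUT " "$logfile" | grep -F "/$name " | head -n 1 | cut -d' ' -f1,2)
    out_dt=$(grep -F "[DELETE " "$logfile" | grep -F "/$name " | head -n 1 | cut -d' ' -f1,2)
    # Give up quietly if either request is missing from the log
    [ -n "$in_dt" ] && [ -n "$out_dt" ] || return 1
    # Assumption: GNU date, which accepts "YYYY-MM-DD HH:MM:SS" via -d
    echo "File: $name; Difference: $(( $(date -d "$out_dt" +%s) - $(date -d "$in_dt" +%s) ))"
}
```

Called as put_delete_gap access.log file_30.txt (access.log being whatever your log is named), this should print the same kind of line as test.sh above.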


If you have GNU awk (gawk) available, you could use mktime() like this:

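gawk '
# For PUT and DELETE lines, extract the file name and the timestamp
/PUT/||/DELETE/{
   file=$5;
   gsub(".*/", "", file)
   stamp=$1 " " $2
   gsub("[:-]", " ", stamp)   # mktime() expects "YYYY MM DD HH MM SS"
}
# Remember when each file was PUT
/PUT/ {
   w[file]=mktime(stamp)
}
# On the matching DELETE, print the elapsed seconds
/DELETE/ && (file in w) {
   print "File: " file ", difference: " (mktime(stamp) - w[file]) "sec"
   delete w[file]
}' infile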

Output for demo file:

File: file_30.txt, difference: 21sec
File: file_31.txt, difference: 21sec
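
If gawk is not available, a similar result might be possible with plain POSIX awk by converting the HH:MM:SS field to seconds since midnight instead of calling mktime(). This is only a sketch: the wrapper name put_delete_gaps and the helper secs() are made up here, and it assumes the PUT and its DELETE fall on the same calendar day (a pair that crosses midnight would give a negative difference).

```shell
#!/bin/sh
# Hypothetical wrapper (name not from the thread): put_delete_gaps LOGFILE
put_delete_gaps() {
    awk '
    # Convert "HH:MM:SS" to seconds since midnight.
    function secs(t,   p) {
        split(t, p, ":")
        return p[1] * 3600 + p[2] * 60 + p[3]
    }
    # Field 4 is "[PUT" or "[DELETE"; field 5 is the URL.
    $4 == "[PUT" || $4 == "[DELETE" {
        file = $5
        sub(/.*\//, "", file)      # keep only the file name
    }
    # Remember when each file was PUT.
    $4 == "[PUT" {
        put[file] = secs($2)
        next
    }
    # Assumption: same-day requests, so the date in field 1 is ignored.
    $4 == "[DELETE" && (file in put) {
        print "File: " file ", difference: " (secs($2) - put[file]) "sec"
        delete put[file]
    }' "$1"
}
```

For the file_30.txt pair in the demo log, 11:09:25 is 40165 seconds and 11:09:04 is 40144 seconds after midnight, so put_delete_gaps infile should report the same 21sec.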

Thanks balajesuri & Chubler_XL! Both solutions work perfectly :)