Calculating time windows from logfile timestamps

danowar · July 11, 2012, 7:58pm

Hey all. I am working on some bash scripts that perform a variety of functions. There are several steps involved, and they must happen in a specific sequence. What I need help with is a way to calculate the difference between timestamps in a logfile.
One of the steps in the scripts I am writing involves issuing a command to an application that executes a 'deployment' process of sorts; the shell interface to this application basically receives the request to start this deployment process, and exits. The deployment process can take a wildly variable amount of time (a few minutes, up to a few hours); there are additional actions that my script needs to perform once that process is complete, but these actions cannot begin until it has.
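The "wait until the deployment is finished, then carry on" requirement described above is commonly handled with a polling loop. The following is only a minimal sketch of that idea: the function name wait_for, the timeout and the polling interval are all hypothetical, and the check command passed to it is whatever test ends up detecting completion.

```shell
# wait_for TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND every INTERVAL_SECONDS until it succeeds (exit status 0)
# or until TIMEOUT_SECONDS have elapsed. All names here are hypothetical.
wait_for() {
    local timeout=$1 interval=$2
    shift 2
    local deadline=$(( $(date +%s) + timeout ))
    until "$@"; do
        if [ "$(date +%s)" -ge "$deadline" ]; then
            return 1    # gave up: the check never succeeded in time
        fi
        sleep "$interval"
    done
    return 0
}
```

A call might look like "wait_for 14400 30 deployment_finished", where deployment_finished is a placeholder for the real completion check and 14400 seconds covers the "up to a few hours" case.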

The application performing this deployment writes to a logfile, and I know which logfile entry indicates that the deployment has finished. However, the logfile is written to constantly, and I cannot clear it. Within the script that initiates the deployment, I need to record the time the command is executed, then search the application's logfile for the completion entry and find the most recent one whose timestamp falls after the time my script recorded. I can do most of this already; where I'm getting stuck is in parsing the timestamps into a useful, computable format. I can control how my script sets its initial timestamp, but I cannot control the format of the logfile's timestamps, which are written thusly (this entry is the completion condition that I am looking for):

[Jul 10, 2012 7:47:45 PM]: Application deployment complete.

The following date command will produce a timestamp formatted in exactly this fashion, but I don't know if that's actually useful or not:

date +%b\ %-d\,\ %Y\ %-l\:%M\:%S\ %p

I can certainly run the timestamp through a set of sed steps to parse out the individual pieces of information in the logfile, but I'm afraid that what I have thus far, in addition to being obviously cumbersome and probably quite amateurish, is potentially unproductive and not really the right way to go about this:

#!/bin/bash
# Sets startDate to the current timestamp in the logfile's format, removes the comma,
# and converts spaces and colons to underscores (e.g. Jul_11_2012_7_53_22_PM)
export startDate=$(date '+%b %-d, %Y %-l:%M:%S %p' | sed -e 's/,//g' -e 's/[ :]/_/g')

# Sets each field in the timestamp to an individual variable using cut
export startMonth=$(echo "$startDate" | cut -d_ -f1)
export startDay=$(echo "$startDate" | cut -d_ -f2)
export startYear=$(echo "$startDate" | cut -d_ -f3)
export startAMPM=$(echo "$startDate" | cut -d_ -f7)
export startHour=$(echo "$startDate" | cut -d_ -f4)
# Converts 12-hour time to 24-hour time (12 PM stays 12; 12 AM becomes 0)
if [ "$startAMPM" = "PM" ] && [ "$startHour" -ne 12 ]; then
    export startHour=$((startHour + 12))
elif [ "$startAMPM" = "AM" ] && [ "$startHour" -eq 12 ]; then
    export startHour=0
fi
export startMin=$(echo "$startDate" | cut -d_ -f5)
export startSec=$(echo "$startDate" | cut -d_ -f6)

If I echo each of these variables individually at the end of the script, what I get when running it is this:

Jul 11 2012 19 53 22
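For comparison, a shorter route to a computable value is to convert the whole timestamp to seconds since the epoch instead of splitting it into fields. This is a minimal sketch that assumes GNU date, whose -d option parses date strings like the ones in the logfile; the function name to_epoch is hypothetical.

```shell
# to_epoch "Jul 10, 2012 7:47:45 PM"
# Prints the timestamp as seconds since 1970-01-01 00:00:00 UTC,
# interpreting the string in the current time zone.
# Assumes GNU date: -d is a GNU extension and is not available everywhere.
to_epoch() {
    date -d "$1" +%s
}
```

Two timestamps converted this way can be compared or subtracted with ordinary shell arithmetic, and the script's own start time can be taken directly with "date +%s", with no need to match the logfile's format.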

I can apply the same sed-and-cut logic to the entry in the application's logfile as well, replacing the date command with, for instance:

cat /path/to/logfile | grep "Application deployment complete"

-- but I guess my big question here is, then what?

How can I actually use that information to look for the right entry?
Also, is there a better way to parse out that information?
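One possible answer to the "then what?" question, sketched under assumptions: convert each completion entry's timestamp to epoch seconds and keep the newest one that is not older than the script's start time. The sketch assumes GNU date (for -d) and that the timestamp is everything before the first "]" on the line; the function name latest_completion_after is hypothetical.

```shell
# latest_completion_after START_EPOCH LOGFILE
# Prints the epoch time of the newest "Application deployment complete"
# entry stamped at or after START_EPOCH; returns 1 if there is none.
# Assumes GNU date (-d) and a timestamp that ends at the first "]".
latest_completion_after() {
    local start=$1 log=$2 result
    result=$(
        grep "Application deployment complete" "$log" |
        while IFS= read -r line; do
            stamp=${line%%]*}     # keep everything before the first "]"
            stamp=${stamp#\[}     # drop a leading "[" if present
            # Skip lines whose timestamp date cannot parse.
            epoch=$(date -d "$stamp" +%s 2>/dev/null) || continue
            if [ "$epoch" -ge "$start" ]; then
                echo "$epoch"
            fi
        done | sort -n | tail -n 1
    )
    [ -n "$result" ] && echo "$result"
}
```

With the start time captured as start=$(date +%s) just before the deployment command is issued, "latest_completion_after "$start" /path/to/logfile" succeeds only once a matching entry newer than that moment has been written. Because the logfile's timestamps carry no time zone, they are read in whatever zone the script runs in.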

Thanks very much everyone, I appreciate the help.

methyl · July 11, 2012, 8:34pm

Please post what operating system and version you are running and what shell you prefer.

For anything to do with date arithmetic, please post whether you have the GNU date command and/or a modern version of perl.
Please also post sample data for a date from last week (i.e. with a single digit day).

Chubler_XL · July 11, 2012, 11:17pm

Can you take note of how many lines are in the logfile when the process starts and then look for the "Application deployment complete" message on a line greater than this?
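A minimal sketch of this line-count idea, for illustration only: record the number of lines before starting the deployment, then search only the lines written after that point. The function names line_count and completed_since are hypothetical, and the sketch assumes the logfile is only ever appended to (no rotation or truncation while waiting).

```shell
# line_count LOGFILE
# Prints the number of lines currently in LOGFILE.
line_count() {
    wc -l < "$1" | tr -d ' '
}

# completed_since LINES_BEFORE LOGFILE
# Succeeds if the completion message appears on any line after LINES_BEFORE.
completed_since() {
    tail -n +"$(( $1 + 1 ))" "$2" | grep -q "Application deployment complete"
}
```

Typical use would be before=$(line_count /path/to/logfile) right before issuing the deployment command, followed by repeated calls to "completed_since "$before" /path/to/logfile" until one succeeds.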

danowar · July 12, 2012, 12:07am

Hi Methyl, thanks very much. Operating system is Red Hat Enterprise Linux Server release 5.5; I prefer the bash shell.

I do appear to have the GNU date command: the bottom of the date manpage shows it as 'date 5.97', and is dated February 2010.
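As an aside, a quicker check than reading the manpage is to ask the binary itself. This is a small sketch that relies on GNU date identifying itself in its --version output; the function name has_gnu_date is hypothetical, and non-GNU implementations are assumed to either reject --version or print something without "GNU" in it.

```shell
# has_gnu_date
# Succeeds if the date command on PATH reports itself as GNU.
has_gnu_date() {
    date --version 2>/dev/null | grep -q "GNU"
}
```

A script that depends on "date -d" could call has_gnu_date up front and exit with a clear message if it fails.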

A sample of the log file in question:

cat application-logfile.log | grep "Application deployment complete."
[Jun 20, 2012 3:57:57 PM]: Application deployment complete.
[Jun 27, 2012 12:28:14 PM]: Application deployment complete.
[Jun 28, 2012 10:38:10 PM]: Application deployment complete.
[Jul 6, 2012 7:35:38 PM]: Application deployment complete.
[Jul 10, 2012 7:47:45 PM]: Application deployment complete.

(Single-digit days are not zero-padded, which is why I used %-d in my test date command).

Regarding Perl, if it is included by default in the RHEL 5.5 Server release, then I probably do; I really don't know anything at all about Perl, and sadly have tended to avoid it accordingly. I'm sure it's much easier to do all this with a Perl script, but that's a whole other can of worms I'm hesitant to open. =\

---------- Post updated at 11:07 PM ---------- Previous update was at 11:03 PM ----------

Chubler_XL: I theoretically could count lines, but I think that's a less reliable condition to work with. It is possible for a deployment to be triggered manually, outside the context of the actions my script is taking. That wouldn't necessarily interfere with the script's actions, but it would throw off the deployment 'instance' counting if I did it this way, particularly considering the variable length and history of the application's log. Good idea, just impractical in this particular scenario -- I appreciate it, though!