Get lines from the last two hours of a log file

Hi All,

I need to get the lines from a log file that were written in the last two hours, counting back from the current time.

2014-May-05 03:24:07 525;WARN   ;error in line 1
at (Unknown Source)
	at mpl.java:37)
	at (Method.java:611)
	at (ApiHelper.java:395)
2014-May-05 03:24:09 780;WARN   ;system error in line 1
at (Unknown Source)
	at mpl.java:37)
	at (Method.java:611)
	at (ApiHelper.java:395)
2014-May-06 03:24:09 781;WARN   ;system error occuerd.
at (Unknown Source)
	at mpl.java:37)
	at (Method.java:611)
	at (ApiHelper.java:395)
2014-May-06 04:24:09 781;WARN   ;system error occuerd.
at (Unknown Source)
	at mpl.java:37)
	at (Method.java:611)
	at (ApiHelper.java:395)

If the above is the sample file and the current time is 5:00 AM (May 2014), I want the lines whose timestamps fall within the last two hours before the current time.
Please help me with this.

Date math is hard. If you have GNU date, you can use this:

mute@thedoctor:~$ ./script < input
2014-May-06 03:24:09 781;WARN   ;system error occuerd.
at (Unknown Source)
        at mpl.java:37)
        at (Method.java:611)
        at (ApiHelper.java:395)
2014-May-06 04:24:09 781;WARN   ;system error occuerd.
at (Unknown Source)
        at mpl.java:37)
        at (Method.java:611)
        at (ApiHelper.java:395)
mute@thedoctor:~$ cat script
#!/usr/bin/awk -f

BEGIN {
#       was used for testing with specific date/time
        now="May 6 2014 05:00:00"
        cmd=sprintf("date -d'%s 2 hours ago' +%%s", now)
#       cmd="date +%s"
        cmd | getline start
        close(cmd)
}

# if we found a line with a date within two hours,
# then the rest must be in range too
go==1 { print; next }

# reading a date line
$1 ~ /^[0-9][0-9][0-9][0-9]-/ {
        # reformat date from "2014-May-5" to "May 5 2014"
        split($1,a,/-/)
        cmd=sprintf("date -d'%s %s %s %s' +%%s", a[2], a[3], a[1], $2)
        cmd | getline secs
        close(cmd)
        if (secs >= start) {
                print
                go=1
        }
}
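
The script above starts one date process for each timestamped line until the first match, which can be slow on a large log. As a rough sketch of an alternative (not the script posted above), the timestamps can be turned into sortable keys inside awk, so date is only needed once to build the cutoff. The function name filter_since and the YYYYMMDDHHMMSS cutoff format are assumptions made for this illustration; it also assumes English month abbreviations and a log in chronological order.

```shell
# filter_since CUTOFF  (hypothetical helper; CUTOFF is YYYYMMDDHHMMSS)
# Reads a log on standard input and prints everything from the first
# timestamped line at or after CUTOFF.
filter_since() {
    awk -v cutoff="$1" '
    BEGIN {
        # Map English month abbreviations to two-digit numbers.
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
        for (i = 1; i <= n; i++) mon[m[i]] = sprintf("%02d", i)
    }
    # Once a line in range has been seen, print everything that follows.
    go { print; next }
    $1 ~ /^[0-9][0-9][0-9][0-9]-/ {
        split($1, d, "-")               # 2014-May-06 -> 2014, May, 06
        t = $2; gsub(/:/, "", t)        # 03:24:09 -> 032409
        key = d[1] mon[d[2]] sprintf("%02d", d[3]) t
        if (key >= cutoff) { print; go = 1 }
    }'
}
```

With GNU date, the cutoff for this problem could be built once and passed in, for example: filter_since "$(date -d '2 hours ago' +%Y%m%d%H%M%S)" < logfile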

If you don't have access to the GNU version of the date utility, but you have a recent version of the 1993 Korn shell, ksh93 (such as the one on the last few releases of Mac OS X), you could use:

#!/bin/ksh
# Initialize start date (sd), start time (st), and end date (ed)
# Note that if this script can be run close to midnight, sd and st must be set
# before ed.
read sd st <<-EOF
	$(printf '%(%Y-%b-%d %T)T' '2 hours ago')
EOF
ed=$(date '+%Y-%b-%d')
printf "Start date=%s, Start time=%s, End date=%s\n" $sd $st $ed

awk -v ed=$ed -v sd=$sd -v st=$st '
$1 ~ /^[0-9]{4}-[[:alpha:]]{3}-[0-3][0-9]$/ {
        # If the start date matches and the time in this line is later than the
        # start time, print this and later lines.  If sd and st are late
        # yesterday (after 10pm), also print lines from today.
	if(($1 == sd && $2 > st) || ($1 == ed && ed != sd))
		p = 1
	else	p = 0
}
p' logfile

If you don't have a recent version of ksh93, then for start times shifted a few hours from the current time (like the 2 hours needed for this problem) you can use the POSIX method of specifying the time zone to shift the time, but the value you need to use depends on your current timezone.

For example, I am in the US Pacific timezone, which can be specified by setting TZ=PST8PDT. To shift the output of date to report times 2 hours ago, add two to the number in TZ for your timezone (e.g., TZ=PST10PDT). To be sure that you have the right value, verify that the command:

TZ=PST10PDT date '+%Y-%b-%d %T'

(with your setting for TZ) prints the date and time two hours ago. Then you can change:

	$(printf '%(%Y-%b-%d %T)T' '2 hours ago')

in the above script to:

	$(TZ=PST10PDT date '+%Y-%b-%d %T')

(with your setting for TZ) and the script should work with any version of the Korn shell or any other shell (such as bash) that recognizes basic POSIX shell syntax.
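
One way to double-check a shifted TZ value, sketched here with a made-up helper named hour_gap, is to compare the hour that date reports under the normal setting and under the shifted one. This is only an illustration of the check described above; the helper name and its two arguments are assumptions, not part of the script.

```shell
# hour_gap TZ_NOW TZ_SHIFTED  (hypothetical helper)
# Prints how many hours the clock under TZ_SHIFTED lags behind the
# clock under TZ_NOW, as a number from 0 to 23.
hour_gap() {
    while :; do
        h1=$(TZ=$1 date '+%H')
        h2=$(TZ=$2 date '+%H')
        h3=$(TZ=$1 date '+%H')
        # Retry if the clock rolled over to a new hour between the calls.
        [ "$h1" = "$h3" ] && break
    done
    # Strip one leading zero so 08 and 09 are not read as octal.
    h1=${h1#0} h2=${h2#0}
    echo $(( (h1 - h2 + 24) % 24 ))
}

hour_gap PST8PDT PST10PDT
```

For the Pacific example, the last line is expected to report a gap of 2. If it reports something else, the number in TZ needs adjusting (the result may also be briefly off around a daylight-saving change).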

If you want to try this on a Solaris/SunOS system, also change awk to /usr/xpg4/bin/awk or /usr/xpg6/bin/awk.


Hi All,

Thanks for your replies.

I used Don Cragun's reply, and I am getting only the output below:

Start date=2014-May-06, Start time=22:15:30, End date=2014-May-07

but not the lines from the log file. Does anything need to be added? Please help me.

Your sample input no longer has any data with a date and time that occurs in the last two hours.

Hi,

Thanks very much for your response. This is the sample data I have:

2014-May-06 23:17:35 347;WARN   ;RMI TCP Connection(13147)
2014-May-06 23:17:35 347;WARN   ;RMI TCP Connection(13147)
2014-May-06 23:36:00 612;WARN   ;RMI TCP Connection(13154)
2014-May-06 23:57:30 676;WARN   ;RMI TCP Connection(13158)
2014-May-06 23:57:30 676;WARN   ;RMI TCP Connection(13158)
2014-May-06 23:57:32 688;WARN   ;RMI TCP Connection(13158)
2014-May-07 00:57:32 689;WARN   ;RMI TCP Connection(13158)

Output now:

Start date=2014-May-06, Start time=22:50:40, End date=2014-May-07

Please correct me.

Please show us the output from the commands:

uname -a
date

and, since I suggested several possible variations on the script, show us the exact script that you used, tell us what shell you used, and show us the exact command you used to invoke the script.

As I said before, I'm in the US Pacific timezone (where the output from date is now Tue May 6 23:32:04 PDT 2014). With your latest sample data, I'm getting:

Start date=2014-May-06, Start time=21:32:04, End date=2014-May-06
2014-May-06 23:17:35 347;WARN   ;RMI TCP Connection(13147)
2014-May-06 23:17:35 347;WARN   ;RMI TCP Connection(13147)
2014-May-06 23:36:00 612;WARN   ;RMI TCP Connection(13154)
2014-May-06 23:57:30 676;WARN   ;RMI TCP Connection(13158)
2014-May-06 23:57:30 676;WARN   ;RMI TCP Connection(13158)
2014-May-06 23:57:32 688;WARN   ;RMI TCP Connection(13158)

If I run it again in half an hour (when the date here is May 7th), I'll get the above lines and the line:

2014-May-07 00:57:32 689;WARN   ;RMI TCP Connection(13158)

It is, however, interesting that you have data in your log file that should not have been written until seven minutes after you ran the script. (You showed a start time of 22:50:40 on May 6, so you ran the program at 00:50:40 on May 7???)

Thanks for the info, Don. Is it documented somewhere?
I checked the ksh93 manual and found:

However, I couldn't find out whether we can use GNU-style text for date arithmetic (e.g. 2 hours ago).

Hi clx,
I agree that the documentation on the ksh printf %T format specifier is subpar. Through trial and error, I have found that the argument specifying the date and time can be given in at least the following formats:

  1. Output from date +%c with or without day-of-week, timezone, or year.
  2. Day-of-week
  3. last day-of-week
  4. next day-of-week
  5. n unit ago (where n is a positive integer value and unit is year, month, week, day, hour, minute, or second, or the plural form of any of these). However, month and minute don't behave the way I expect them to.
  6. n unit ahead

You might also notice that the GNU date utility man page doesn't say much about the format of the arguments for its -d option either.


Hi Don Cragun,

Thanks very much for your responses.
Here are the details.
Output of uname -a:

Linux  SMP Mon Jul 1 17:58:32 EDT 2013 64 GNU/Linux

The date command gives the output below:

Wed May  7 03:49:57 CDT 2014

The script I am using is the one below:

#!/bin/ksh
# Initialize start date (sd), start time (st), and end date (ed)
# Note that if this script can be run close to midnight, sd and st must be set
# before ed.
#EST5EDT
read sd st <<-EOF
	$(TZ=CST8CDT date '+%Y-%b-%d %T')
EOF
ed=$(date '+%Y-%b-%d')
printf "Start date=%s, Start time=%s, End date=%s\n" $sd $st $ed

awk -v ed=$ed -v sd=$sd -v st=$st '
$1 ~ /^[0-9]{4}-[[:alpha:]]{3}-[0-3][0-9]$/ {
        # If the start date matches and the time in this line is later than the
        # start time, print this and later lines.  If sd and st are late
        # yesterday (after 10pm), also print lines from today.
	if(($1 == sd && $2 < st) || ($1 == ed && ed != sd))
		p = 1
	else	p = 0
}
p' log

When I run the script, I get the output below:

Start date=2014-May-07, Start time=01:50:44, End date=2014-May-07

Sample input log:

2014-May-06 23:57:30 676;WARN   ;RMI TCP Connection(13158);
2014-May-06 23:57:32 688;WARN   ;RMI TCP Connection(13158);
2014-May-07 01:57:32 689;WARN   ;RMI TCP Connection(13158);

Please correct me.

---------- Post updated at 06:07 AM ---------- Previous update was at 04:54 AM ----------

Hi Don,

I understand the script now, and I am able to run it and get the output.

Going one step further, can I also print the lines in between the timestamped lines, even though they do not start with a timestamp?
Something like the below:

2014-May-07 04:00:32 689;WARN   ;RMI TCP Connection(13158)
</Stack>
    </Error>
    <Error ErrorCode="java.sql.SQLRecoverableException"
        ErrorDescription="Error_description_not_available" ErrorRelatedMoreInfo="">
        <Attribute Name="ErrorCode" Value="java.sql.SQLRecoverableException"/>
        <Attribute Name="ErrorDescription" Value="Error_description_not_available"/>
        <Stack>com.yantra.yfc.util.YFCException
2014-May-07 04:10:32 689;WARN   ;RMI TCP Connection(13158)-10.68.154.33;                    ;Clearing cache. Number cached=0 

That is what the script does. When it sees a line starting with a date, it decides whether to print that line and the lines following it until it sees another line starting with a date.
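
A cut-down sketch of that pattern may make it easier to see. It keeps only the same-day comparison, so it is a simplification of the script rather than a replacement for it. The helper name print_since is made up for this example, and the date pattern is spelled out character by character to avoid relying on interval expressions such as {4}, which not every awk supports.

```shell
# print_since DATE TIME  (hypothetical helper)
# Reads a log on standard input.  A timestamped line sets the flag p;
# lines without a timestamp leave p unchanged, so they are printed or
# skipped together with the timestamped line before them.
print_since() {
    awk -v sd="$1" -v st="$2" '
    # A first field such as 2014-May-07 marks a timestamped line.
    $1 ~ /^[0-9][0-9][0-9][0-9]-[A-Za-z][A-Za-z][A-Za-z]-[0-9][0-9]$/ {
        p = ($1 == sd && $2 > st)
    }
    # A bare pattern prints the current line whenever p is non-zero.
    p'
}
```

For example, print_since 2014-May-07 03:00:00 < log would print the 04:00:32 line, the XML lines that follow it, and the 04:10:32 line from the sample above.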

What didn't you understand before? Why weren't you getting any output from the awk portion of the script?

Hi All,

Don, thanks for your valuable input.

I have the same requirement as the one described in this thread, i.e. to extract the lines that were written within the last 5 hours.
The only difference is the timestamp format, shown below:

[5/11/14 6:38:40:748 CDT] 0000001a WSChannelFram A   The Transport Channel Service has started chain 
[5/11/14 6:38:40:836 CDT] 0000001b  The system discovered process (name: nodeagent, type
[5/11/14 6:43:40:598 CDT] 0000001d FfdcProvider  W com.ibm.ws.ffdc.impl.FfdcProvider logIncident  emitted on 7456371539064352.txt com.ibm.ws.wsgroup.bb.BBPostingMsg 65
[5/11/14 9:19:02:321 CDT] 00000026 SystemOut     O File Name: /customer_overrides.properties
[5/11/14 9:19:02:324 CDT] 00000026 SystemOut     O File Name: /customer_overrides.properties
[5/11/14 10:40:21:640 CDT] 00000026 SystemOut     O File Name: /customer_overrides.properties

So I modified the script as below:

sd=$(TZ=CST11CDT date '+%_m/%e/%y' | tr -d ' ')
st=$(TZ=CST11CDT date '+%H:%M:%S')
ed=$(date '+%_m/%e/%y'| tr -d ' ')
cut -c 2- | sed 's/:/ /3'awk -v ed=$ed -v sd=$sd -v st="21:49:52" '
{ if(($1 == sd && $2 < st) || ($1 == ed && ed != sd)) p=1; }p' test.txt

Please help me with this, as I am not getting the proper output.

There are several problems here:

  1. If there is ANY chance that your script could be run close to midnight, NEVER set the starting date and timestamp in separate calls to the date utility.
  2. Using tr to get rid of spaces at the start of the month and day fields in dates is perfectly reasonable, but relatively expensive. Using shell built-ins to handle this issue is much more efficient.
  3. The input to the cut command in this pipeline is standard input for the script (not your log file).
  4. Adding a leading "[" to the date strings is much more efficient than invoking a separate process to run cut to remove the 1st character from each line in your log files.
  5. sed 's/:/ /3'awk is not a valid sed command. I assume that you intended to have sed 's/:/ /3' | awk instead, but when doing string comparisons to compare timestamps, getting rid of the colon and the milliseconds won't affect the results.
  6. Your use of %H (and my use of %T) presents the hour as a two-digit value with a leading "0" for times before 10am. Your sample input omits the leading "0" and you didn't make any adjustments to account for that. (Note that a leading "0" needs to be added to your input data (when needed); removing a leading "0" (if present) from st will not correctly check for the desired times.)
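
As a small illustration of point 2, the padding spaces can be removed with parameter expansion alone, without starting a tr process. The helper name strip_spaces, the variable stripped, and the sample value are all assumptions made for this sketch.

```shell
# strip_spaces STRING  (hypothetical helper)
# Leaves STRING with every space removed in the variable "stripped",
# using only shell built-ins.
strip_spaces() {
    stripped=$1
    # The test is true while the value still contains a space.
    while [ "$stripped" != "${stripped#* }" ]; do
        # Keep the text before the first space plus the text after it.
        stripped=${stripped%% *}${stripped#* }
    done
}

strip_spaces ' 5/ 7/14'   # sample of what date '+%_m/%e/%y' can print
sd=$stripped
```

Setting a variable instead of printing the result avoids a command substitution, which would otherwise create a subshell in many shells.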

If I correctly understand your new input file format, the following might do what you want:

#!/bin/ksh
# Initialize start date (sd), start time (st), and end date (ed).
# Note that if this script can be run close to midnight, sd and st must be set
# before ed.  The read command will strip off leading spaces in any of the
# values and the command substitutions will strip off the trailing newlines
# provided by the date commands.  Spaces in the date format strings guarantee
# that sd1 and ed1 will be set to the 1- or 2-digit month and sd2 and ed2 will
# be set to the 1- or 2-digit day, a "/", and the last two digits of the year.
read sd1 sd2 st ed1 ed2 <<-EOF
	$(TZ=CST11CDT date '+%_m %e/%y %T') $(date '+%_m %e/%y')
EOF
# Add leading "[" and separating "/" to ed and sd.
ed="[$ed1/$ed2"
sd="[$sd1/$sd2"
printf "Start date=%s, Start time=%s, End date=%s\n" $sd $st $ed

awk -v ed="$ed" -v sd="$sd" -v st="$st" '
{	# Normalize timestamp field to have a 2-digit leading zero-filled hour
	# before the first colon.
	t = substr($2, 2, 1) == ":" ? "0"$2 : $2
	if(($1 == sd && t > st) || ($1 == ed && ed != sd)) print
}' logfile2

Hi, I have a question about the command below:

TZ=CST7CDT date '+%Y-%b-%d %T'

This gives the time 1 hour before the current time.
How can I get the time 15 minutes in the past?
I.e., if the current time is

2014-Oct-08 22:30:51

my expected output is

2014-Oct-08 22:15:51

Thanks for your help.

If:

TZ=CST6CDT date '+%Y-%b-%d %T'

gives you the current time:

TZ=CST6:15CDT date '+%Y-%b-%d %T'

should give you the time 15 minutes ago.
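
The offset in a POSIX TZ value may include minutes (hh:mm), which is what makes the 15-minute shift work. A small sketch that builds such a value from a number of minutes is below. The helper name shifted_tz and its argument order are made up for this example, and it assumes a zone west of Greenwich (a positive offset, as in the CST6CDT case).

```shell
# shifted_tz STD BASE_HOURS DST MINUTES_AGO  (hypothetical helper)
# Prints a TZ value whose clock runs MINUTES_AGO behind a zone written
# as STD<BASE_HOURS>DST, e.g. CST6CDT.
shifted_tz() {
    tz_std=$1 tz_base=$2 tz_dst=$3 tz_ago=$4
    # Work in minutes, then split back into hours and minutes.
    tz_total=$(( tz_base * 60 + tz_ago ))
    printf '%s%d:%02d%s\n' "$tz_std" $(( tz_total / 60 )) \
        $(( tz_total % 60 )) "$tz_dst"
}

# 15 minutes behind CST6CDT, in the format used in this thread.
TZ=$(shifted_tz CST 6 CDT 15) date '+%Y-%b-%d %T'
```

Here shifted_tz CST 6 CDT 15 produces CST6:15CDT, the same value as in the answer above. As before, it is worth comparing the result against plain date before relying on it.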
