Extract lines if string found from last 30 min only

Hi guys,

I would appreciate your help, as I am stuck trying to search the logs for the last 30 minutes before the current time. The current time is the time at which the script is executed. The script should search the logs for <string> within the last 30 minutes only and, if <string> is found, print only those lines.

The logfile contains 2 different date formats, as shown below, but the search should be limited as follows:

  • (1) Scanning should consider only lines that start with the syntax <Feb 12,----date----PM UTC>, as shown below, and
  • (2) Scanning should skip lines that start like (2019-02-12T12:26:59.842+0000: 45.152:)

I tried various awk and sed options but was unable to restrict the scan to the last 30 minutes. Using grep <string> does find <string>, but it pulls every matching line, even those from the previous day. I want to restrict the search so that it prints only lines from the last 30 minutes that match the string, and returns nothing if there is no match.

The logfile has the entries below:

<Feb 12, 2019, 12:26:54,974 PM UTC> <Notice> <Security> <BEA-090082> <Security initializing using security realm myrealm.>
<Feb 12, 2019, 12:26:55,687 PM UTC> <Warning> <RMI> <BEA-080099> <RMIDiagnosticUtil.startObserver scheduled diag TimerTask.>
2019-02-12T12:26:59.842+0000: 45.152: [GC [PSYoungGen: 804554K->82927K(822784K)] 906587K->210120K(2627584K), 0.1191540 secs] [Times: user=0.41 sys=0.08, real=0.12 secs]
<Feb 12, 2019, 12:27:02,40 PM UTC> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to STANDBY>
--------------------------------------------------------------------------------------

Please provide information on your attempts to resolve.

Until then, we will refrain from sharing any guidance.

The purpose of this Board is to assist users in solving their problems. We are not a coding service. Further, we are not a homework service - and sometimes posts appear to be an attempt to have someone solve a school assignment.

Finally, we urge all members of the Forum to NOT post a solution to this question until effort to resolve is demonstrated.


Welcome to the forum.

We would like to, and probably can, help you get past the point(s) where you're stuck. So please show us the "various awk and sed option"s you tried, and also indicate where and how they failed. Be aware that the date format of the lines you target is way more difficult to track than the one of the lines you want avoided. Does your scan need to cross midnight? Are the log entries ascending in time? Are the to-be-avoided lines interspersed regularly? By the minute?

#!/bin/bash

to=`date +"<%b%_d, %Y,%l:%M:%S,%3N %p %Z>"`
let from_in_seconds=`date +%s`-5000
from=`date -d @$from_in_seconds +"<%b%_d, %Y,%l:%M:%S,%3N %p %Z>"`
awk '$0>=from && $0<=to' from="$from" to="$to" file.log
 

The format string below matches the date format that I have in the logs, but the awk command is not working.

$date +"<%b%_d, %Y,%l:%M:%S,%3N %p %Z>"
 <Feb12, 2019, 1:36:55,448 PM UTC>

The command below gives the time 30 minutes ago, but it won't work when I use it in awk.

date --date='30 minutes ago' -u '+%b%_d, %Y, %T,%3N %p %Z'
 Feb12, 2019, 13:13:03,306 PM UTC

--- Post updated at 01:50 PM ---

Yes, the application logs <data> regularly, day and night, in ascending order. I am not much concerned about avoiding the other date format (2019-02-12T12:26:59.842+0000: 45.152); I would just like to pull the lines that match the string and are from the last 30 minutes only, which have the date format below.
<Feb 12, 2019, 12:26:54,974 PM UTC>

Hmmm - I'm a bit surprised that Feb12, 2019, 13:13:03,306 PM UTC should be considered a valid time stamp (whereas 12:27:02,40 PM is). And, of course, Feb12 will never match Feb 12 in your log files.
It would be nice if your input sample would stretch across crucial points in time like midnight or 13:00h i.e. 1 PM.
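
To make the mismatch concrete, here is a small sketch, assuming GNU date. It formats one fixed instant with the posted format string and with a variant that is closer to the log lines. The function name log_stamp and the fixed instant are only illustrations; %-d and %-I are GNU extensions that drop the padding.

```shell
#!/bin/sh
# Assumes GNU date. A fixed instant is used so the output is reproducible.
ts='2019-02-12 13:13:03'

# As posted: no space after the month, and 24-hour %T combined with %p.
LC_ALL=C date -u -d "$ts" +'%b%_d, %Y, %T %p %Z'

# Closer to the log lines: space after the month, unpadded day (%-d),
# unpadded 12-hour clock hour (%-I). log_stamp is a hypothetical helper name.
log_stamp() {
    LC_ALL=C date -u -d "$1" +'%b %-d, %Y, %-I:%M:%S %p %Z'
}
log_stamp "$ts"
```

Even with a matching layout, a string in this format still cannot be compared reliably against the log lines, because the month name, the unpadded fields, and the 12-hour clock do not sort chronologically.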

Could you answer the remaining questions as well?

#!/bin/bash

NOW=$(`date +%s`)
last=$(( $NOW - 30*60 )) # last 30 minute
while read mth dy hhmmss A9 ; do

curr-time=$(date --date "${mth} ${dy} ${hhmmss}" '+%s')
if [[ "$curr-time" -ge "$last" ]] ; then
echo "${mth} ${dy} ${hhmmss} ${A9}"
fi
done < log.out

I tried to use epoch seconds (%s), but I was not able to convert the date format that I have in the log file with +%s, as it was giving an invalid date error:

-bash: curr-time=: command not found
- locked <0x000000050ef88b10> (a java.lang.Object)

Errors may occur with this format key:
%_d
It would be better to change the format in the log and use %d if the information is collected over several days.

--- Post updated at 14:28 ---

awk -v d="$(LANG=C date -d -30minutes -u +"%b %_d, %Y, %T,%3N %p %Z")" -F "<|>" '($2 > d) {print}' file

--- Post updated at 14:33 ---

Do some of the fractional-second fields in the log have only 2 digits?

--- Post updated at 15:03 ---

Cut off the fractional seconds:

awk -v d="$(LANG=C date -d -30minutes -u +"%b %_d, %Y, %T")" -F "<|>" '
(gensub(/,[^,]*$/, "", 1, $2) > d)      {print}
' file

--- Post updated at 15:17 ---

Maybe PM and UTC need to be kept?
Then:

date -d -30minutes -u +"%b %_d, %Y, %T %p %Z"
gensub(/(,[0-9]+ )([^,]*)$/, " \\2", 1, $2)

Corrected the format: converted days to a two-digit number, converted hours from 12-hour to 24-hour format, and removed the fractional seconds and everything after them.

awk -F "<|>| |, |," -v d="$(LANG=C date -d -30minutes -u +" %b %d %Y %T")" '
 /^</           { line = $0
                if ( length($3) < 2 ) $3 = "0" $3
                split($5, a, ":" s)
                if ($7 == "PM" && a[1] != 12) $5 = (a[1]+=12) ":" a[2] ":" a[3]
                NF = 5
                }
(d < $0)        { print line }
' file.log

formats of compared values

$0 = Feb 02 2019 14:26:54
 d = Feb 12 2019 18:47:48

Any time you're trying to compare dates as strings, you're doomed to failure if your strings contain a year that is not in the high-order position, a month that is an abbreviated English month name instead of a month number, and/or days of the month that are sometimes one digit and sometimes two digits. You need to be comparing date strings that are in the same format and contain the same number of characters (unless you're going to convert everything to Seconds since the Epoch and perform a numeric comparison). The optimum string comparison format until the year 10000 is: YYYYmmddHHMMSS . You could try adding milliseconds to the end of that if you want to, but I don't think GNU date will give you anything other than 0 for milliseconds if you ask it to give you a date and time that is 1800 seconds ago. (And, if you tell it to give you a date and time that is 30 minutes ago, it will probably also give you 0 for the seconds part of your timestamp.)
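
A minimal sketch of the point above, using nothing but awk string comparison. The helper name str_cmp and the sample timestamps are made up for illustration.

```shell
#!/bin/sh
# str_cmp A B: compare two timestamps as plain strings.
# Prints "later" if A sorts after B, otherwise "not later".
# Appending "" forces awk to compare as strings, never as numbers.
str_cmp() {
    awk -v a="$1" -v b="$2" 'BEGIN { print ((a "" > b "") ? "later" : "not later") }'
}

# One- and two-digit days: Feb 12 is after Feb 9, but "1" sorts before "9".
str_cmp "Feb 12, 2019" "Feb 9, 2019"

# Month names sort alphabetically: April is after February, but "A" sorts before "F".
str_cmp "Apr 1, 2019" "Feb 1, 2019"

# Fixed-width YYYYmmddHHMMSS: string order and time order agree.
str_cmp "20190212122654" "20190209235959"
```

The first two calls print "not later" even though the first timestamp is the later one; only the fixed-width numeric format gives the expected "later".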

Note that I'm guessing on that, I don't have access to a GNU date utility. I do have access to a ksh version 93u+ which has a printf statement of the form:

printf "%(GNU_date_format_string)T\n" '1800 seconds ago'

that will give me date and time strings from 30 minutes ago (where GNU_date_format_string is a GNU date format string without the leading <plus-sign> character).

The following script seems to do what you want. It uses the Korn shell on macOS Mojave version 10.14.3 to create a test log file with timestamps from 1900 seconds ago up to 1700 seconds ago in 15 second intervals, to verify that the date conversion makes it start printing records from the log file that are no more than 30 minutes old. If you comment out the printf statements that are printing dates and uncomment the date commands that are currently commented out, this code should work with either bash or ksh on a Linux system with a GNU date utility installed.

If you invoke this script with an argument (any argument), the awk script will print out debugging information showing how the split() function split up the lines in the date format you want to process until it finds a timestamp that meets your criteria.

#!/bin/ksh
# Create sample logfile for this test creating entries with two different date
# and time formats.
for ((i=1900; i>1700; i-=15))
do	TZ=UCT0 LC_ALL=C printf \
	    "%(<%b %e, %Y, %I:%M:%S,%3N %p %Z> <$i seconds ago>)T\n" \
	    "$i seconds ago"
#	LC_ALL=C date -u --date "$i seconds ago" \
#	    "+<%b %e, %Y, %I:%M:%S,%3N %p %Z> <$i seconds ago>" 
	    
	TZ=UCT0 LC_ALL=C printf \
	    "%(%Y-%m-%dT%H:%M:%S.%3N+0000: $i seconds ago)T\n" \
	    "$i seconds ago"
#	LC_ALL=C date -u --date "$i seconds ago" \
#	    "+%Y-%m-%dT%H:%M:%S.%3N+0000: $i seconds ago" 
done > logfile 
printf 'Using logfile containing:\n'
cat logfile

printf '\nstarting awk at about '
TZ=UCT0 LC_ALL=C printf '%(%Y-%m-%d %H:%M:%S,%3N)T\n'
#LC_ALL=C date -u '+%Y-%m-%d %H:%M:%S,%3N'

start_date=$(TZ=UCT0 LC_ALL=C printf '%(%Y%m%d%H%M%S)T' '1800 seconds ago') 
#start_date=$(LC_ALL=C date -u '+%Y%m%d%H%M%S' --date '1800 seconds ago')
printf 'start_date=%s\n' "$start_date";date -u

awk -v start="$start_date" -v Log=$# '
BEGIN {	split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m2b)
	for(i = 1; i <= 12; i++)
		b2m[m2b[i]] = sprintf("%02d", i)
}
{	if($1 ~ /</) {
		if(print_it) {
			print
			next
		}
	} else	next
	split($0, fields, /[<> ,:]+/)
	if(Log) for(i=1; i<=12; i++) printf("fields[%d]=%s\n",i,fields[i])
	if(fields[5] == 12)
		fields[5] = "00"
	if(fields[9] == "PM")
		fields[5] += 12
	# zero-pad the hour too: real log lines use hours such as "1:02:39"
	linedate = fields[4] b2m[fields[2]] sprintf("%02d", fields[3]) \
	    sprintf("%02d", fields[5]) fields[6] fields[7]
	if(Log)printf("linedate:%s from %s\n", linedate, substr($0,1,45))
	if(linedate >= start) {
		print_it = 1
		print
	}
}' logfile

Running this script a few minutes ago produced the following output:

Using logfile containing:
<Feb 12, 2019, 11:48:31,000 PM GMT> <1900 seconds ago>
2019-02-12T23:48:31.000+0000: 1900 seconds ago
<Feb 12, 2019, 11:48:46,000 PM GMT> <1885 seconds ago>
2019-02-12T23:48:46.000+0000: 1885 seconds ago
<Feb 12, 2019, 11:49:01,000 PM GMT> <1870 seconds ago>
2019-02-12T23:49:01.000+0000: 1870 seconds ago
<Feb 12, 2019, 11:49:16,000 PM GMT> <1855 seconds ago>
2019-02-12T23:49:16.000+0000: 1855 seconds ago
<Feb 12, 2019, 11:49:31,000 PM GMT> <1840 seconds ago>
2019-02-12T23:49:31.000+0000: 1840 seconds ago
<Feb 12, 2019, 11:49:46,000 PM GMT> <1825 seconds ago>
2019-02-12T23:49:46.000+0000: 1825 seconds ago
<Feb 12, 2019, 11:50:01,000 PM GMT> <1810 seconds ago>
2019-02-12T23:50:01.000+0000: 1810 seconds ago
<Feb 12, 2019, 11:50:16,000 PM GMT> <1795 seconds ago>
2019-02-12T23:50:16.000+0000: 1795 seconds ago
<Feb 12, 2019, 11:50:31,000 PM GMT> <1780 seconds ago>
2019-02-12T23:50:31.000+0000: 1780 seconds ago
<Feb 12, 2019, 11:50:46,000 PM GMT> <1765 seconds ago>
2019-02-12T23:50:46.000+0000: 1765 seconds ago
<Feb 12, 2019, 11:51:01,000 PM GMT> <1750 seconds ago>
2019-02-12T23:51:01.000+0000: 1750 seconds ago
<Feb 12, 2019, 11:51:16,000 PM GMT> <1735 seconds ago>
2019-02-12T23:51:16.000+0000: 1735 seconds ago
<Feb 12, 2019, 11:51:31,000 PM GMT> <1720 seconds ago>
2019-02-12T23:51:31.000+0000: 1720 seconds ago
<Feb 12, 2019, 11:51:46,000 PM GMT> <1705 seconds ago>
2019-02-12T23:51:46.000+0000: 1705 seconds ago

starting awk at about 2019-02-13 00:20:11,614
start_date=20190212235011
Wed Feb 13 00:20:11 UTC 2019
<Feb 12, 2019, 11:50:16,000 PM GMT> <1795 seconds ago>
<Feb 12, 2019, 11:50:31,000 PM GMT> <1780 seconds ago>
<Feb 12, 2019, 11:50:46,000 PM GMT> <1765 seconds ago>
<Feb 12, 2019, 11:51:01,000 PM GMT> <1750 seconds ago>
<Feb 12, 2019, 11:51:16,000 PM GMT> <1735 seconds ago>
<Feb 12, 2019, 11:51:31,000 PM GMT> <1720 seconds ago>
<Feb 12, 2019, 11:51:46,000 PM GMT> <1705 seconds ago>

Maybe this will give you something you can build on.


Based on Don Crugun's comments, I'll just fix my script. Thanks

awk -F "<|>| |, |," -v d="$(LANG=C date -d -30minutes -u +"%Y%m%d%T")" '
BEGIN   { split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m2b)
        for(i = 1; i <= 12; i++)
        b2m[m2b[i]] = sprintf("%02d", i)
}
/^</    { line=$0
        if ( length($3) < 2 ) $3 = "0" $3
        split($5, a, ":" s)
        if ($5 == 24) $5 = "00"
        if ($7 == "PM") $5 = (a[1]+=12) ":" a[2] ":" a[3]
        $0 = $4 b2m[$2] $3 $5
        if ( d < $0 ) print line
}
' file

format of the compared values
2019021212:26:55

As an alternative to the string constant holding 12 English month names, you could try

awk -v"abmon=$(locale abmon)" 'BEGIN {for (n=split(abmon, MTH, ";"); n;n--) NumMTH[MTH[n]]=n} ... '

Hi nezabudka,
Note that although the above code will work in many cases, there are a few issues that will cause it to fail intermittently:
First, the command in the command substitution:

LANG=C date -d -30minutes -u +"%Y%m%d%T"

You may have noticed that when I used a similar construct in the code I suggested in post #9 (correctly in all the ksh93 printf calls and sometimes correctly in the GNU date invocations [all have now been fixed]) that I used LC_ALL=C instead of LANG=C . These environment variables (along with other LC_* variables for the various locale categories) have a hierarchy that determines which variable controls the operation when more than one of them is found in the environment. For example, if I run the command RudiC mentioned in post #11 to get a locale's abbreviated month names with the three variables that control the strings used to define a locale's month names all set to different values: LC_ALL=ru_RU specifying a Russian locale for all locale categories no matter what other locale variables are set, LC_TIME=it_IT specifying an Italian locale for time related strings defined by the standards, and LANG=C specifying the locale to be used if none of the other locale environment variables are set, we see that if LC_ALL is defined on the command line (or in your environment) it overrides all of the other locale variables:

LC_ALL=ru_RU LC_TIME=it_IT LANG=C locale abmon
янв;фев;мар;апр;май;июн;июл;авг;сен;окт;ноя;дек

which gives us the abbreviated month names in Russian. If we drop the setting for LC_ALL (and don't have LC_ALL set in the environment), the command:

LC_TIME=it_IT LANG=C locale abmon
Gen;Feb;Mar;Apr;Mag;Giu;Lug;Ago;Set;Ott;Nov;Dic

which gives us Italian abbreviated month names. So, if you want to guarantee that the date utility will use English names for things like "minutes" and "seconds" when using date -d time_base or date --date time_base , you need to use LC_ALL=C or LC_ALL=POSIX instead of LANG=C or LANG=POSIX . Note that I don't have a GNU date utility installed on my system and I don't know which locale category it uses to match the time period strings in -d option-arguments. I would guess that they are controlled by LC_TIME, but they could also be controlled by LC_MESSAGES. Either way, setting LC_ALL will override it and give you what you want.
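
The precedence can also be checked on the output side with a short sketch, assuming GNU date. The locale name it_IT.UTF-8 is only an example and may not be installed on a given system; pinned_month is a hypothetical helper name.

```shell
#!/bin/sh
# Assumes GNU date (for -d @0). it_IT.UTF-8 is an example locale name only.

# LANG and LC_TIME both ask for Italian, but LC_ALL=C overrides them,
# so the abbreviated month name comes from the C locale.
pinned_month() {
    env LANG=it_IT.UTF-8 LC_TIME=it_IT.UTF-8 LC_ALL=C date -u -d @0 +%b
}
pinned_month

# With LC_ALL unset, LC_TIME decides. The result depends on whether the
# Italian locale is installed; if it is not, the C locale is used as a fallback.
( unset LC_ALL; env LANG=C LC_TIME=it_IT.UTF-8 date -u -d @0 +%b )
```

The first call prints Jan regardless of the installed locales, which is the behaviour wanted when building a comparison string.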

Second, in the awk statement:

split($5, a, ":" s)

you only get the results you want because the variable s is not defined in your script. To reduce confusion and protect against a user invoking your awk script with a defined s variable, change the last argument in that function call to just ":" instead of ":" s .
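
A quick sketch of how that can go wrong. The helper name hour_of is made up; it passes a value for s into awk the way a caller could with -v.

```shell
#!/bin/sh
# hour_of S: split a fixed time string using ":" s as the separator,
# with the awk variable s set to S, and print the first piece.
hour_of() {
    awk -v s="$1" 'BEGIN { split("12:26:54", a, ":" s); print a[1] }'
}

hour_of ""    # s is empty: the separator is ":" and the hour is isolated
hour_of "x"   # s is "x": the separator is ":x", which never matches
```

The first call prints 12. The second prints the whole string 12:26:54, because split() found no separator and put everything into a[1].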

Third, the expression in the awk if statement:

if ($5 == 24) $5 = "00"

can't ever yield a true result. In this script, $5 on the lines you're processing will always be of the form hh:mm:ss,sss where hh is the hour in 12-hour clock format (01-12), mm is the minute (00-59), and ss,sss is the seconds (00-60) and subseconds apparently consisting of 1 to 3 decimal digits representing tenths, hundredths, or thousandths of a second. There is no way that a string representing a clock for the current time in the above form will ever be the string 24 , nor even start with that string. Presumably you want to determine if the hour portion of the time field is 12 and, if it is, reset it to 00 (which will be the correct 24-hour clock hour field if the AM/PM indicator is AM and will later be incremented back to 12 if the AM/PM indicator is PM ). I would guess that you would get what you had intended to do if you change the two lines in your code:

        if ($5 == 24) $5 = "00"
        if ($7 == "PM") $5 = (a[1]+=12) ":" a[2] ":" a[3]

to:

        if (a[1] == 12) $5 = (a[1] = "00") ":" a[2] ":" a[3]
        if ($7 == "PM") $5 = (a[1]+=12) ":" a[2] ":" a[3]

or to:

        if (a[1] == 12 && $7 == "AM") $5 = "00:" a[2] ":" a[3]
        if (a[1] < 12 && $7 == "PM") $5 = (a[1] + 12) ":" a[2] ":" a[3]

If you run into issues similar to these in the future, I hope these comments will help you understand some of the pitfalls you have to watch out for when writing code to deal with various date and time formats.

Cheers,
Don


Thank you very much for the comments. All of the above will be taken into account in the future.
As for the last remark: that was my carelessness and a bug. The order of the expressions was wrong.
Apparently I wanted to write something like this:

        if ($7 == "PM") a[1]+=12
        if (a[1] == 24) a[1] = "00"
        $5 = a[1] ":" a[2] ":" a[3]

Thanks to @RudiC. There are no options in the man page on this issue:

locale abday
locale abmon

Thank you for teaching, it was very informative.

Aren't there?


Hi nezabudka,
I'm afraid the above code still doesn't work for anything that started with a[1]==12 . If you start with 12 AM on a 12 hour clock you should end up with hour 00 on a 24 hour clock (the above code ends up with hour 12) and if you start with 12 PM on a 12 hour clock you should end up with hour 12 on a 24 hour clock (the above code ends up with hour 00).

If you don't like the code I suggested in post #9 or either of the suggestions I made in post #12 you could also try:

        $5 = (($7 == "PM") ? a[1] + 12 * (a[1] != 12) : (a[1] == 12) ? "00" : a[1]) ":" a[2] ":" a[3]
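
For checking the edge cases, here is a small stand-alone sketch. It is a modulo-based variant, not the code from the posts above, and the function name to24 is made up. It takes the 12-hour clock hour and the AM/PM indicator and prints the zero-padded 24-hour clock hour.

```shell
#!/bin/sh
# to24 HOUR AMPM: print the zero-padded 24-hour clock hour.
to24() {
    awk -v h="$1" -v ap="$2" 'BEGIN {
        h = h % 12                  # 12 becomes 0; 1 to 11 are unchanged
        if (ap == "PM") h += 12     # 12 PM gives 12; 1 PM to 11 PM give 13 to 23
        printf("%02d\n", h)
    }'
}

# Walk through the cases that matter, including both 12 o'clock cases.
for t in "12 AM" "1 AM" "11 AM" "12 PM" "1 PM" "11 PM"
do
    set -- $t
    printf '%2s %s -> %s\n' "$1" "$2" "$(to24 "$1" "$2")"
done
```

This prints 00 for 12 AM and 12 for 12 PM, with 01, 11, 13, and 23 for the remaining cases.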

Hello,

Thanks to everyone for their efforts. Sorry, I was away for a few days and didn't get time to look at the solutions provided.

I tried all the scripts from this thread, but none of them worked. All of the scripts fetch the lines for the entire day instead of the last 30 minutes. My requirement is to pull the lines for the last 30 minutes only.

Script used (for e.g.) :

awk -F "<|>| |, |," -v d="$(LANG=C date -d -30minutes -u +"%Y%m%d%T")" '
BEGIN   { split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m2b)
        for(i = 1; i <= 12; i++)
        b2m[m2b[i]] = sprintf("%02d", i)
}
/^</    { line=$0
        if ( length($3) < 2 ) $3 = "0" $3
        split($5, a, ":" s)
        if ($5 == 24) $5 = "00"
        if ($7 == "PM") $5 = (a[1]+=12) ":" a[2] ":" a[3]
        $0 = $4 b2m[$2] $3 $5
        if ( d < $0 ) print line
}
' file_1.out
$ date   ############ current Date/time on Linux when I ran the script

Sat Mar  9 18:53:47 UTC 2019

Output from the above script:

<Mar 9, 2019, 1:02:39,176 AM UTC> <Error> <Coherence> <BEA-000000>

<Mar 9, 2019, 1:13:22,583 AM UTC> <Error> <Coherence> <BEA-000000>
<Mar 9, 2019, 1:47:08,198 AM UTC> <Error> <Coherence> <BEA-000000>
<Mar 9, 2019, 5:16:42,24 AM UTC> <Error> <Coherence> 

<Mar 9, 2019, 6:50:41,556 PM UTC> <Error> <Coherence> <BEA-000000>
<Mar 9, 2019, 6:56:45,132 PM UTC> <Error> <Coherence> <BEA-000000>

Please suggest a fix, as I need to pull only the lines from the last 30 minutes whenever I execute this script, not the lines for the entire day.

How about carefully reading, understanding, and heeding all the posts (and comments therein) offering help to you? The script you used was commented on and improved in a later post. You shouldn't expect turnkey solutions (although those are frequently delivered) but understand the proposals and experiment with them until they satisfy your needs.

Having said that, how about

$ paste -d'\t\b' <(date -f <(sed 's/^<\|>.*$//g; s/,//2' file) +"%F %T") file | awk -F"\t" -vTS="$(date -d'30 min ago' +'%F %T')" '($1 > TS) {sub ("^" $1 FS, ""); print}'
<Mar 9, 2019, 6:50:41,556 PM UTC> <Error> <Coherence> <BEA-000000>
<Mar 9, 2019, 6:56:45,132 PM UTC> <Error> <Coherence> <BEA-000000>

Hi RudiC,

Yes indeed, I checked all the posts before replying and clearly mentioned that none of the <scripts> worked, because I tried all of them. My previous post showed just one example, as I didn't want to clutter the thread with all the results when they all produced the same outcome.

The intention here is to get expert advice to resolve the issue, as pulling data by date is extremely difficult due to the presence of >=2 date formats in the log file.

The script below results in an invalid date error:

$ paste -d'\t\b' <(date -f <(sed 's/^<\|>.*$//g; s/,//2' file) +"%F %T") file | awk -F"\t" -vTS="$(date -d'30 min ago' +'%F %T')" '($1 > TS) {sub ("^" $1 FS, ""); print}'

date: invalid date `2019-03-04T11:03:16.576+0000: 1392540.816: [GC [PSYoungGen: 934720K-'
date: invalid date `\tat java.lang.reflect.Method.invoke(Method.java:606)'

Hi rockstar,
I'm very happy that you had other important matters that kept you away from this thread for a few days after you had given us your assignment to work on in your absence. I'm very sorry that we were not able to give you code that worked in your unspecified environment. I apologize for not responding on this issue for the last four days, but I've also been busy doing other things.

We are here to help you learn how to write code to meet your needs on your own. We are not here to act as your unpaid programming staff and should not be expected to write code for you while you are away doing something else. If you're unwilling to answer questions, unwilling to show us the output each of the suggested responses produced on your system, and explain what was going wrong; then there is no reason for us to waste any time trying to help you learn how to do things like this on your own.

Just saying that a script doesn't work doesn't help anyone. I can easily state that some code that you have written doesn't work, but if I don't explain how it didn't work or why it didn't work none of us learns anything useful about the problem at hand.

Like I can tell you that using:

date -d @-5000 '+%Y-%m-%d %H:%M:%S'

mimicking something you showed us in post #1 in this thread is wrong. But that doesn't help you learn how to fix it. The above code has absolutely nothing to do with what the time was half an hour ago. The above code asks the system to give you a time 5000 seconds before the UNIX Epoch (i.e. 5000 seconds before midnight on the morning of January 1, 1970 at 12:00:00 AM GMT). Something like:

date -d now-1800seconds '+%Y-%m-%d %H:%M:%S'

would come a lot closer to giving you a timestamp that occurred 30 minutes ago (and in a format that could be used to directly compare two timestamps as strings to see if one was earlier or later than the other until we get to the year 10000).
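
The difference can be seen with a short sketch, assuming GNU date, where @N means N seconds since the Epoch. The helper name cutoff_stamp is made up for illustration.

```shell
#!/bin/sh
# Assumes GNU date.

# 5000 seconds before the Epoch, shown in UTC: a time in 1969, not half an hour ago.
date -u -d @-5000 +'%Y-%m-%d %H:%M:%S'

# cutoff_stamp SECONDS: sortable UTC timestamp (YYYYmmddHHMMSS) for SECONDS ago,
# computed from the current epoch time.
cutoff_stamp() {
    now=$(date +%s)
    date -u -d "@$((now - $1))" +'%Y%m%d%H%M%S'
}
cutoff_stamp 1800
```

The first command prints 1969-12-31 22:36:40. The second prints a 14-digit string for 30 minutes before the moment it is run.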

Your repeated refusal to use CODE tags when presenting sample input, sample output, and code segments shows us that you don't want us to see the actual format of the data you are processing and makes it impossible for us to guess at how a real solution to your problem would need to be written. (The moderators have attempted to clean up your posts, but we have obviously guessed incorrectly on some of your formatting or one or more of the suggested solutions provided would likely have met your needs.)

Above you say that having >=2 date formats is a problem??? You originally said there were exactly two date formats and that one of those formats was to be completely ignored. That made things easy. If there are other date formats you haven't told us about, it becomes very clear why none of the suggested solutions had a chance of working in your environment.

The fact that the date format you have given us to work with can't be directly compared to other dates in that format between the hours of 11:30pm on one day and 1:00am on the next day nor between 11:30am and 1:00pm on the same day is a nuisance that requires the date format in your sample data to be converted to a different format for comparisons, but I thought most, if not all, of the suggestions you had been given had tried to do that (and when they didn't, follow-up comments provided ways to get around those problems).
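
Pulling the suggestions in this thread together, the conversion could be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested solution for the real log: it assumes every wanted line starts with a timestamp of the form <Mon D, YYYY, H:MM:SS,fraction AM|PM ZONE>, that the timestamps are in UTC, and that anything else (GC lines, stack trace lines) is to be skipped. The function name filter_since and the sample data are made up.

```shell
#!/bin/sh
# filter_since CUTOFF PATTERN FILE
#   CUTOFF  : UTC timestamp as YYYYmmddHHMMSS
#   PATTERN : awk regular expression to look for (awk -v processes backslash escapes)
#   FILE    : log file to scan
filter_since() {
    awk -v cutoff="$1" -v pat="$2" '
    BEGIN {
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", name, " ")
        for (i = 1; i <= n; i++) month[name[i]] = i
    }
    # Only lines starting with "<Mon D" are examined; everything else is skipped.
    /^<[A-Z][a-z][a-z] [0-9]/ {
        # f[2]=month f[3]=day f[4]=year f[5]=hour f[6]=min f[7]=sec f[9]=AM/PM
        split($0, f, /[<> ,:]+/)
        h = f[5] % 12                    # 12 AM -> 0, 12 PM -> 0
        if (f[9] == "PM") h += 12        # 12 PM -> 12, 1 PM -> 13, ...
        stamp = sprintf("%04d%02d%02d%02d%02d%02d", \
                        f[4], month[f[2]], f[3], h, f[6], f[7])
        # Appending "" forces a string comparison of two 14-character stamps.
        if (stamp "" >= cutoff "" && $0 ~ pat) print
    }' "$3"
}

# Usage with made-up sample data and a fixed cutoff, so the result is reproducible.
log=$(mktemp)
cat > "$log" <<'EOF'
<Mar 9, 2019, 5:16:42,24 AM UTC> <Error> <Coherence> <BEA-000000>
2019-03-09T18:40:00.000+0000: 45.152: [GC [PSYoungGen: 804554K->82927K(822784K)]]
<Mar 9, 2019, 6:50:41,556 PM UTC> <Error> <Coherence> <BEA-000000>
<Mar 9, 2019, 6:56:45,132 PM UTC> <Notice> <Security> <BEA-090082>
EOF
filter_since 20190309182347 'Coherence' "$log"
rm -f "$log"
```

With the fixed cutoff above, only the 6:50:41 PM line is printed. On a system with GNU date, the cutoff for a live run could be produced with LC_ALL=C date -u -d '30 minutes ago' +%Y%m%d%H%M%S . Unlike the script in post #9, this sketch tests every timestamped line on its own and does not print continuation lines that follow a matching entry.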
