awk: Search for text between two time frames (12-hour format)

I have created a script to grep the errors from WebLogic log files, redirecting the output to file.txt. From file.txt I'm using an awk command to collect the past 20 minutes of output. The script runs from cron every 15 minutes, and it works well.

Now the challenge: I'm trying to use this same script for a different application, but that application's log files use a 12-hour time format, so I need your help with an awk or sed command to collect the past 15 minutes from the 12-hour date format.

The log files' 12-hour format (e.g. Nov 18, 2013 9:50:16 AM UTC)

What's your system?

Red Hat Linux.

---------- Post updated at 01:50 PM ---------- Previous update was at 01:48 PM ----------

awk - since it's a 12-hour format I'm facing an issue with "AM, PM", and when the timestamps are from=9:40 and to=10:00 I'm not receiving any output.

I tried the following, but no luck:

awk '$0 >= "Nov 14, 2013 9:40:01" && $0 <= "Nov 14, 2013 9:55:01"' file
sed -n '/Nov 14, 2013 7:58:00 PM UTC/,/Nov 14, 2013 8:10:00 PM UTC/p' file

---------- Post updated at 01:51 PM ---------- Previous update was at 01:50 PM ----------

Corona688 - Thanks for the reply; please let me know if you need any other details.

Don't bump posts if I don't answer; I was writing. I'm certainly not shy about asking for more info if needed.

Your request is possible, if awkward. <= and >= compare ASCII strings alphabetically... YYYY-MM-DD HH:MM:SS dates do, however, sort correctly alphabetically. You need to convert "Nov" into a month number and the AM/PM time into 24-hour time to accomplish this.
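A quick standalone demo (not from the thread) of why the raw 12-hour strings defeat the comparison while zero-padded 24-hour strings survive it:

```shell
# Zero-padded 24-hour timestamps compare correctly as plain strings:
[[ "2013-11-14 09:40:01" < "2013-11-14 10:00:00" ]] && echo "24h: ordered"

# The same clock times in 12-hour form sort the wrong way round,
# because the character "1" sorts before "9":
[[ "10:00:00 AM" < "9:40:01 AM" ]] && echo "12h: 10:00 sorts before 9:40"
```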

awk -F'[ ,:]+' 'BEGIN {
        # Build tables so MON["Jan"] becomes 1, etc.
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", M);
        for(X in M) MON[M[X]]=X
        for(X=1; X<=11; X++) { T[X"AM"]=X ; T[X"PM"]=X+12 }
        T["12PM"]=12;        T["12AM"]=24; }

{ YYMMDD=sprintf("%04d %02d %02d %02d %02d %02d", $3, MON[$1], $2, T[$4 $7], $5, $6); }

# YYMMDD is a timestamp directly comparable with <= >= against other YYMMDD dates.

...'
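Run against a sample line (the line itself is invented for the demo), the conversion produces a sortable key:

```shell
# Convert one 12-hour log timestamp into the sortable form the
# comparison needs (sample timestamp made up for illustration):
echo "Nov 14, 2013 9:40:01 PM UTC" | awk -F'[ ,:]+' 'BEGIN {
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", M)
        for (X in M) MON[M[X]] = X
        for (X = 1; X <= 11; X++) { T[X"AM"] = X; T[X"PM"] = X + 12 }
        T["12PM"] = 12; T["12AM"] = 24
}
{ printf "%04d %02d %02d %02d %02d %02d\n", $3, MON[$1], $2, T[$4 $7], $5, $6 }'
# prints: 2013 11 14 21 40 01
```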

Thanks Corona688.

  1. So would you suggest converting the output (e.g. file.txt) from 12-hour to 24-hour format? file.txt is about 5 MB; is there any other possible way to collect the logs between two time frames without converting the format?

  2. Worst case, if I go for converting the time format, do I need to create two different scripts?
    a. grep the errors from the log files and redirect the output (file.txt)
    b. convert the 12-hour format to 24-hour format
    c. awk '$0 >= "from" && $0 <= "to"'

Sorry, I'm a little confused; could you please explain more on this?

No, I was suggesting adapting your existing code using the code I gave you, so it can handle the kind of dates in your log file: compare the YYMMDD variable against your input times instead of comparing raw lines. No need to save a new file.

Sorry, I'm not able to follow :frowning: can you provide some example please?

My script

#!/bin/bash
#set -xv
# The script to verify the error from application logs
# Please don't edit this file

#removing the old scripting logs
rm /home/mydir/script/file.txt
rm /home/mydir/script/output

#date based on the log format
date=`date "+%b %-d, %Y"`

#Log Details
log1=application log file1*
log2=application log file2*
log3=application log file3*

#checking logs
more $log1 | grep -i "$date" | egrep -i 'error1|error2|error3|error4|error5' >> /home/mydir/script/file.txt
more $log2 | grep -i "$date" | egrep -i 'error1|error2|error3|error4|error5' >> /home/mydir/script/file.txt
more $log3 | grep -i "$date" | egrep -i 'error1|error2|error3|error4|error5' >> /home/mydir/script/file.txt

# Time stamp (based on the log format)

tot=`date "+%r %Z" | sed 's/^0//'`
to=`date "+%b %-d, %Y"`

frmt=`date -d "-20 minutes" "+%r %Z" | sed 's/^0//'`
from=`date -d "-20 minutes" "+%b %-d, %Y"`

#collecting log output between two time stamp

awk '$0>=from && $0<=to' from="$from $frmt" to="$to $tot" /home/mydir/script/file.txt > /home/mydir/script/output

#email

email script

# END

Log file format:

###<Nov 6, 2013 8:30:23 PM UTC> <Notice> <stoutdf> <host name> < <[ACTIVE] ExecuteThread:

The log data you posted earlier has no #, <, or > characters; which is right? Please post a good few unmodified lines.

This program can hopefully deal with both:

BEGIN {
        FS="[#<> ,:]+"
        # Build tables so MON["Jan"] becomes 1, etc.
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", M);
        for(X in M) MON[M[X]]=X
        for(X=1; X<=11; X++) { T[X"AM"]=X ; T[X"PM"]=X+12 }
        T["12PM"]=12;        T["12AM"]=24;
}

{
        N=1;

        while((MON[$N]<=0)&&(N<NF)) N++;

        YYMMDD=sprintf("%04d %02d %02d %02d %02d %02d",
                $(N+2), MON[$N], $(N+1), T[$(N+3) $(N+6)], $(N+4), $(N+5));
}

(YYMMDD >= from) && (YYMMDD <= to)
awk -f dconv.awk from="2013 11 06 19 30 00" to="2013 11 06 21 30 00" datafile
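The fixed window above doesn't have to be hard-coded; date(1) can emit the same sortable "YYYY MM DD HH MM SS" form directly. A sketch, assuming GNU date's -d option, with the conversion program inlined and the sample lines invented:

```shell
# Build the window in the sortable form with date(1), then filter.
# GNU date is assumed; the two sample log lines are made up.
from=$(date -d '2013-11-19 20:10:00' '+%Y %m %d %H %M %S')
to=$(date -d '2013-11-19 21:00:00' '+%Y %m %d %H %M %S')
printf '%s\n' \
  'Nov 19, 2013 7:10:04 PM UTC some error' \
  'Nov 19, 2013 8:19:04 PM UTC another error' |
awk -F'[ ,:]+' -v from="$from" -v to="$to" 'BEGIN {
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", M)
        for (X in M) MON[M[X]] = X
        for (X = 1; X <= 11; X++) { T[X"AM"] = X; T[X"PM"] = X + 12 }
        T["12PM"] = 12; T["12AM"] = 24
}
{
        YYMMDD = sprintf("%04d %02d %02d %02d %02d %02d",
                $3, MON[$1], $2, T[$4 $7], $5, $6)
}
(YYMMDD >= from) && (YYMMDD <= to)'
# prints only the 8:19:04 PM line
```

For a rolling cron window, the same idea with relative dates: from=$(date -d '-20 minutes' '+%Y %m %d %H %M %S') and to=$(date '+%Y %m %d %H %M %S').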

Awesome, it's working perfectly.

A small challenge now:

$more file

####<Nov 19, 2013 8:19:04 PM UTC> TimedOutException <Test-hostname>
outofmemory
test
foo
####<Nov 19, 2013 8:29:04 PM UTC> RRTUCKR <Test-hostname>
ABC
OutofMemory
####<Nov 19, 2013 8:35:04 PM UTC> RRSTUCKR <Test-hostname>



$awk -f /home/mydir/script/dconv.awk 'from=2013 11 19 18 01 27' 'to=2013 11 19 18 30 27' /home/mydir/script/file
####<Nov 19, 2013 8:19:04 PM UTC> TimedOutException <Test-hostname>
####<Nov 19, 2013 8:29:04 PM UTC> RRTUCKR <Test-hostname>



Can you please suggest how to collect the text within and between the timestamps?

####<Nov 19, 2013 8:19:04 PM UTC> TimedOutException <Test-hostname>
outofmemory
test
foo
####<Nov 19, 2013 8:29:04 PM UTC> RRTUCKR <Test-hostname>

---------- Post updated 11-20-13 at 01:45 PM ---------- Previous update was 11-19-13 at 04:54 PM ----------

Can you help me collect the text between the timestamps using awk? Sorry, this is a last-minute request from the application team.

I post when I have time and access to things I need. Bumping does not help. You are very close to being set read-only for a while.

This completely changes the face of it. Thinking on it.

BEGIN {
        FS="[#<> ,:]+"
        # Build tables so MON["Jan"] becomes 1, etc.
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", M);
        for(X in M) MON[M[X]]=X
        for(X=1; X<=11; X++) { T[X"AM"]=X ; T[X"PM"]=X+12 }
        T["12PM"]=12;        T["12AM"]=24;
}

{
        YYMMDD=sprintf("%04d %02d %02d %02d %02d %02d",
                $4, MON[$2], $3, T[$5 $8], $6, $7);
        A[++LINE]=$0
}

/####/ {
        S=E
        E=YYMMDD

        if((S >= from) && (E >= from) && (S <= to) && (E <= to))
        {
                for(X=1; X<=LINE; X++) print A[X];
                LINE=0
        }
}
$ awk -f dconv2.awk data

####<Nov 19, 2013 8:19:04 PM UTC> TimedOutException <Test-hostname>
outofmemory
test
foo
####<Nov 19, 2013 8:29:04 PM UTC> RRTUCKR <Test-hostname>

$

Thanks, I'm receiving output only if I specify the time format, but it looks like it's collecting the previous log entries as well (e.g. 'from' is not working; 'to' works perfectly).

awk -f dconv2.awk 'from=2013 11 19 10 50 00' 'to=2013 11 19 14 55 49' data

It gives exactly what you asked for. Post some data it doesn't work for, and exactly what you expect from it.

Appreciate your help. My expectation is to collect all data within the two time frames from the log files.

$ awk -f dconv2.awk data
$

Details in the log file:

$ more data
####<Nov 19, 2013 7:10:04 PM UTC> TimedOutException <Test-hostname>
outofmemory
test
foo
####<Nov 19, 2013 7:29:04 PM UTC> RRTUCKR <Test-hostname>
ABC
OutofMemory
####<Nov 19, 2013 7:35:04 PM UTC> RRSTUCKR <Test-hostname>
####<Nov 19, 2013 8:19:04 PM UTC> TimedOutException <Test-hostname>
        at outofmemory
        at test
        at foo
####<Nov 19, 2013 8:29:04 PM UTC> RRTUCKR <Test-hostname>
ABC
OutofMemory
####<Nov 19, 2013 8:35:04 PM UTC> RRSTUCKR <Test-hostname>
####<Nov 19, 2013 9:10:04 PM UTC> TimedOutException <Test-hostname>
outofmemory
test
foo
####<Nov 19, 2013 9:29:04 PM UTC> RRTUCKR <Test-hostname>
ABC
OutofMemory
####<Nov 19, 2013 9:35:04 PM UTC> RRSTUCKR <Test-hostname>
$

The following command is collecting the previous log details as well... the 'from' time is set to 20:10, but it's collecting from the beginning up to 'to'.

$ awk -f dconv2.awk 'from=2013 11 19 20 10 00' 'to=2013 11 19 21 00 49' data
####<Nov 19, 2013 7:10:04 PM UTC> TimedOutException <Test-hostname>
outofmemory
test
foo
####<Nov 19, 2013 7:29:04 PM UTC> RRTUCKR <Test-hostname>
ABC
OutofMemory
####<Nov 19, 2013 7:35:04 PM UTC> RRSTUCKR <Test-hostname>
####<Nov 19, 2013 8:19:04 PM UTC> TimedOutException <Test-hostname>
        at outofmemory
        at test
        at foo
####<Nov 19, 2013 8:29:04 PM UTC> RRTUCKR <Test-hostname>
ABC
OutofMemory
####<Nov 19, 2013 8:35:04 PM UTC> RRSTUCKR <Test-hostname>
$

It gives this output on my system:

$ awk -f dconv2.awk 'from=2013 11 19 20 10 00' 'to=2013 11 19 21 00 49' data

####<Nov 19, 2013 8:19:04 PM UTC> TimedOutException <Test-hostname>
outofmemory
test
foo
####<Nov 19, 2013 8:29:04 PM UTC> RRTUCKR <Test-hostname>
ABC
OutofMemory
####<Nov 19, 2013 8:35:04 PM UTC> RRSTUCKR <Test-hostname>

$

Try nawk if it's misbehaving on yours.

I have tried different Linux servers, but all show the same output... Installing nawk is a little painful, as it needs approval on Prod.

I'm using the same dconv2.awk which you provided me... I hope there's no difference...

more dconv2.awk
BEGIN {
        FS="[#<> ,:]+"
        # Build tables so MON["Jan"] becomes 1, etc.
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", M);
        for(X in M) MON[M[X]]=X
        for(X=1; X<=11; X++) { T[X"AM"]=X ; T[X"PM"]=X+12 }
        T["12PM"]=12;        T["12AM"]=24;
}

{
        YYMMDD=sprintf("%04d %02d %02d %02d %02d %02d",
                $4, MON[$2], $3, T[$5 $8], $6, $7);
        A[++LINE]=$0
}

/####/ {
        S=E
        E=YYMMDD

        if((S >= from) && (E >= from) && (S <= to) && (E <= to))
        {
                for(X=1; X<=LINE; X++) print A[X];
                LINE=0
        }
}

It really does appear exactly the same.

You don't need nawk on Linux; the default on most systems is GNU awk anyway, which is more than good enough. Some systems have mawk instead, which is pretty good these days too, but you can force GNU awk by running gawk.

Perhaps your data isn't exactly as you've posted. Maybe alter the FS line to FS="[#<> ,:\t]+" in case your text includes literal tabs. If that doesn't work, could you attach a file rather than pasting it, please?

PFA, thanks for your help.

From the way you pasted other text into the file, it certainly looks like you assembled this text file via pasting too. It's full of things which might have been raw tabs once but certainly aren't now. Did you try my suggestion?
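An editorial observation on the posted dconv2.awk (not raised in the thread): LINE is only reset after an in-range block prints, so blocks that fail the test stay buffered in A[] and all print together at the next in-range header, which matches the "collecting from the beginning" symptom. A sketch of the same script with one extra else branch that restarts the buffer at each skipped header:

```shell
# dconv2b.awk: the posted dconv2.awk plus an else branch so that
# out-of-range blocks are discarded instead of accumulating in A[].
cat > dconv2b.awk <<'EOF'
BEGIN {
        FS="[#<> ,:]+"
        # Build tables so MON["Jan"] becomes 1, etc.
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", M);
        for(X in M) MON[M[X]]=X
        for(X=1; X<=11; X++) { T[X"AM"]=X ; T[X"PM"]=X+12 }
        T["12PM"]=12;        T["12AM"]=24;
}

{
        YYMMDD=sprintf("%04d %02d %02d %02d %02d %02d",
                $4, MON[$2], $3, T[$5 $8], $6, $7);
        A[++LINE]=$0
}

/####/ {
        S=E
        E=YYMMDD

        if((S >= from) && (E >= from) && (S <= to) && (E <= to))
        {
                for(X=1; X<=LINE; X++) print A[X];
                LINE=0
        }
        else
        {
                # Out-of-range block: forget it, keeping only this
                # header as the potential start of the next block.
                A[1]=$0; LINE=1
        }
}
EOF
```

Invoked the same way as before: awk -f dconv2b.awk 'from=2013 11 19 20 10 00' 'to=2013 11 19 21 00 49' data.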