Parsing a log and extracting the timestamps around a pattern

amazigh42 · March 26, 2013, 9:39am

Hello

Thanks to Chubler_XL and MadeInGermany for their help a few weeks ago.
Now I would like to modify the script; see the next post.

The old script works like this:
I pick two arbitrary times.
The logs contain the timestamps of the web services, so I can see the behavior or errors of the web services. I just give approximate times in order to get a portion of the log. The script can also cope when it does not find the exact time. It extracts the part of the log between the two timestamps.

cat log_name

aaaaaaaaaaaaaa
bbbbbbbbbbbbb
cccccccccccc
[24/01/2013 09:10]
sssssssssssssss
error-jonas123
nnnnnnnnnnnnn
[24/01/2013 10:10]
uuuuuuuuuuuuuuu
jjjjjjjjjjjjjj
error-jonas123
mmmmmmmmmmmmm
[24/01/2013 10:30]
oooooooooooo
error-jonas123
qqqqqqqqqqq
[24/01/2013 10:45]
vvvvvvvvv
sssssssss
wwwwwwwwww

The result

./my_script log_name
[24/01/2013 10:10]
uuuuuuuuuuuuuuu
jjjjjjjjjjjjjj
error-jonas123
mmmmmmmmmmmmm
[24/01/2013 10:30]

Of course, it would be desirable to put the dates in variables.

vi my_script

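#!/bin/bash
log_name=$1
# Read .gz logs through gunzip, plain logs through cat.
if [[ "$log_name" =~ \.gz$ ]]
     then z_cat="gunzip -c"
     else z_cat=cat
fi
# FS splits on "/", space, "]" and "[", so for "[24/01/2013 10:10]"
# $2=day, $3=month, $4=year, $5=time. S and E are the start and end stamps.
$z_cat "$log_name" |awk -F"[/ \\\][]" -v S="24/01/2013 10:10" -v E="24/01/2013 10:30" '
# Compare the stamp on the current line with b (ds or de):
# returns 1 if the line is later, -1 if it is earlier, 0 if equal.
function dcmp(b) {
  if($4>b[3])return  1;
  if($4<b[3])return -1;
  if($3>b[2])return  1;
  if($3<b[2])return -1;
  if($2>b[1])return  1;
  if($2<b[1])return -1;
  if($5>b[4])return  1;
  if($5<b[4])return -1;
  return 0;
}
# ds and de hold day, month, year, time of S and E in elements 1 to 4.
BEGIN{split(S, ds, "[/ ]"); split(E, de, "[/ ]") }
/^[[][0-9][0-9]\/[0-1][0-9]\/[0-9][0-9][0-9][0-9] / {
   if(s&&dcmp(de)>=0) {print; exit}                # reached E: print the stamp and stop
   if(!s&&dcmp(ds)<=0) {f=x;w=1}                   # stamp not after S: drop the buffered lines
   if(!s&&dcmp(ds)>=0) {printf "%s",f; f=x; s=1 }  # reached S: flush the buffer, start printing
}
# Buffer the lines seen before any stamp.
!w&&!s {f=f $0 "\n"}
# Print every line while s is set.
s'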
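
On the wish to put the dates in variables: a minimal sketch, assuming the start and end stamps are passed as arguments. The helper name extract_range and the argument order are assumptions for illustration, not part of the original script. It also swaps the dcmp function for a simpler technique, a string key built as year, month, day, time that sorts chronologically, and unlike the original it does not print the lines that precede the first stamp.

```shell
#!/bin/bash
# extract_range is a hypothetical helper name.
# Usage: extract_range LOG "dd/mm/yyyy HH:MM" "dd/mm/yyyy HH:MM"
extract_range() (
    # Read .gz logs through gunzip, plain logs through cat.
    case $1 in
        *.gz) z_cat="gunzip -c" ;;
        *)    z_cat=cat ;;
    esac
    $z_cat "$1" | awk -F'[][/ ]' -v S="$2" -v E="$3" '
    # Concatenate year, month, day, time into a string that sorts chronologically.
    function key(d, m, y, t) { return y m d t }
    BEGIN {
        split(S, a, "[/ ]"); ks = key(a[1], a[2], a[3], a[4])
        split(E, b, "[/ ]"); ke = key(b[1], b[2], b[3], b[4])
    }
    /^\[[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9][0-9][0-9] / {
        k = key($2, $3, $4, $5)
        if (s && k >= ke) { print; exit }   # first stamp at or after E: print it and stop
        if (!s && k >= ks) s = 1            # first stamp at or after S: start printing
    }
    s'
)
```

It would be called as: extract_range log_name "24/01/2013 10:10" "24/01/2013 10:30"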

---------- Post updated at 08:39 AM ---------- Previous update was at 08:32 AM ----------

Now I would like to modify the script like this:
The log contains several identical patterns, for example error-jonas123.
The script has to find the first occurrence of the pattern, then the nearest timestamp before it. Then it has to find the last occurrence of the pattern and the nearest timestamp after it.

cat log_name

aaaaaaaaaaaaaa
bbbbbbbbbbbbb
cccccccccccc
[24/01/2013 09:10]
sssssssssssssss
error-jonas123
nnnnnnnnnnnnn
[24/01/2013 10:10]
uuuuuuuuuuuuuuu
jjjjjjjjjjjjjj
error-jonas123
mmmmmmmmmmmmm
[24/01/2013 10:30]
oooooooooooo
error-jonas123
qqqqqqqqqqq
[24/01/2013 10:45]
vvvvvvvvv
sssssssss
wwwwwwwwww

my_script log_name

The expected result

[24/01/2013 09:10]
sssssssssssssss
error-jonas123
nnnnnnnnnnnnn
[24/01/2013 10:10]
uuuuuuuuuuuuuuu
jjjjjjjjjjjjjj
error-jonas123
mmmmmmmmmmmmm
[24/01/2013 10:30]
oooooooooooo
error-jonas123
qqqqqqqqqqq
[24/01/2013 10:45]

Can you give me some ideas for changing the script?
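
One possible direction, as a rough sketch only: keep every line in an awk array, note the first and last line containing the pattern, then walk back and forward to the nearest timestamp lines. The helper name pattern_range is hypothetical, the pattern is matched as a fixed string, and the sketch assumes the whole log fits in memory.

```shell
#!/bin/bash
# pattern_range is a hypothetical helper name.
# Usage: pattern_range LOG PATTERN
pattern_range() (
    case $1 in
        *.gz) z_cat="gunzip -c" ;;
        *)    z_cat=cat ;;
    esac
    $z_cat "$1" | awk -v P="$2" '
    # Keep every line (assumes the log fits in memory).
    { line[NR] = $0 }
    # Remember which lines are timestamps.
    /^\[[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9][0-9][0-9] / { stamp[NR] = 1 }
    # Track the first and last line containing the fixed string P.
    index($0, P) { if (!first) first = NR; last = NR }
    END {
        if (!first) exit 1                               # pattern never seen: print nothing
        b = first; while (b > 1 && !(b in stamp)) b--    # nearest stamp before, else line 1
        e = last;  while (e < NR && !(e in stamp)) e++   # nearest stamp after, else last line
        for (i = b; i <= e; i++) print line[i]
    }'
)
```

On the sample log above, pattern_range log_name error-jonas123 should print the block from [24/01/2013 09:10] through [24/01/2013 10:45].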

DGPickett · March 26, 2013, 4:28pm

It gets tricky when there are adjacent errors, unless you report the time in the middle twice, once at the end of the first and once at the beginning of the second. You could write a pretty simple sed script to pull all the lines from timestamp N to timestamp N+1 into the buffer, check for an error and write to output or a side file if there is one, get rid of all but the last line, and loop back to filling the buffer up to the next timestamp.

It would be easy to make the buffer load into one line before writing it out, so they can be handled more simply after.
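
A rough sketch of that buffer-and-check loop, written in awk rather than sed to stay close to the script in this thread. The helper name error_blocks is hypothetical. Each block runs from one timestamp to the next and is printed only if it contains the pattern, so a timestamp shared by two adjacent error blocks appears twice.

```shell
#!/bin/bash
# error_blocks is a hypothetical helper name.
# Usage: error_blocks PATTERN < log_name
error_blocks() {
    awk -v P="$1" '
    /^\[[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9][0-9][0-9] / {
        if (hit) print buf "\n" $0   # block had the pattern: print it with its closing stamp
        buf = $0; hit = 0            # keep only the stamp and start the next block
        next
    }
    # Append ordinary lines to the current block.
    { buf = (buf == "" ? $0 : buf "\n" $0) }
    # Fixed-string test for the pattern.
    index($0, P) { hit = 1 }
    # The last block has no closing stamp.
    END { if (hit) print buf }
    '
}
```

To get each block on a single line, as suggested above, the "\n" used when appending to buf could be replaced by another separator.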

amazigh42 · April 2, 2013, 12:02pm

Hello,
I thought I understood the part highlighted in magenta, but I do not.

I worked out this line with the echo command:

$z_cat $log_name |awk -F"[/ \\\][]" -v S="24/01/2013 10:10" -v E="24/01/2013 10:30"

echo "[24/01/2013 10:10 10:51]" | awk -F"[/ \\\][]" '{ print FS ; print $2; print $3; print $4; print $5; }'
[/ \][]
24
01
2013
10:10

#!/bin/bash
log_name=$1
if [[ "$log_name" =~ \.gz$ ]]
     then z_cat="gunzip -c"
     else z_cat=cat
fi
$z_cat $log_name |awk -F"[/ \\\][]" -v S="24/01/2013 10:10" -v E="24/01/2013 10:30" '
function dcmp(b) {
  if($4>b[3])return  1;
  if($4<b[3])return -1;
  if($3>b[2])return  1;
  if($3<b[2])return -1;
  if($2>b[1])return  1;
  if($2<b[1])return -1;
  if($5>b[4])return  1;
  if($5<b[4])return -1;
  return 0;
}
BEGIN{split(S, ds, "[/ ]"); split(E, de, "[/ ]") }
/^[[][0-9][0-9]\/[0-1][0-9]\/[0-9][0-9][0-9][0-9] / {
   if(s&&dcmp(de)>=0) {print; exit}
   if(!s&&dcmp(ds)<=0) {f=x;w=1}
   if(!s&&dcmp(ds)>=0) {printf "%s",f; f=x; s=1 }
}
!w&&!s {f=f $0 "\n"}
s'

What do these elements correspond to?

b[1]
b[2]
b[3]
b[4]
b[5]
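
A small sketch that may help here: inside dcmp, b is whichever array was passed in, ds (built from S) or de (built from E), so its elements can be listed by running the same split() on its own. The helper name show_split is hypothetical.

```shell
#!/bin/bash
# show_split is a hypothetical helper that lists the pieces split() produces.
show_split() {
    awk -v S="$1" 'BEGIN {
        n = split(S, b, "[/ ]")     # same separator as in the script
        for (i = 1; i <= n; i++) printf "b[%d]=%s\n", i, b[i]
    }'
}
show_split "24/01/2013 10:10"
# prints:
# b[1]=24
# b[2]=01
# b[3]=2013
# b[4]=10:10
```

split() returns 4 for this stamp, so b[5] is never set.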

Thanks in advance.

amazigh42 · April 4, 2013, 3:56am

Hello,

I would like to have confirmation.

$z_cat $log_name |awk -F"[/ \\\][]" -v S="24/01/2013 10:10" -v E="24/01/2013 10:30" '
function dcmp(b) {
  if($4>b[3])return  1;

In $log_name I have lines that look like this:

[24/01/2013 09:11:59,236] ERROR [pool-1-thread-3][org.objectweb.jonas.jca.process]

1- Does b[3] (in red) correspond to the 2013 marked in red?
2- And does $4 (in magenta) correspond to the 2013 marked in magenta?
3- Otherwise, how can I debug b[3] with printf?
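
One way to check this, as a sketch: print each field of the log line next to the array element that dcmp compares it with. The helper name debug_dcmp is hypothetical, and it only covers the start stamp S.

```shell
#!/bin/bash
# debug_dcmp is a hypothetical helper: it reads log lines on stdin and prints,
# for each one, the field/array pairs that dcmp would compare against S.
debug_dcmp() {
    awk -F'[][/ ]' -v S="$1" '
    BEGIN { split(S, b, "[/ ]") }
    {
        printf "day:   $2=%s b[1]=%s\n", $2, b[1]
        printf "month: $3=%s b[2]=%s\n", $3, b[2]
        printf "year:  $4=%s b[3]=%s\n", $4, b[3]
        printf "time:  $5=%s b[4]=%s\n", $5, b[4]
    }'
}
echo '[24/01/2013 09:11:59,236] ERROR [pool-1-thread-3][org.objectweb.jonas.jca.process]' |
    debug_dcmp "24/01/2013 10:10"
# prints:
# day:   $2=24 b[1]=24
# month: $3=01 b[2]=01
# year:  $4=2013 b[3]=2013
# time:  $5=09:11:59,236 b[4]=10:10
```

So $4 is the year taken from the log line and b[3] is the year taken from S or E. In the real script the same printf lines could go inside dcmp; on systems that provide /dev/stderr, appending > "/dev/stderr" to each printf keeps the debug text out of the normal output.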

Any help would be greatly appreciated, because I am stuck.