Grepping a log file

Dear All,

I have a log file that is displayed as:

<msg time='2009-10-14T05:46:42.580+00:00' org_id='oracle' comp_id='tnslsnr'
 type='UNKNOWN' level='16' host_id='mtdb_a'
 host_addr='UNKNOWN' version='1'>
 <txt>14-OCT-2009 05:46:42 * (CONNECT_DATA=(SID=fgs)(CID=(PROGRAM=sqlplus@mtdb)(HOST=mtdb_a)(USER=root))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.60.4.2)(PORT=34898)) * establish * fgs * 0
 </txt>
</msg>

How can I grab, from the <txt> element, the following:

1- Date
2- Time
3- Host
4- All together separated by a space

regards all

Not sure if these are the Time/Date/Host values you mentioned:

sed -ne "s/.*time=.\([^']*\)'.*/\1/p" -e "s/.*HOST=\([^)]*\)).*/\1/p" infile| awk '$1=$1' RS=
2009-10-14T05:46:42.580+00:00 10.60.4.2

Hi,

Thanks for the feedback!

Hmmm, it's the time below (the one inside <txt>) that I need, and how can I apply this command to a file in a Unix shell?

Regards

OK, here it is with the other date/time:

sed -ne "s/.*txt.\([^ ]* *[^ ]* \)\*.*HOST=\([^)]*\)).*/\1\2/p" infile
14-OCT-2009 05:46:42 10.60.4.2

You can apply this on a file for sure - the one I used up there I called "infile".
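
For example, if your listener log was called listener.log (just a guess at the name; use whatever your file is actually called), you could run it like this and redirect the result into a new file:

sed -ne "s/.*txt.\([^ ]* *[^ ]* \)\*.*HOST=\([^)]*\)).*/\1\2/p" listener.log > extracted.txt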

Wow, this is super...Thanks!

What if I would like to capture just the IP? This syntax is complex!

---------- Post updated at 01:07 AM ---------- Previous update was at 01:06 AM ----------

awk -F '[ >=)]' '/<txt>/{print $3,$4,$24}'
14-OCT-2009 05:46:42 10.60.4.2

Just IP

awk -F '[=)]' '/<txt>/{print $17}'
10.60.4.2

Could you please give me some advice on how I could set up a reporting procedure on the time and IP... for example, count the number of IPs per hour, or change the granularity to a day or whatever. What do you think?

man cron and crontab
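
Once the extraction is wrapped in a small script, a crontab entry along these lines would run it at the top of every hour (the paths here are only placeholders, adjust them to your setup):

0 * * * * /path/to/extract_ips.sh >> /path/to/hourly_report.txt 2>&1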

Np; if you only want the IP:

sed -ne "s/.*HOST=\([^)]*\)).*/\1/p" infile
10.60.4.2

How can I apply cron or crontab to my data? I think periods of 1 hour are the best to analyze, don't you agree?

I should count the number of IPs per hour... the counting itself I have already made:

#!/usr/bin/perl -w
use strict;

my %counts = ();
my $file   = "access.log";

open (FH, "< $file") or die "Can't open $file for read: $!";

# count how often each line occurs
while (<FH>) {
    $counts{$_}++;
}

close FH or die "Cannot close $file: $!";

# print every distinct line together with its count
foreach my $key (keys %counts) {
    print "$key = $counts{$key}\n";
}



---------- Post updated at 04:50 AM ---------- Previous update was at 04:23 AM ----------

Cron and crontab schedule a job to be run according to the local time. But my case is different: based on the data I have - the log file - how can I collect the date/time and split them into chunks of 1 hour? Of course each time is associated with an IP address. Any ideas?

So you want to count how often each IP shows up per hour?

Yes, that is exactly what I want: for each specific hour a chunk of information - the time, the date and its mapping to the IP address. Should the data be split into separate files named, for example, after (month, day, hour)? I am not sure of the best solution.

---------- Post updated at 06:58 AM ---------- Previous update was at 06:17 AM ----------

Maybe an array of hashes could be the best solution?

Here is a point to start at with awk, without splitting it up by hours, just taking the whole input file:

$> cat infile
16-OCT-2009 09:11:47 10.65.4.24
16-OCT-2009 09:11:47 10.3.4.11
16-OCT-2009 10:11:47 10.3.4.11
16-OCT-2009 10:11:47 10.65.4.24
16-OCT-2009 10:11:47 10.3.4.11
16-OCT-2009 11:11:47 10.65.4.24
16-OCT-2009 11:11:47 10.65.4.24
16-OCT-2009 11:11:47 10.3.4.11
$> awk 'NR==FNR{a[$3]+=1; next} END{for(x in a){print x,"->",a[x]}}' infile
10.65.4.24 -> 4
10.3.4.11 -> 4
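
If you also want the hour in the key, a minimal variation of the above (assuming the same three columns: date, time, IP) could be:

$> awk '{split($2,t,":"); a[$1" "t[1]":00 "$3]+=1} END{for(x in a){print x,"->",a[x]}}' infile

That prints one count per date/hour/IP combination; the order of the output lines is not guaranteed.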

Thanks for the feedback, this is a brilliant way of treating the data. How could I separate this log into smaller chunks, let's say separate files based on hours or days, and e.g. raise an alert if the count reaches a certain limit, throwing a message, e.g. >= 100 -> "red alert"?

---------- Post updated at 05:18 AM ---------- Previous update was at 04:58 AM ----------

I was thinking of having the final output of the file as:

All the times -> the corresponding IP of that chunk, for example (between 1 and 2 am), and at the end this counting report.

You could just do a grep on a specific date and write it to a file, which you then check with the little awk line. Not sure if this should be a recurring automated mechanism as a report for some people that demand it, or if it is just for you occasionally.
It can also be included into the awk so grep would not be needed. When I have time again I'll check it out, if nobody else comes first :smiley: :wink:
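
As a sketch of both variants (the date 16-OCT-2009 and the limit of 100 are only example values):

$> grep "16-OCT-2009" infile | awk '{a[$3]+=1} END{for(x in a){print x,"->",a[x]}}'
$> awk '/16-OCT-2009/{a[$3]+=1} END{for(x in a){if(a[x]>=100){print "red alert:",x,"->",a[x]}else{print x,"->",a[x]}}}' infile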

Thanks for the feedback,

An automated solution would be ideal; the files are too big and will be stored in an archive. Of course, in case of searching a specific date range, the awk would suit just fine.

---------- Post updated at 05:29 AM ---------- Previous update was at 05:28 AM ----------

awk '$0>=from&&$0<=to' from="2007/03/20 15:13" to="2007/08/19 14:31" log.dates

---------- Post updated at 05:33 AM ---------- Previous update was at 05:29 AM ----------

As for occasional reports, I think that Excel pivot tables give an excellent result for visualization.

Defining a pattern range - this has nothing to do with arithmetic or the like, it's just a pattern/string/text matching range (keep that in mind!):

$> cat infile
16-SEP-2009 09:11:47 10.65.4.24
16-SEP-2009 09:11:47 10.3.4.11
30-SEP-2009 10:11:47 10.3.4.11
1-OCT-2009 10:11:47 10.65.4.24
6-OCT-2009 10:11:47 10.3.4.11
6-OCT-2009 12:31:01 10.3.4.11
16-OCT-2009 11:11:47 10.65.4.24
17-OCT-2009 11:11:47 10.65.4.24
18-OCT-2009 11:11:47 10.3.4.11
$> awk 'NR==FNR && $1 ~ /30-SEP-2009/,/16-OCT-2009/ {a[$3]+=1; next} END{for(x in a){print x,"->",a[x]}}' infile
10.65.4.24 -> 2
10.3.4.11 -> 3

The range tells awk to take everything from the 1st pattern up to the 1st match of the 2nd pattern. So if there are more entries for 16-OCT-2009, you'd better take 17-OCT, if there are any entries for it. Or maybe just take a complete month to be sure, without a range. Instead of a range there could be a pattern matching a day, month etc.
Play around with awk pattern matching :slight_smile:
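
For example, matching a complete month instead of a range (just a sketch on the same input file) could look like:

$> awk '$1 ~ /OCT-2009/ {a[$3]+=1} END{for(x in a){print x,"->",a[x]}}' infile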

Thanks! I changed the pattern to a time range and got a strange output:

awk 'NR==FNR && $2 ~ /15:44:13/,/15:44:14/ {a[$3]+=1; next} END{for(x in a){print x,"->",a[x]}}' access1.log


 -> 2.4.24
 -> 1.4.17
 -> 1.4.8

The counting is done, but part of the IP vanished! :slight_smile:

---------- Post updated at 09:30 AM ---------- Previous update was at 08:51 AM ----------

OK, sorry, I have it working; my file had more than one space between the fields... :slight_smile:

---------- Post updated at 10:13 AM ---------- Previous update was at 09:30 AM ----------

Hi again,

Is it possible to put this in a Perl script or a Unix shell script? This is powerful, but it only works by changing the ranges each time. Automation of the process would indeed optimize the work! Having a file that could be run from the console and, according to its granularity, split the file and data into several intervals and sub-files of 1 hour, or split the file and data into sub-files of days, or if bigger, into weeks. I look forward to hearing your advice...

So far, you can just take this line and put it in a file called dosomething.sh for example, change its permissions with chmod +x dosomething.sh and fire it off with ./dosomething.sh.
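
For example, dosomething.sh could look something like this (listener.log and the date patterns are just placeholders to adapt):

#!/bin/sh
# pull date, time and IP out of the raw listener log
sed -ne "s/.*txt.\([^ ]* *[^ ]* \)\*.*HOST=\([^)]*\)).*/\1\2/p" listener.log > right.space
# count the IPs in the chosen range
awk 'NR==FNR && $1 ~ /30-SEP-2009/,/16-OCT-2009/ {a[$3]+=1; next} END{for(x in a){print x,"->",a[x]}}' right.space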

Sure, this can be done in perl too. But that's a job for someone else :wink:

Dear All,

I have this code that gives me a list and a count of duplicates in the range specified.
Is there a way of creating a generic pattern for hours/days? Can this range be fixed for a huge file with lots of days?

awk 'NR==FNR && $2 ~ /06:00:00/,/15:44:14/ {print "---"$1"---"$2"---"$3; a[$3]+=1} END{for(x in a){print "# Occurrences --->", x, "--->", a[x]}}' right.space

---------- Post updated at 07:14 AM ---------- Previous update was at 06:43 AM ----------

So when it "receives" one bigger file covering a week's time frame, it would generate several files, named based on the day... e.g. 10-OCT-2009.file, 11-OCT-2009.file and so on....
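
Something like this awk line would already do that day-wise splitting (a sketch, assuming the extracted date/time/IP lines sit in right.space and the number of distinct days is modest):

awk '{print > ($1".file")}' right.space

Every line then ends up in a file named after its first field, e.g. 10-OCT-2009.file.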