Could you please give me some advice on how I could set up a reporting procedure on the time and IP? For example, count the number of IPs per hour, or change the granularity to a day or whatever. What do you think?
How can I apply cron or crontab to my data? I think periods of 1 hour are the best to analyze, don't you agree?
I should count the number of IPs per hour. Here is the count I have already made:
#!/usr/bin/perl
use strict;
use warnings;

my %counts;
my $file = 'access.log';

# read the file once and count how many times each line occurs
open my $fh, '<', $file or die "Can't open $file for read: $!";
while (my $line = <$fh>) {
    chomp $line;          # drop the newline so the report prints cleanly
    $counts{$line}++;
}
close $fh or die "Cannot close $file: $!";

foreach my $key (keys %counts) {
    print "$key = $counts{$key}\n";
}
---------- Post updated at 04:50 AM ---------- Previous update was at 04:23 AM ----------
Cron and crontab schedule a job to be run according to the local time. But my case is different: based on the data I have - a log file - how can I collect the date/time and split them into chunks of 1 hour? Of course each time is associated with an IP address. Any ideas?
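One way to sketch this in Perl: pull the IP and the timestamp out of each log line and use the date-down-to-the-hour as a bucket key. This assumes an Apache-style common log format (`IP - - [DD/Mon/YYYY:HH:MM:SS zone] ...`); if your log lines look different, the regex would need adjusting.

```perl
#!/usr/bin/perl
# Sketch: bucket IPs per hour from an Apache-style access log.
# The log format here is an assumption - adjust the regex to match yours.
use strict;
use warnings;

my %per_hour;   # "DD/Mon/YYYY:HH" => { ip => count }

while (my $line = <>) {
    # capture the IP at the start and the date up to the hour
    if ($line =~ m{^(\S+) .*?\[(\d{2}/\w{3}/\d{4}:\d{2})}) {
        my ($ip, $hour) = ($1, $2);
        $per_hour{$hour}{$ip}++;
    }
}

for my $hour (sort keys %per_hour) {
    print "=== $hour ===\n";
    for my $ip (sort keys %{ $per_hour{$hour} }) {
        print "$ip = $per_hour{$hour}{$ip}\n";
    }
}
```

Run it as `perl bucket.pl access.log`; each hour prints as its own block with a per-IP count underneath.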
Yes, that is exactly what I want: for a specific hour, a chunk of information - the time, date and its mapping to the IP address. Should the data be split into separate files named, for example, after (month, day, hour)? I am not sure of the best solution.
---------- Post updated at 06:58 AM ---------- Previous update was at 06:17 AM ----------
Maybe an array of hashes could be the best solution?
Thanks for the feedback, this is a brilliant way of treating the data. How could I separate this log into smaller chunks, say separate files based on hours or days, and, for example, raise an alert if the number of counts reaches a certain limit, e.g. >= 100 -> "red alert"?
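The alert part is straightforward once you have a per-hour total: compare each bucket against a limit after counting. A minimal sketch, assuming the same bracketed timestamp format as before and using the thread's example threshold of 100 (the `$THRESHOLD` name is mine):

```perl
#!/usr/bin/perl
# Sketch: flag any hour whose request count reaches a limit.
# The threshold of 100 comes from the thread; the timestamp format
# is assumed to be [DD/Mon/YYYY:HH:MM:SS ...].
use strict;
use warnings;

my $THRESHOLD = 100;
my %per_hour_total;           # "DD/Mon/YYYY:HH" => total requests

while (my $line = <>) {
    if ($line =~ m{\[(\d{2}/\w{3}/\d{4}:\d{2})}) {
        $per_hour_total{$1}++;
    }
}

for my $hour (sort keys %per_hour_total) {
    my $n = $per_hour_total{$hour};
    print "$hour = $n";
    print "  <-- red alert" if $n >= $THRESHOLD;
    print "\n";
}
```

The same comparison could just as well send mail or write to a separate alert file instead of printing.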
---------- Post updated at 05:18 AM ---------- Previous update was at 04:58 AM ----------
I was thinking of having the final output of the file as: all the times with the corresponding IPs of that chunk, for example between 1 and 2 am, and at the end this counting report.
You could just do a grep on a specific date and write it to a file, which you then check with the little awk line. I'm not sure whether this should be a recurring automated mechanism - a report for some people that demand it - or just for you occasionally.
It can also be included in the awk so grep would not be needed. When I have time again I'll check it out, if nobody gets there first.
An automated solution would be ideal; the files are too big and will be stored in an archive. Of course, in case of searching a specific date range, the awk would suit just fine.
---------- Post updated at 05:29 AM ---------- Previous update was at 05:28 AM ----------
The range tells awk to take everything from the first match of the 1st pattern up to the first match of the 2nd pattern. So if there are more entries for 16-OCT-2009, you'd better take 17-OCT, if there are any entries for it. Or maybe just take a complete month to be sure, without a range. Instead of a range there could be a pattern matching a day, month etc.
Play around with awk pattern matching.
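For what it's worth, Perl has the same idea as awk's `/start/,/end/` range: the flip-flop operator `..`. A small sketch, using the thread's example dates (assuming the date string appears literally in each line):

```perl
#!/usr/bin/perl
# Sketch: the awk range /start/,/end/ written with Perl's flip-flop
# operator. The dates are just the examples from the thread.
use strict;
use warnings;

while (my $line = <>) {
    # print everything from the first 16-OCT-2009 line
    # up to and including the first 17-OCT-2009 line
    print $line if $line =~ /16-OCT-2009/ .. $line =~ /17-OCT-2009/;
}
```

And as suggested above, to take a whole day without a range, a single pattern does it: `print $line if $line =~ /16-OCT-2009/;`.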
The counting is done, but part of the IP vanished!
---------- Post updated at 09:30 AM ---------- Previous update was at 08:51 AM ----------
OK, sorry, I have it working now - my file had more than one space between each field...
---------- Post updated at 10:13 AM ---------- Previous update was at 09:30 AM ----------
Hi again,
Is it possible to put this in a Perl script or a Unix shell script? This is powerful, but it only works if you change the ranges each time. Automating the process would indeed optimize the work! The idea: a file that could be run from the console and, according to the chosen granularity, split the file and data into several 1-hour intervals and sub-files, or into sub-files per day, or, if bigger, into weeks. I look forward to hearing your advice...
So far, you can just take this line and put it in a file called dosomething.sh for example, make it executable with chmod +x dosomething.sh and fire it off with ./dosomething.sh.
Sure, this can be done in Perl too. But that's a job for someone else.
I have this code that gives me a list and a count of duplicates in the range specified.
Is there a way of creating a generic pattern for hours/days? Can this range work for a huge file spanning lots of days?
---------- Post updated at 07:14 AM ---------- Previous update was at 06:43 AM ----------
So when it receives one bigger file covering a week's time frame, it would generate several files named after the day, e.g. 10-OCT-2009.file, 11-OCT-2009.file and so on...
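That per-day split can be done generically without fixing any range in advance: pull the date out of each line and append the line to a file named after that date. A sketch, assuming the date appears in each line in DD-MON-YYYY form as in the thread's examples:

```perl
#!/usr/bin/perl
# Sketch: split one big log into per-day files named like 10-OCT-2009.file.
# Assumes each line carries a date in DD-MON-YYYY form; lines without
# a recognizable date are skipped.
use strict;
use warnings;

my %fh;    # day => open filehandle, so each output file is opened once

while (my $line = <>) {
    next unless $line =~ /(\d{2}-[A-Z]{3}-\d{4})/;
    my $day = $1;
    unless ($fh{$day}) {
        open $fh{$day}, '>>', "$day.file"
            or die "Can't open $day.file: $!";
    }
    print { $fh{$day} } $line;
}

close $_ for values %fh;
```

Because the file name comes from the data itself, the same script handles a day, a week or a month of input - it simply produces one output file per day it finds.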