Grep patterns and group counts

Hi,

I have a continuous log file which has the following format:-

02/Sep/2015: IP 11.151.108.166 error occurred etc
03/Sep/2015: IP 11.151.108.188 error occurred etc
03/Sep/2015: IP 11.152.178.250 error occurred etc
03/Sep/2015: IP 11.188.108.176 error occurred etc
03/Sep/2015: IP 11.14.108.146 error occurred etc
03/Sep/2015: IP 11.188.178.1 error occurred etc
03/Sep/2015: IP 11.151.188.142 error occurred etc
03/Sep/2015: IP 11.151.21.188 error occurred etc

I want to create a daily report of total counts of IPs, grouped by the first two octets of the IP, e.g. 11.151. So, for the above it would show:-

IP total counts for 03/Sep/2015:-

11.151 = 3 (entries)
11.152 = 1
11.188 = 2
11.14 = 1
etc

I think I'm okay doing the count by date, but I'm not sure how to grep the pattern of the IP.
I do know which IP ranges are in the log, but I would rather not hard-code each individual pattern (e.g. grep 11.151*, grep 11.152*, etc.) in case new ones appear.

Thanks

Hello finn,

The following may help you, if the order of the output doesn't matter to you.

1st:
awk '{split($3, A,".");B[A[1]"."A[2]]++} END{for(i in B){print i OFS B}}' OFS=" = "  Input_file
OR, for a specific date like the one you mentioned in your post:
2nd:
awk '{if($1 ~ /^03\/Sep\/2015:$/){split($3, A,".");B[A[1]"."A[2]]++}} END{for(i in B){print i OFS B[i]}}' OFS=" = "  Input_file
 

The output of the 1st command above (the one without the date) is as follows:

11.14 = 1
11.188 = 2
11.151 = 4
11.152 = 1

The output of the 2nd command above (with the date) is as follows:

11.14 = 1
11.188 = 2
11.151 = 3
11.152 = 1
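
If it helps readability, the same logic as the 2nd command can be spelled out as a multi-line script (a sketch only; Input_file and the hard-coded date are placeholders, exactly as in the one-liners above):

awk '
    # count only the lines for the requested date (the 1st field ends with ":")
    $1 == "03/Sep/2015:" {
        split($3, A, ".")            # A[1]=11, A[2]=151, ...
        B[A[1] "." A[2]]++           # count per first-two-octet prefix
    }
    # print one "prefix = count" line per prefix seen
    END {
        for (i in B)
            print i " = " B[i]
    }
' Input_file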
 

Thanks,
R. Singh

[akshay@localhost tmp]$ cat file
02/Sep/2015: IP 11.151.108.166 error occurred etc
03/Sep/2015: IP 11.151.108.188 error occurred etc
03/Sep/2015: IP 11.152.178.250 error occurred etc
03/Sep/2015: IP 11.188.108.176 error occurred etc
03/Sep/2015: IP 11.14.108.146 error occurred etc
03/Sep/2015: IP 11.188.178.1 error occurred etc
03/Sep/2015: IP 11.151.188.142 error occurred etc
03/Sep/2015: IP 11.151.21.188 error occurred etc
[akshay@localhost tmp]$ awk -F'[ .]' '$1==dt":"{A[$3"."$4]++}END{for(i in A)print i" = "A[i]}' dt="03/Sep/2015" file
11.14 = 1
11.188 = 2
11.151 = 3
11.152 = 1
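
Since the original question mentions grep: if the exact "prefix = count" layout is not required, the prefixes can also be pulled out with grep itself and counted with sort/uniq. This is only a sketch and assumes a grep that supports -o and -E (e.g. GNU or BSD grep); the output is "count prefix" rather than "prefix = count".

# keep the lines for one date, extract "IP x.y", then count the distinct prefixes
grep '^03/Sep/2015:' file | grep -oE 'IP [0-9]+\.[0-9]+' | sort | uniq -c

For the sample data above this prints lines such as "3 IP 11.151" (uniq -c left-pads the counts with spaces).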

Perhaps a bit of Perl?

#!/usr/bin/perl
# finn.report.pl

# these two pragmas help detect bugs
use strict;
use warnings;

my %dates; # to keep a record by dates

# read the file given at the command line
while(<>) {
    # extract date and ip, disregard rest 
    my ($day, undef, $nbits, undef) = split; 
    # extract ip bits wanted
    my ($netbits) = $nbits =~ /^(\d+\.\d+)/;
    # increase count of ip bits
    $dates{$day}{$netbits}++;
}

# report back to stdout
for my $d (sort keys %dates){
    print "IP total counts for $d\n";
    for my $ipp (sort keys %{$dates{$d}}){
        printf "%-8s = %s\n", $ipp, $dates{$d}{$ipp};
    }
    print "\n";
}

Run as

$ perl finn.report.pl finn.file 
IP total counts for 02/Sep/2015:
11.151   = 1

IP total counts for 03/Sep/2015:
11.14    = 1
11.151   = 3
11.152   = 1
11.188   = 2

Thanks all - some great solutions there that work. That AWK is a really powerful tool.