if (( $# < 3 || $# > 4 ))
then
printf 'Usage: %s from_date from_time [to_date] to_time\n' "$0" >&2
exit 2
fi
This just prints the usage string and terminates with exit code 2 if the number of passed arguments is not 3 or 4.
FDAY=$1
FTIME=$2
if (( $# == 3 ))
then
TDAY=$FDAY
TTIME=$3
else
TDAY=$3
TTIME=$4
fi
This sets FDAY, FTIME and TDAY, TTIME from the passed arguments. When only 3 arguments are passed, TDAY defaults to FDAY.
FROM=$(date -d "$FDAY $FTIME" +%s)
(($? != 0 )) && exit 3
TO=$(date -d "$TDAY $TTIME" +%s)
(($? != 0 )) && exit 4
Calculate FROM and TO as seconds since the epoch (midnight UTC, 1 Jan 1970). Any error message from date is left visible, and the script exits with 3 for an invalid from date/time or 4 for an invalid to date/time.
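As a minimal sketch of that conversion and its failure path: the helper name to_epoch is hypothetical, and -u is added here only to make the result independent of the local time zone (the script above uses local time). It assumes a date that supports -d, such as GNU date.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a date/time string to epoch seconds.
# -u pins the conversion to UTC so the number is reproducible anywhere.
to_epoch() {
    date -u -d "$1" +%s
}

# A valid date/time converts cleanly.
if from=$(to_epoch "2015-07-02 01:55:59"); then
    echo "FROM=$from"
fi

# An invalid one makes date exit non-zero, so the assignment itself can be tested.
status=0
if ! to=$(to_epoch "2015-13-45 25:61" 2>/dev/null); then
    echo "invalid to date/time" >&2
    status=4   # the real script would 'exit 4' at this point
fi
```

Testing the assignment directly (FROM=$(date ...) || exit 3) is equivalent to checking $? on the following line, just a little shorter.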
if (( $# == 3 && TO < FROM ))
then
# FROM time is later than TO time, so add a day
(( TO += 3600 * 24 ))
fi
With 3 arguments, if the TO time is earlier than the FROM time (eg 9pm to 1am), move the TO date to the next day. 3600 is the number of seconds in 1 hour; multiplying by 24 gives 1 day's worth of seconds. Remember these dates are seconds past the epoch.
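The 9pm to 1am case can be checked with plain shell arithmetic. The two epoch values below are illustrative stand-ins (UTC) for what date would return; they are not taken from a real log.

```shell
#!/usr/bin/env bash
# Illustrative epoch values, both on 2015-07-02 UTC:
FROM=1435870800   # 21:00:00
TO=1435798800     # 01:00:00, which is earlier than FROM

if (( TO < FROM )); then
    (( TO += 3600 * 24 ))   # 86400 seconds = one day
fi

span_hours=$(( (TO - FROM) / 3600 ))
echo "TO=$TO span=${span_hours}h"
# prints: TO=1435885200 span=4h
```

Adding a fixed 86400 seconds assumes the day in question has exactly 24 hours; across a daylight-saving change the wall-clock time would be off by an hour.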
if (( TO < FROM ))
then
echo "$0: FROM date must be before TO date" >&2
exit 5
fi
Trap error where TO date is before FROM and exit with 5.
echo "Examining from $(date -d @$FROM) ($FROM)"
echo " to $(date -d @$TO) ($TO)"
echo
Display confirmation that the calculated dates match what was requested. This is quite useful, as date accepts strings like "today" or "yesterday" and it's good to be explicit about the range that will be checked.
gawk -v F=$FROM -v T=$TO -v debug=0 '
This uses GNU awk, which is needed because the time/date functions are not part of standard awk.
Pass shell $FROM in as variable F and $TO as variable T. Variable debug is set to 0 for false (non-zero is true).
debug{for(i=1;i<=NF;i++) printf "$%d=%s\n", i, $i }
When debug is on, this outputs each field awk has split from the input line.
FNR==1 {
for(ip in C) printf "%7d %s\n", C[ip], ip
delete C
print "Processing " FILENAME " file"
}
If processing the first record of a file, output the contents of the C[] array from the previous file, then clear it. The %7d format prints each count right-justified in a 7-character column.
Note: This is also done in the END block to get counts for last file processed.
$5 ~ "-0[45]00]" {
Only process rows where field number 5 is the timezone offset "-0400]" or "-0500]" (the closing bracket of the timestamp is part of the field). This skips records from other timezones and non-valid log lines (eg headers or other record types).
split($4,v,"[[/: ]")
Split field 4 into array v using left-square-bracket, slash, colon or space as separators, so:
[02/Jul/2015:01:55:59
gives
v[1]=
v[2]=02
v[3]=Jul
v[4]=2015
v[5]=01
v[6]=55
v[7]=59
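To check the split on your own data, a quick sketch like the following can help. The log line is made up for illustration; only the bracketed timestamp matters. Plain awk is enough here because split() with a regex separator is standard.

```shell
#!/usr/bin/env bash
# Made-up access-log line in the layout the script expects.
line='203.22.200.1 - - [02/Jul/2015:01:55:59 -0400] "GET / HTTP/1.1" 200 512'

parts=$(printf '%s\n' "$line" | awk '{
    n = split($4, v, "[[/: ]")          # same separator set as the script
    printf "%d:", n                     # number of pieces produced
    for (i = 1; i <= n; i++) printf " v[%d]=%s", i, v[i]
    printf "\n"
}')
echo "$parts"
# prints: 7: v[1]= v[2]=02 v[3]=Jul v[4]=2015 v[5]=01 v[6]=55 v[7]=59
```

v[1] is empty because the field starts with a separator (the left square bracket).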
mnum=index("xxJanFebMarAprMayJunJulAugSepOctNovDec", v[3])/3
Calculate the month number from the 3-character short month name. index() returns the position of v[3] within the string, so Jul gives 21. Dividing by 3 then gives the correct month number (eg Jul=7, Sep=9, Dec=12). The leading "xx" padding is what makes Jan land on position 3.
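The lookup is easy to try in isolation. This sketch runs the same expression over a few names, including a made-up one ("Foo") to show what an unknown month produces; index() is standard awk, so gawk is not required for it.

```shell
#!/usr/bin/env bash
months=$(awk 'BEGIN {
    lookup = "xxJanFebMarAprMayJunJulAugSepOctNovDec"
    n = split("Jan Jul Sep Dec Foo", names, " ")
    for (i = 1; i <= n; i++)
        printf "%s%s=%d", (i > 1 ? " " : ""), names[i], index(lookup, names[i]) / 3
    printf "\n"
}')
echo "$months"
# prints: Jan=1 Jul=7 Sep=9 Dec=12 Foo=0
```

An unknown name yields 0 because index() returns 0 when the substring is not found.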
tm=mktime(v[4] " " mnum " " v[2] " " v[5] " " v[6] " " v[7])
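mktime() requires a string in "YYYY MM DD HH MM SS" format. If the string cannot be parsed, for example because mnum is a fraction like 1.33333, mktime() returns -1, which is not between the F and T values, so nothing is counted. Be aware that the gawk manual describes out-of-range values as being normalized rather than rejected (its example is an hour of -1 meaning 1 hour before midnight), so an mnum of 0 or a date like 30 Feb 2015 may produce a shifted timestamp instead of -1.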
if (tm >= F && tm <= T) C[$1]++
else if(debug) print tm " not between " F " and " T
If tm is between FROM and TO (inclusive), increment the C[] array. This is the crux of the counting of IP addresses.
The C[] array uses the IP address (field $1) as the index and the count as the value, so it ends up like this:
C[192.168.0.20]=208
C[203.22.200.1]=15
C[215.215.215.215]=1051
To get the top 5 by count, you could use the GNU awk ordered arrays feature and print only the first 5 records. But as this is only done once after each file is processed, it is much easier and still fairly efficient to pipe through the external unix sort and head commands like this:
for(ip in C) printf "%7d %s\n", C[ip], ip | "sort -k1,1rn | head -5"
close("sort -k1,1rn | head -5")
Sort on the first field (-k1,1) in reverse (r) numeric (n) order, then head -5 keeps the top 5.
Note these 2 lines need to be in both the END and FNR==1 blocks, as a replacement for the existing for(... line.
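A small stand-alone sketch of the pipe-and-close pattern, using a hand-filled C[] array instead of a real log (the first three counts are the sample values shown earlier; the 10.0.0.x entries are made up). Holding the command in a variable, here called cmd, is a choice made for this sketch so that printf and close() are guaranteed to name the same pipe.

```shell
#!/usr/bin/env bash
top5=$(awk 'BEGIN {
    # Hand-filled counts standing in for a processed log file.
    C["192.168.0.20"] = 208; C["203.22.200.1"] = 15; C["215.215.215.215"] = 1051
    C["10.0.0.1"] = 3; C["10.0.0.2"] = 42; C["10.0.0.3"] = 7

    cmd = "sort -k1,1rn | head -5"
    for (ip in C) printf "%7d %s\n", C[ip], ip | cmd
    close(cmd)   # flushes the pipe so output appears before the next file is processed
}')
echo "$top5"
```

With six entries in the array, only the five largest counts are printed, highest first; the entry with a count of 3 is dropped.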