Thanks to Don for pointing me in the right direction with debug=1.
When I run the script, I get this output:
1432820036 not between 1433169000 and 1433224800
1432820037 not between 1433169000 and 1433224800
1432820037 not between 1433169000 and 1433224800
1432820037 not between 1433169000 and 1433224800
1432820037 not between 1433169000 and 1433224800
1432820037 not between 1433169000 and 1433224800
1432820037 not between 1433169000 and 1433224800
1432820060 not between 1433169000 and 1433224800
1432820061 not between 1433169000 and 1433224800
1432820061 not between 1433169000 and 1433224800
1432820061 not between 1433169000 and 1433224800
1432820061 not between 1433169000 and 1433224800
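Decoding these epoch values with GNU date shows what the debug line is comparing. The tm values decode to May 28, 2015, while the window F..T is June 1 10:30 to June 2 02:00 EDT (14:30 to 06:00 UTC). This is a minimal sketch that assumes GNU date is available. epoch_to_utc is a hypothetical helper name, and -u is used only so the result does not depend on the local timezone.

```shell
#!/bin/bash
# Hypothetical helper: print an epoch value as a UTC date (GNU date assumed).
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

# First value is a tm from the debug output; the other two are F and T.
for ts in 1432820036 1433169000 1433224800; do
    printf '%s -> %s\n' "$ts" "$(epoch_to_utc "$ts")"
done
```

This prints 2015-05-28 13:33:56 UTC, 2015-06-01 14:30:00 UTC and 2015-06-02 06:00:00 UTC, so these entries fall days before the window.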
I used your sample input:
./gawk.sh "1 June 2015" 10:30 2:00
I also ran it without the "e" in June, since my log uses Jun:
./gawk.sh "1 Jun 2015" 10:30 2:00
but I got the same results.
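Identical results are what one would expect if the date argument itself is not the problem: GNU date parses "June" and "Jun" the same way, so both runs get the same FROM value. A quick check, assuming GNU date; -u is used here only to make the demo timezone-independent (the script itself uses local time).

```shell
#!/bin/bash
# GNU date accepts both the full and the abbreviated month name.
# -u makes the demo independent of the local timezone.
full=$(date -u -d "1 June 2015 10:30" +%s)
abbr=$(date -u -d "1 Jun 2015 10:30" +%s)
echo "June -> $full"
echo "Jun  -> $abbr"
[ "$full" = "$abbr" ] && echo "same FROM value"
```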
I ran
./gawk.sh "02 Jul 2015" 01:50 02:30
With debug=0 I get this:
./gawk.sh "02 Jul 2015" 01:30 3:20
Examining from Thu Jul 2 01:30:00 EDT 2015 (1435815000)
to Thu Jul 2 03:20:00 EDT 2015 (1435821600)
Processing /data/log/access_bhp.log file
Processing /data/log/access_hpc.log file
Processing /data/log/access_tfl.log file
Processing /data/log/access_thp.log file
With debug=1 I get:
1432822160 not between 1435816200 and 1435818600
1432822160 not between 1435816200 and 1435818600
1432822160 not between 1435816200 and 1435818600
1432822160 not between 1435816200 and 1435818600
1432822160 not between 1435816200 and 1435818600
1432822160 not between 1435816200 and 1435818600
1432822160 not between 1435816200 and 1435818600
1432822161 not between 1435816200 and 1435818600
1432822161 not between 1435816200 and 1435818600
1432822165 not between 1435816200 and 1435818600
1432822175 not between 1435816200 and 1435818600
1432822178 not between 1435816200 and 1435818600
By the way, field 5 is -0400]
I'll reply again with the output from the latest suggested code change.
---------- Post updated at 05:43 PM ---------- Previous update was at 05:28 PM ----------
You're correct, it does produce a lot of output. I'm including the actual IP for this one entry because it is bingbot:
$1=157.55.39.187
$2=-
$3=-
$4=[28/Jun/2015:16:27:29
$5=-0400]
$6="GET
$7=/content/10-customer-testimonials
$8=HTTP/1.1"
$9=200
$10=12025
$11="-"
$12="Mozilla/5.0
$13=(compatible;
$14=bingbot/2.0;
$15=+http://www.bing.com/bingbot.htm)"
1432844849 not between 1435816200 and 1435818600
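To cross-check the script's tm for this entry, its own timestamp fields ($4 and $5) can be handed to GNU date, which applies the explicit -0400 offset itself. This is a sketch that assumes GNU date and the field layout shown above; log_to_epoch is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: convert access-log fields $4 and $5,
# e.g. "[28/Jun/2015:16:27:29" and "-0400]", to an epoch with GNU date.
log_to_epoch() {
    local stamp=${1#\[} zone=${2%\]}   # strip the surrounding brackets
    local day=${stamp%%/*}             # 28
    local rest=${stamp#*/}             # Jun/2015:16:27:29
    local mon=${rest%%/*}              # Jun
    rest=${rest#*/}                    # 2015:16:27:29
    local year=${rest%%:*}             # 2015
    local clock=${rest#*:}             # 16:27:29
    date -d "$day $mon $year $clock $zone" +%s
}

expected=$(log_to_epoch '[28/Jun/2015:16:27:29' '-0400]')
script_tm=1432844849   # tm printed by the debug line for the bingbot entry
delta=$(( expected - script_tm ))
echo "date: $expected  script: $script_tm"
echo "difference: $delta s = $(( delta / 86400 )) days"
```

For this entry GNU date gives 1435523249 (June 28, 16:27:29 EDT), which is 2678400 seconds, exactly 31 days, later than the script's value. In other words, the script's tm lands on May 28.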
If you want me to include something specific, let me know.
Here is the entire script, just to make sure I don't have something wrong:
#!/bin/bash
if (( $# < 3 || $# > 4 ))
then
printf 'Usage: %s from_date from_time [to_date] to_time\n' "$0" >&2
exit 2
fi
FDAY=$1
FTIME=$2
if (( $# == 3 ))
then
TDAY=$FDAY
TTIME=$3
else
# four arguments: from_date from_time to_date to_time
TDAY=$3
TTIME=$4
fi
FROM=$(date -d "$FDAY $FTIME" +%s)
(($? != 0 )) && exit 3
TO=$(date -d "$TDAY $TTIME" +%s)
(($? != 0 )) && exit 4
if (( $# == 3 && TO < FROM ))
then
# FROM time is later than TO time, so add a day
(( TO+=3600*24))
fi
if (( TO < FROM ))
then
echo "$0: FROM date must be before TO date" >&2
exit 5
fi
echo "Examining from $(date -d @$FROM) ($FROM)"
echo " to $(date -d @$TO) ($TO)"
echo
FILES=/data/log/access_*.log
gawk -v F=$FROM -v T=$TO -v debug=1 '
{for(i=1;i<=NF;i++) printf "$%d=%s\n", i, $i }
FNR==1 {
for(ip in C) printf "%7d %s\n", C[ip], ip
delete C
print "Processing " FILENAME " file"
}
$5 == "-0400]" {
split($4,v,"[[/: ]")
mnum=int(index("JanFebMarAprMayJunJulAugSepOctNovDec", v[3])/3)
tm=mktime(v[4] " " mnum " " v[2] " " v[5] " " v[6] " " v[7])
if (tm >= F && tm <= T) C[$1]++
else if(debug) print tm " not between " F " and " T
}
END {for(ip in C) printf "%7d %s\n", C[ip], ip} ' $FILES
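For reference, this is how the split() call in the $5 == "-0400]" rule breaks up field $4. v[1] is empty because the field starts with the separator "[", and v[3] holds the month abbreviation that is later converted to a number. split_stamp is a hypothetical wrapper; plain POSIX awk is assumed to behave like gawk here.

```shell
#!/bin/bash
# Hypothetical wrapper: show how split() breaks field $4 apart.
split_stamp() {
    printf '%s\n' "$1" | awk '{
        n = split($1, v, "[[/: ]")    # separators: [  /  :  space
        for (i = 1; i <= n; i++) printf "v[%d]=%s\n", i, v[i]
    }'
}

split_stamp '[28/Jun/2015:16:27:29'
```

The output is seven lines: v[1] is empty, then 28, Jun, 2015, 16, 27 and 29.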
I noticed this output includes June 28 entries, but the command I used was
./gawk.sh "02 Jul 2015" 01:50 02:30
I picked that time frame because that is when I run my sitemap program, so there are over 30 entries from the same IP. It seems to be ignoring the date I put in.
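One thing worth checking is the month lookup in the mnum line. index() returns 1, 4, 7, ..., 34 for Jan through Dec, so int(index(...)/3) gives 0 for Jan and 5 for Jun, while gawk's mktime() expects months 1 to 12. If that is the cause, every tm is one month early, which would fit the numbers above: the 28/Jun/2015 bingbot entry printed as 1432844849 (May 28), and a July 2 entry would be computed as June 2, so it could never fall inside a July 2 window. Writing it as (index(...)+2)/3 gives 1 to 12. The sketch below only compares the two expressions; month_numbers is a hypothetical helper.

```shell
#!/bin/bash
# Compare two ways of turning a month abbreviation into a number.
# index() returns the 1-based position of the name: Jan=1, Feb=4, ..., Dec=34.
month_numbers() {
    awk -v m="$1" 'BEGIN {
        pos = index("JanFebMarAprMayJunJulAugSepOctNovDec", m)
        # int(pos/3) gives 0..11; (pos+2)/3 gives 1..12, the range mktime() expects
        printf "%s: int(pos/3)=%d  (pos+2)/3=%d\n", m, int(pos/3), (pos+2)/3
    }'
}

for m in Jan May Jun Jul Dec; do
    month_numbers "$m"
done
```

For Jun this prints int(pos/3)=5 and (pos+2)/3=6; for Dec, 11 and 12.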