Need Time Stamp Range On Log Files

Thanks for getting back to me.

I put the test.log file in /data/log, since that is where I keep my gawk scripts. I made no other changes, and here is the output.

./modified_gawk.sh "02 Jul 2015" 01:55:58 01:56
Examining from Thu Jul  2 01:55:58 EDT 2015 (1435816558)
            to Thu Jul  2 01:56:00 EDT 2015 (1435816560)

Processing /data/log/test.log file

This makes no sense.
No matter how debug is set, that script would dump the contents of every line read showing how many fields are present and what each field contains. The output you've shown us indicates that /data/log/test.log contains one or more blank lines, but not the data you showed us in post #19 in this thread, which was:

1.1.1.1 - - [02/Jul/2015:01:55:57 -0400] "GET /content/421-ahmtrust HTTP/1.0" 200 58071 "-" "Sphider"
207.46.13.135 - - [02/Jul/2015:01:55:57 -0400] "GET /Liquid_Herbs_page_1_c_11.html HTTP/1.1" 302 25 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.13.135 - - [02/Jul/2015:01:55:58 -0400] "GET /index.php?controller=category&id_category=21 HTTP/1.1" 301 25 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
1.1.1.1 - - [02/Jul/2015:01:55:57 -0400] "HEAD /content/422-ahmunbelief HTTP/1.1" 200 - "-" "Sphider"
2.2.2.2 - - [02/Jul/2015:01:55:59 -0400] "GET /themes/warehouse/js/script.js HTTP/1.1" 200 1313 "https://www.google.com/" "Mozilla/5.0 (Linux; Android 5.0; SM-N900T Build/LRX21V) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.93 Mobile Safari/537.36"
2.2.2.2 - - [02/Jul/2015:01:55:59 -0400] "GET /themes/warehouse/cache/50ca4d40aa6b13dfe15d7583bbe75eea.js HTTP/1.1" 200 69947 "https://www.google.com/" "Mozilla/5.0 (Linux; Android 5.0; SM-N900T Build/LRX21V) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.93 Mobile Safari/537.36"
2.2.2.2 - - [02/Jul/2015:01:55:59 -0400] "GET /themes/warehouse/cache/9f19013204b5f3ce3d256dea73bb91e5_all.css HTTP/1.1" 200 42230 "https://www.google.com/" "Mozilla/5.0 (Linux; Android 5.0; SM-N900T Build/LRX21V) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.93 Mobile Safari/537.36"
2.2.2.2 - - [02/Jul/2015:01:55:59 -0400] "GET /content/152-Tea_Tree_Oil_Uses_sp_153 HTTP/1.1" 200 17579 "https://www.google.com/" "Mozilla/5.0 (Linux; Android 5.0; SM-N900T Build/LRX21V) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.93 Mobile Safari/537.36"
207.46.13.135 - - [02/Jul/2015:01:55:59 -0400] "GET /21-Liquid_Herbs_page_1_c_11 HTTP/1.1" 200 16273 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

Please be sure that /data/log/test.log contains the text shown above, verify that the gawk invocation in modified_gawk.sh is:

gawk -v F=$FROM -v T=$TO -v debug=1 '
{for(i=1;i<=NF;i++) printf "$%d=%s\n", i, $i }
FNR==1 {
    for(ip in C) printf "%7d %s\n", C[ip], ip
    delete C
    print "Processing " FILENAME " file"
}

$5 == "-0400]" {
  split($4,v,"[[/: ]")
  mnum=int(index("JanFebMarAprMayJunJulAugSepOctNovDec", v[3])/3)
  tm=mktime(v[4] " " mnum " " v[2] " " v[5] " " v[6] " " v[7]) + 0
  if(debug) print "mtkime(" v[4] " " mnum " " v[2] " " v[5] " " v[6] " " v[7] "): " tm
  if (tm >= F && tm <= T) C[$1]++
  else if(debug) print tm " not between " F " and " T
}
END {for(ip in C) printf "%7d %s\n", C[ip], ip} ' /data/log/test.log

and try running the script again.

If it still doesn't show lots of debugging output, show us the output from the command:

od -bc /data/log/test.log

Hi Don,

It is my fault. I didn't catch that this phrase

without being overwhelmed with debugging data

meant I needed to set debug=1.

Now, with debug=1 set, it shows

./modified_gawk.sh "02 Jul 2015" 01:55:57 01:55:58
Examining from Thu Jul  2 01:55:57 EDT 2015 (1435816557)
            to Thu Jul  2 01:55:58 EDT 2015 (1435816558)

Processing /data/log/test.log file
mtkime(2015 6 02 01 55 00): 1433224500
1433224500 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 00): 1433224500
1433224500 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 03): 1433224503
1433224503 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 03): 1433224503
1433224503 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 04): 1433224504
1433224504 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 06): 1433224506
1433224506 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 07): 1433224507
1433224507 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 09): 1433224509
1433224509 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 09): 1433224509
1433224509 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 12): 1433224512
1433224512 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 13): 1433224513
1433224513 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 15): 1433224515
1433224515 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 16): 1433224516
1433224516 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 17): 1433224517
1433224517 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 17): 1433224517
1433224517 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 18): 1433224518
1433224518 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 19): 1433224519
1433224519 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 21): 1433224521
1433224521 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 22): 1433224522
1433224522 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 24): 1433224524
1433224524 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 25): 1433224525
1433224525 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 27): 1433224527
1433224527 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 28): 1433224528
1433224528 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 30): 1433224530
1433224530 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 31): 1433224531
1433224531 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 33): 1433224533
1433224533 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 33): 1433224533
1433224533 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 36): 1433224536
1433224536 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 37): 1433224537
1433224537 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 39): 1433224539
1433224539 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 40): 1433224540
1433224540 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 42): 1433224542
1433224542 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 43): 1433224543
1433224543 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 45): 1433224545
1433224545 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 46): 1433224546
1433224546 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 48): 1433224548
1433224548 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 49): 1433224549
1433224549 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 51): 1433224551
1433224551 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 52): 1433224552
1433224552 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 54): 1433224554
1433224554 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 55): 1433224555
1433224555 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 57): 1433224557
1433224557 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 57): 1433224557
1433224557 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 58): 1433224558
1433224558 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 57): 1433224557
1433224557 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 59): 1433224559
1433224559 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 59): 1433224559
1433224559 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 59): 1433224559
1433224559 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 59): 1433224559
1433224559 not between 1435816557 and 1435816558
mtkime(2015 6 02 01 55 59): 1433224559
1433224559 not between 1435816557 and 1435816558
mtkime(2015 6 02 02 01 55): 1433224915
1433224915 not between 1435816557 and 1435816558

Thanks,

Hi sharingsunshine,
Please use the 9 lines of data you showed us in your post #19 in this thread; not 53 lines from some other file! And, please put back the line of code shown in red in post #22 that you removed. Since you aren't using data that we can compare to a known input, we have no idea what is going wrong. Either the dates in the file you supplied were in June, or the calculations performed by gawk are off by a month. I'm also guessing that since the timestamps calculated by your script (before calling gawk ) are showing timezone EDT, although the 5th field is currently -0400] , it will be -0500] when daylight savings time is not in effect (and we will need to adjust the time calculations in gawk to account for the offset from GMT).

I am trying to compare known timestamps (in the data in the 9 lines shown in post #19 and repeated in post #22) against the calculations being performed by gawk . When you use different data, and don't show us the date and time data that is being processed, I can't determine what needs to be fixed.

Sorry for my failure to understand what you needed.

./modified_gawk.sh "02 Jul 2015" 01:55:57 01:55:58
Examining from Thu Jul  2 01:55:57 EDT 2015 (1435816557)
            to Thu Jul  2 01:55:58 EDT 2015 (1435816558)

$1=1.1.1.1
$2=-
$3=-
$4=[02/Jul/2015:01:55:57
$5=-0400]
$6="GET
$7=/content/421-ahmtrust
$8=HTTP/1.0"
$9=200
$10=58071
$11="-"
$12="Sphider"
Processing /data/log/test1.log file
mtkime(2015 6 02 01 55 57): 1433224557
1433224557 not between 1435816557 and 1435816558
$1=207.46.13.135
$2=-
$3=-
$4=[02/Jul/2015:01:55:57
$5=-0400]
$6="GET
$7=/Liquid_Herbs_page_1_c_11.html
$8=HTTP/1.1"
$9=302
$10=25
$11="-"
$12="Mozilla/5.0
$13=(compatible;
$14=bingbot/2.0;
$15=+http://www.bing.com/bingbot.htm)"
mtkime(2015 6 02 01 55 57): 1433224557
1433224557 not between 1435816557 and 1435816558
$1=207.46.13.135
$2=-
$3=-
$4=[02/Jul/2015:01:55:58
$5=-0400]
$6="GET
$7=/index.php?controller=category&id_category=21
$8=HTTP/1.1"
$9=301
$10=25
$11="-"
$12="Mozilla/5.0
$13=(compatible;
$14=bingbot/2.0;
$15=+http://www.bing.com/bingbot.htm)"
mtkime(2015 6 02 01 55 58): 1433224558
1433224558 not between 1435816557 and 1435816558
$1=1.1.1.1
$2=-
$3=-
$4=[02/Jul/2015:01:55:57
$5=-0400]
$6="HEAD
$7=/content/422-ahmunbelief
$8=HTTP/1.1"
$9=200
$10=-
$11="-"
$12="Sphider"
mtkime(2015 6 02 01 55 57): 1433224557
1433224557 not between 1435816557 and 1435816558
$1=2.2.2.2
$2=-
$3=-
$4=[02/Jul/2015:01:55:59
$5=-0400]
$6="GET
$7=/themes/warehouse/js/script.js
$8=HTTP/1.1"
$9=200
$10=1313
$11="https://www.google.com/"
$12="Mozilla/5.0
$13=(Linux;
$14=Android
$15=5.0;
$16=SM-N900T
$17=Build/LRX21V)
$18=AppleWebKit/537.36
$19=(KHTML,
$20=like
$21=Gecko)
$22=Chrome/43.0.2357.93
$23=Mobile
$24=Safari/537.36"
mtkime(2015 6 02 01 55 59): 1433224559
1433224559 not between 1435816557 and 1435816558
$1=2.2.2.2
$2=-
$3=-
$4=[02/Jul/2015:01:55:59
$5=-0400]
$6="GET
$7=/themes/warehouse/cache/50ca4d40aa6b13dfe15d7583bbe75eea.js
$8=HTTP/1.1"
$9=200
$10=69947
$11="https://www.google.com/"
$12="Mozilla/5.0
$13=(Linux;
$14=Android
$15=5.0;
$16=SM-N900T
$17=Build/LRX21V)
$18=AppleWebKit/537.36
$19=(KHTML,
$20=like
$21=Gecko)
$22=Chrome/43.0.2357.93
$23=Mobile
$24=Safari/537.36"
mtkime(2015 6 02 01 55 59): 1433224559
1433224559 not between 1435816557 and 1435816558
$1=2.2.2.2
$2=-
$3=-
$4=[02/Jul/2015:01:55:59
$5=-0400]
$6="GET
$7=/themes/warehouse/cache/9f19013204b5f3ce3d256dea73bb91e5_all.css
$8=HTTP/1.1"
$9=200
$10=42230
$11="https://www.google.com/"
$12="Mozilla/5.0
$13=(Linux;
$14=Android
$15=5.0;
$16=SM-N900T
$17=Build/LRX21V)
$18=AppleWebKit/537.36
$19=(KHTML,
$20=like
$21=Gecko)
$22=Chrome/43.0.2357.93
$23=Mobile
$24=Safari/537.36"
mtkime(2015 6 02 01 55 59): 1433224559
1433224559 not between 1435816557 and 1435816558
$1=2.2.2.2
$2=-
$3=-
$4=[02/Jul/2015:01:55:59
$5=-0400]
$6="GET
$7=/content/152-Tea_Tree_Oil_Uses_sp_153
$8=HTTP/1.1"
$9=200
$10=17579
$11="https://www.google.com/"
$12="Mozilla/5.0
$13=(Linux;
$14=Android
$15=5.0;
$16=SM-N900T
$17=Build/LRX21V)
$18=AppleWebKit/537.36
$19=(KHTML,
$20=like
$21=Gecko)
$22=Chrome/43.0.2357.93
$23=Mobile
$24=Safari/537.36"
mtkime(2015 6 02 01 55 59): 1433224559
1433224559 not between 1435816557 and 1435816558
$1=207.46.13.135
$2=-
$3=-
$4=[02/Jul/2015:01:55:59
$5=-0400]
$6="GET
$7=/21-Liquid_Herbs_page_1_c_11
$8=HTTP/1.1"
$9=200
$10=16273
$11="-"
$12="Mozilla/5.0
$13=(compatible;
$14=bingbot/2.0;
$15=+http://www.bing.com/bingbot.htm)"
mtkime(2015 6 02 01 55 59): 1433224559
1433224559 not between 1435816557 and 1435816558

Hope this is correct. Once again, I appreciate your help and sorry I didn't get it correct the first time.

Ok. So we now know that the gawk script is seeing the date and time 02/Jul/2015:01:55:57 but is generating a seconds since the Epoch value that corresponds to the date and time Tue Jun 2 01:55:57 EDT 2015 .
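
One way to check a mismatch like this is to convert the epoch values back into calendar dates. The following is only a sketch: it assumes GNU date (which modified_gawk.sh already relies on), the function name epoch_to_utc is made up for illustration, and it prints UTC so the result does not depend on the machine's time zone.

```shell
# Hypothetical helper, not part of modified_gawk.sh.
# Convert seconds since the Epoch to a UTC date string (GNU date assumed).
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

epoch_to_utc 1433224557   # value gawk computed:  2015-06-02 05:55:57
epoch_to_utc 1435816557   # value the shell used: 2015-07-02 05:55:57
```

The two values are 2592000 seconds (exactly 30 days) apart, which is the length of June, so the time of day is right and only the month is off by one.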

Try changing the following line in your script:

  mnum=int(index("JanFebMarAprMayJunJulAugSepOctNovDec", v[3])/3)

to:

  mnum=index("xxJanFebMarAprMayJunJulAugSepOctNovDec", v[3])/3

and run it again.
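
To see why the padding matters, the two formulas can be tried on their own. This is a minimal sketch using plain awk; the function names month_num_old and month_num are hypothetical and exist only for this comparison.

```shell
# Hypothetical helpers for illustration; neither is part of modified_gawk.sh.

# Original formula: Jan is found at position 1, so int(1/3) gives 0.
month_num_old() {
    awk -v m="$1" 'BEGIN {
        print int(index("JanFebMarAprMayJunJulAugSepOctNovDec", m) / 3)
    }'
}

# Padded formula: "xx" moves Jan to position 3, so position/3 gives 1..12.
month_num() {
    awk -v m="$1" 'BEGIN {
        print index("xxJanFebMarAprMayJunJulAugSepOctNovDec", m) / 3
    }'
}

month_num_old Jul   # prints 6
month_num Jul       # prints 7
```

The division is exact only for the twelve correctly capitalised abbreviations. Any other string gives 0 or a fraction, so real code might want to validate v[3] first.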

With any luck, this should work for you. Then you need to search your log files for a few log entries that were created when daylight savings time was not in effect. Do you still have any log files that were created before daylight savings time went into effect this year? They should be easy to find with:

fgrep '-0500]' /data/log/*.log

If the fgrep found any entries like that, they won't be included in the counts using your current script. If fgrep didn't find anything, you need to determine if that is because you don't have any log entries that old, or if something else is changing the date format for those entries. If any lines were found, sanitize two or three of them and add them to the file /data/log/test.log and show them to us so we can devise a time range to select one or two of them.

My guess would be that you'll need to change the line:

$5 == "-0400]" {

to one of the two following lines:

$5 == "-0400]" || $5 == "-0500]" {
      or
$5 ~ "-0[45]00]" {

to reliably process all of your input for the US Eastern time zone, but we'll need a couple of sample lines to verify that it works correctly for both daylight savings time and standard time.
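
The condition on the fifth field can also be tried in isolation. This is only a sketch: the function name is hypothetical, and the anchored regular expression is a slightly stricter variant of the pattern above rather than the code from the script.

```shell
# Hypothetical check, not part of modified_gawk.sh: report whether a
# UTC-offset field looks like US Eastern time (EDT -0400 or EST -0500).
is_eastern_offset() {
    printf '%s\n' "$1" | awk '{
        # ^ and $ anchor the match; \] is a literal closing bracket.
        if ($1 ~ /^-0[45]00\]$/) print "yes"; else print "no"
    }'
}

is_eastern_offset '-0400]'   # prints yes
is_eastern_offset '-0500]'   # prints yes
is_eastern_offset '-0700]'   # prints no
```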

this is what I get with the first change

mnum=index("xxJanFebMarAprMayJunJulAugSepOctNovDec", v[3])/3
$1=1.1.1.1
$2=-
$3=-
$4=[02/Jul/2015:01:55:57
$5=-0400]
$6="GET
$7=/content/421-ahmtrust
$8=HTTP/1.0"
$9=200
$10=58071
$11="-"
$12="Sphider"
Processing /data/log/test1.log file
mtkime(2015 7 02 01 55 57): 1435816557
$1=207.46.13.135
$2=-
$3=-
$4=[02/Jul/2015:01:55:57
$5=-0400]
$6="GET
$7=/Liquid_Herbs_page_1_c_11.html
$8=HTTP/1.1"
$9=302
$10=25
$11="-"
$12="Mozilla/5.0
$13=(compatible;
$14=bingbot/2.0;
$15=+http://www.bing.com/bingbot.htm)"
mtkime(2015 7 02 01 55 57): 1435816557
$1=207.46.13.135
$2=-
$3=-
$4=[02/Jul/2015:01:55:58
$5=-0400]
$6="GET
$7=/index.php?controller=category&id_category=21
$8=HTTP/1.1"
$9=301
$10=25
$11="-"
$12="Mozilla/5.0
$13=(compatible;
$14=bingbot/2.0;
$15=+http://www.bing.com/bingbot.htm)"
mtkime(2015 7 02 01 55 58): 1435816558
$1=1.1.1.1
$2=-
$3=-
$4=[02/Jul/2015:01:55:57
$5=-0400]
$6="HEAD
$7=/content/422-ahmunbelief
$8=HTTP/1.1"
$9=200
$10=-
$11="-"
$12="Sphider"
mtkime(2015 7 02 01 55 57): 1435816557
$1=2.2.2.2
$2=-
$3=-
$4=[02/Jul/2015:01:55:59
$5=-0400]
$6="GET
$7=/themes/warehouse/js/script.js
$8=HTTP/1.1"
$9=200
$10=1313
$11="https://www.google.com/"
$12="Mozilla/5.0
$13=(Linux;
$14=Android
$15=5.0;
$16=SM-N900T
$17=Build/LRX21V)
$18=AppleWebKit/537.36
$19=(KHTML,
$20=like
$21=Gecko)
$22=Chrome/43.0.2357.93
$23=Mobile
$24=Safari/537.36"
mtkime(2015 7 02 01 55 59): 1435816559
1435816559 not between 1435816557 and 1435816558
$1=2.2.2.2
$2=-
$3=-
$4=[02/Jul/2015:01:55:59
$5=-0400]
$6="GET
$7=/themes/warehouse/cache/50ca4d40aa6b13dfe15d7583bbe75eea.js
$8=HTTP/1.1"
$9=200
$10=69947
$11="https://www.google.com/"
$12="Mozilla/5.0
$13=(Linux;
$14=Android
$15=5.0;
$16=SM-N900T
$17=Build/LRX21V)
$18=AppleWebKit/537.36
$19=(KHTML,
$20=like
$21=Gecko)
$22=Chrome/43.0.2357.93
$23=Mobile
$24=Safari/537.36"
mtkime(2015 7 02 01 55 59): 1435816559
1435816559 not between 1435816557 and 1435816558
$1=2.2.2.2
$2=-
$3=-
$4=[02/Jul/2015:01:55:59
$5=-0400]
$6="GET
$7=/themes/warehouse/cache/9f19013204b5f3ce3d256dea73bb91e5_all.css
$8=HTTP/1.1"
$9=200
$10=42230
$11="https://www.google.com/"
$12="Mozilla/5.0
$13=(Linux;
$14=Android
$15=5.0;
$16=SM-N900T
$17=Build/LRX21V)
$18=AppleWebKit/537.36
$19=(KHTML,
$20=like
$21=Gecko)
$22=Chrome/43.0.2357.93
$23=Mobile
$24=Safari/537.36"
mtkime(2015 7 02 01 55 59): 1435816559
1435816559 not between 1435816557 and 1435816558
$1=2.2.2.2
$2=-
$3=-
$4=[02/Jul/2015:01:55:59
$5=-0400]
$6="GET
$7=/content/152-Tea_Tree_Oil_Uses_sp_153
$8=HTTP/1.1"
$9=200
$10=17579
$11="https://www.google.com/"
$12="Mozilla/5.0
$13=(Linux;
$14=Android
$15=5.0;
$16=SM-N900T
$17=Build/LRX21V)
$18=AppleWebKit/537.36
$19=(KHTML,
$20=like
$21=Gecko)
$22=Chrome/43.0.2357.93
$23=Mobile
$24=Safari/537.36"
mtkime(2015 7 02 01 55 59): 1435816559
1435816559 not between 1435816557 and 1435816558
$1=207.46.13.135
$2=-
$3=-
$4=[02/Jul/2015:01:55:59
$5=-0400]
$6="GET
$7=/21-Liquid_Herbs_page_1_c_11
$8=HTTP/1.1"
$9=200
$10=16273
$11="-"
$12="Mozilla/5.0
$13=(compatible;
$14=bingbot/2.0;
$15=+http://www.bing.com/bingbot.htm)"
mtkime(2015 7 02 01 55 59): 1435816559
1435816559 not between 1435816557 and 1435816558
      2 1.1.1.1
      2 207.46.13.135

I don't find anything with the fgrep but looking at the archives I don't have any files that old. Since I don't have any log files that old I put in your time zone changes to test their effects.

Here is the output I get running the fgrep commands

[root@ip-1.1.1.1 log]# fgrep '-0500]' /data/log/*.log
fgrep: invalid option -- ']'
Usage: fgrep [OPTION]... PATTERN [FILE]...
Try `fgrep --help' for more information.
[root@ip-1.1.1.1 log]# fgrep '-0500' /data/log/*.log
[root@ip-1.1.1.1 log]# 
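
The "invalid option" message appears because a pattern that begins with - is parsed as a command-line option. A sketch of one common workaround, using grep -F (the equivalent of fgrep); the function name and the sample input lines are made up for illustration:

```shell
# Hypothetical helper: count input lines containing the fixed string "-0500]".
count_est_lines() {
    # -e marks the next argument as the pattern, even if it starts with "-".
    grep -F -c -e '-0500]'
}

printf '%s\n' \
    '3.1.20.15 - - [01/Mar/2015:01:23:46 -0500] "Test in EST"' \
    '1.1.1.1 - - [02/Jul/2015:01:55:57 -0400] "Test in EDT"' |
    count_est_lines   # prints 1

# Ending option parsing with "--" has the same effect:
#   grep -F -- '-0500]' /data/log/*.log
```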

changing to

$5 == "-0400]" || $5 == "-0500]" {

I get

./modified_gawk.sh "02 Jul 2015" 01:55:57 01:55:58
Examining from Thu Jul  2 01:55:57 EDT 2015 (1435816557)
            to Thu Jul  2 01:55:58 EDT 2015 (1435816558)

$1=1.1.1.1
$2=-
$3=-
$4=[02/Jul/2015:01:55:57
$5=-0400]
$6="GET
$7=/content/421-ahmtrust
$8=HTTP/1.0"
$9=200
$10=58071
$11="-"
$12="Sphider"
Processing /data/log/test1.log file
mtkime(2015 7 02 01 55 57): 1435816557
$1=207.46.13.135
$2=-
$3=-
$4=[02/Jul/2015:01:55:57
$5=-0400]
$6="GET
$7=/Liquid_Herbs_page_1_c_11.html
$8=HTTP/1.1"
$9=302
$10=25
$11="-"
$12="Mozilla/5.0
$13=(compatible;
$14=bingbot/2.0;
$15=+http://www.bing.com/bingbot.htm)"
mtkime(2015 7 02 01 55 57): 1435816557
$1=207.46.13.135
$2=-
$3=-
$4=[02/Jul/2015:01:55:58
$5=-0400]
$6="GET
$7=/index.php?controller=category&id_category=21
$8=HTTP/1.1"
$9=301
$10=25
$11="-"
$12="Mozilla/5.0
$13=(compatible;
$14=bingbot/2.0;
$15=+http://www.bing.com/bingbot.htm)"
mtkime(2015 7 02 01 55 58): 1435816558
$1=1.1.1.1
$2=-
$3=-
$4=[02/Jul/2015:01:55:57
$5=-0400]
$6="HEAD
$7=/content/422-ahmunbelief
$8=HTTP/1.1"
$9=200
$10=-
$11="-"
$12="Sphider"
mtkime(2015 7 02 01 55 57): 1435816557
$1=184.98.149.48
$2=-
$3=-
$4=[02/Jul/2015:01:55:59
$5=-0400]
$6="GET
$7=/themes/warehouse/js/script.js
$8=HTTP/1.1"
$9=200
$10=1313
$11="https://www.google.com/"
$12="Mozilla/5.0
$13=(Linux;
$14=Android
$15=5.0;
$16=SM-N900T
$17=Build/LRX21V)
$18=AppleWebKit/537.36
$19=(KHTML,
$20=like
$21=Gecko)
$22=Chrome/43.0.2357.93
$23=Mobile
$24=Safari/537.36"
mtkime(2015 7 02 01 55 59): 1435816559
1435816559 not between 1435816557 and 1435816558
$1=2.2.2.2
$2=-
$3=-
$4=[02/Jul/2015:01:55:59
$5=-0400]
$6="GET
$7=/themes/warehouse/cache/50ca4d40aa6b13dfe15d7583bbe75eea.js
$8=HTTP/1.1"
$9=200
$10=69947
$11="https://www.google.com/"
$12="Mozilla/5.0
$13=(Linux;
$14=Android
$15=5.0;
$16=SM-N900T
$17=Build/LRX21V)
$18=AppleWebKit/537.36
$19=(KHTML,
$20=like
$21=Gecko)
$22=Chrome/43.0.2357.93
$23=Mobile
$24=Safari/537.36"
mtkime(2015 7 02 01 55 59): 1435816559
1435816559 not between 1435816557 and 1435816558
$1=2.2.2.2
$2=-
$3=-
$4=[02/Jul/2015:01:55:59
$5=-0400]
$6="GET
$7=/themes/warehouse/cache/9f19013204b5f3ce3d256dea73bb91e5_all.css
$8=HTTP/1.1"
$9=200
$10=42230
$11="https://www.google.com/"
$12="Mozilla/5.0
$13=(Linux;
$14=Android
$15=5.0;
$16=SM-N900T
$17=Build/LRX21V)
$18=AppleWebKit/537.36
$19=(KHTML,
$20=like
$21=Gecko)
$22=Chrome/43.0.2357.93
$23=Mobile
$24=Safari/537.36"
mtkime(2015 7 02 01 55 59): 1435816559
1435816559 not between 1435816557 and 1435816558
$1=2.2.2.2
$2=-
$3=-
$4=[02/Jul/2015:01:55:59
$5=-0400]
$6="GET
$7=/content/152-Tea_Tree_Oil_Uses_sp_153
$8=HTTP/1.1"
$9=200
$10=17579
$11="https://www.google.com/"
$12="Mozilla/5.0
$13=(Linux;
$14=Android
$15=5.0;
$16=SM-N900T
$17=Build/LRX21V)
$18=AppleWebKit/537.36
$19=(KHTML,
$20=like
$21=Gecko)
$22=Chrome/43.0.2357.93
$23=Mobile
$24=Safari/537.36"
mtkime(2015 7 02 01 55 59): 1435816559
1435816559 not between 1435816557 and 1435816558
$1=207.46.13.135
$2=-
$3=-
$4=[02/Jul/2015:01:55:59
$5=-0400]
$6="GET
$7=/21-Liquid_Herbs_page_1_c_11
$8=HTTP/1.1"
$9=200
$10=16273
$11="-"
$12="Mozilla/5.0
$13=(compatible;
$14=bingbot/2.0;
$15=+http://www.bing.com/bingbot.htm)"
mtkime(2015 7 02 01 55 59): 1435816559
1435816559 not between 1435816557 and 1435816558
      2 1.1.1.1
      2 207.46.13.135

Changing to

$5 ~ "-0[45]00]" {

I get

./modified_gawk.sh "02 Jul 2015" 01:55:57 01:55:58
Examining from Thu Jul  2 01:55:57 EDT 2015 (1435816557)
            to Thu Jul  2 01:55:58 EDT 2015 (1435816558)

$1=1.1.1.1
$2=-
$3=-
$4=[02/Jul/2015:01:55:57
$5=-0400]
$6="GET
$7=/content/421-ahmtrust
$8=HTTP/1.0"
$9=200
$10=58071
$11="-"
$12="Sphider"
Processing /data/log/test1.log file
mtkime(2015 7 02 01 55 57): 1435816557
$1=207.46.13.135
$2=-
$3=-
$4=[02/Jul/2015:01:55:57
$5=-0400]
$6="GET
$7=/Liquid_Herbs_page_1_c_11.html
$8=HTTP/1.1"
$9=302
$10=25
$11="-"
$12="Mozilla/5.0
$13=(compatible;
$14=bingbot/2.0;
$15=+http://www.bing.com/bingbot.htm)"
mtkime(2015 7 02 01 55 57): 1435816557
$1=207.46.13.135
$2=-
$3=-
$4=[02/Jul/2015:01:55:58
$5=-0400]
$6="GET
$7=/index.php?controller=category&id_category=21
$8=HTTP/1.1"
$9=301
$10=25
$11="-"
$12="Mozilla/5.0
$13=(compatible;
$14=bingbot/2.0;
$15=+http://www.bing.com/bingbot.htm)"
mtkime(2015 7 02 01 55 58): 1435816558
$1=1.1.1.1
$2=-
$3=-
$4=[02/Jul/2015:01:55:57
$5=-0400]
$6="HEAD
$7=/content/422-ahmunbelief
$8=HTTP/1.1"
$9=200
$10=-
$11="-"
$12="Sphider"
mtkime(2015 7 02 01 55 57): 1435816557
$1=2.2.2.2
$2=-
$3=-
$4=[02/Jul/2015:01:55:59
$5=-0400]
$6="GET
$7=/themes/warehouse/js/script.js
$8=HTTP/1.1"
$9=200
$10=1313
$11="https://www.google.com/"
$12="Mozilla/5.0
$13=(Linux;
$14=Android
$15=5.0;
$16=SM-N900T
$17=Build/LRX21V)
$18=AppleWebKit/537.36
$19=(KHTML,
$20=like
$21=Gecko)
$22=Chrome/43.0.2357.93
$23=Mobile
$24=Safari/537.36"
mtkime(2015 7 02 01 55 59): 1435816559
1435816559 not between 1435816557 and 1435816558
$1=2.2.2.2
$2=-
$3=-
$4=[02/Jul/2015:01:55:59
$5=-0400]
$6="GET
$7=/themes/warehouse/cache/50ca4d40aa6b13dfe15d7583bbe75eea.js
$8=HTTP/1.1"
$9=200
$10=69947
$11="https://www.google.com/"
$12="Mozilla/5.0
$13=(Linux;
$14=Android
$15=5.0;
$16=SM-N900T
$17=Build/LRX21V)
$18=AppleWebKit/537.36
$19=(KHTML,
$20=like
$21=Gecko)
$22=Chrome/43.0.2357.93
$23=Mobile
$24=Safari/537.36"
mtkime(2015 7 02 01 55 59): 1435816559
1435816559 not between 1435816557 and 1435816558
$1=2.2.2.2
$2=-
$3=-
$4=[02/Jul/2015:01:55:59
$5=-0400]
$6="GET
$7=/themes/warehouse/cache/9f19013204b5f3ce3d256dea73bb91e5_all.css
$8=HTTP/1.1"
$9=200
$10=42230
$11="https://www.google.com/"
$12="Mozilla/5.0
$13=(Linux;
$14=Android
$15=5.0;
$16=SM-N900T
$17=Build/LRX21V)
$18=AppleWebKit/537.36
$19=(KHTML,
$20=like
$21=Gecko)
$22=Chrome/43.0.2357.93
$23=Mobile
$24=Safari/537.36"
mtkime(2015 7 02 01 55 59): 1435816559
1435816559 not between 1435816557 and 1435816558
$1=2.2.2.2
$2=-
$3=-
$4=[02/Jul/2015:01:55:59
$5=-0400]
$6="GET
$7=/content/152-Tea_Tree_Oil_Uses_sp_153
$8=HTTP/1.1"
$9=200
$10=17579
$11="https://www.google.com/"
$12="Mozilla/5.0
$13=(Linux;
$14=Android
$15=5.0;
$16=SM-N900T
$17=Build/LRX21V)
$18=AppleWebKit/537.36
$19=(KHTML,
$20=like
$21=Gecko)
$22=Chrome/43.0.2357.93
$23=Mobile
$24=Safari/537.36"
mtkime(2015 7 02 01 55 59): 1435816559
1435816559 not between 1435816557 and 1435816558
$1=207.46.13.135
$2=-
$3=-
$4=[02/Jul/2015:01:55:59
$5=-0400]
$6="GET
$7=/21-Liquid_Herbs_page_1_c_11
$8=HTTP/1.1"
$9=200
$10=16273
$11="-"
$12="Mozilla/5.0
$13=(compatible;
$14=bingbot/2.0;
$15=+http://www.bing.com/bingbot.htm)"
mtkime(2015 7 02 01 55 59): 1435816559
1435816559 not between 1435816557 and 1435816558
      2 1.1.1.1
      2 207.46.13.135

I apologize for misleading you with the fgrep command. But, we're making great progress! If you want to look for log entries from November 2, 2014 to March 8, 2015:

fgrep ' -0500]' /data/log/*.log

should work if you have logs that cover that period. But, whether it finds anything or not, try just changing the 1st three lines of /data/log/test.log from:

1.1.1.1 - - [02/Jul/2015:01:55:57 -0400] "GET /content/421-ahmtrust HTTP/1.0" 200 58071 "-" "Sphider"
207.46.13.135 - - [02/Jul/2015:01:55:57 -0400] "GET /Liquid_Herbs_page_1_c_11.html HTTP/1.1" 302 25 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.13.135 - - [02/Jul/2015:01:55:58 -0400] "GET /index.php?controller=category&id_category=21 HTTP/1.1" 301 25 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

to:

3.1.20.15 - - [01/Mar/2015:01:23:46 -0500] "Test in EST"
3.1.20.15 - - [01/Mar/2015:01:23:47 -0500] "Test in EST"

To shorten the debugging log, you can also delete the last three lines from that file.

And then try running the script again with either:

$5 ~ "-0[45]00]" {

or:

$5 == "-0400]" || $5 == "-0500]" {

instead of:

$5 == "-0400]" {

and using the command line:

./modified_gawk.sh "01 Mar 2015" 01:23:47 "02 Jul 2015" 01:55:58

Using this code

$5 ~ "-0[45]00]" {

I get

./modified_gawk.sh "01 Mar 2015" 01:23:47 "02 Jul 2015" 01:55:58
Examining from Sun Mar  1 01:23:47 EST 2015 (1425191027)
            to Thu Jul  2 01:55:58 EDT 2015 (1435816558)

$1=3.1.20.15
$2=-
$3=-
$4=[01/Mar/2015:01:23:46
$5=-0500]
$6="Test
$7=in
$8=EST"
Processing /data/log/test2.log file
mtkime(2015 3 01 01 23 46): 1425191026
1425191026 not between 1425191027 and 1435816558
$1=3.1.20.15
$2=-
$3=-
$4=[01/Mar/2015:01:23:47
$5=-0500]
$6="Test
$7=in
$8=EST"
mtkime(2015 3 01 01 23 47): 1425191027
$1=54.86.148.217
$2=-
$3=-
$4=[02/Jul/2015:01:55:57
$5=-0400]
$6="HEAD
$7=/content/422-ahmunbelief
$8=HTTP/1.1"
$9=200
$10=-
$11="-"
$12="Sphider"
mtkime(2015 7 02 01 55 57): 1435816557
$1=184.98.149.48
$2=-
$3=-
$4=[02/Jul/2015:01:55:59
$5=-0400]
$6="GET
$7=/themes/warehouse/js/script.js
$8=HTTP/1.1"
$9=200
$10=1313
$11="https://www.google.com/"
$12="Mozilla/5.0
$13=(Linux;
$14=Android
$15=5.0;
$16=SM-N900T
$17=Build/LRX21V)
$18=AppleWebKit/537.36
$19=(KHTML,
$20=like
$21=Gecko)
$22=Chrome/43.0.2357.93
$23=Mobile
$24=Safari/537.36"
mtkime(2015 7 02 01 55 59): 1435816559
1435816559 not between 1425191027 and 1435816558
$1=184.98.149.48
$2=-
$3=-
$4=[02/Jul/2015:01:55:59
$5=-0400]
$6="GET
$7=/themes/warehouse/cache/50ca4d40aa6b13dfe15d7583bbe75eea.js
$8=HTTP/1.1"
$9=200
$10=69947
$11="https://www.google.com/"
$12="Mozilla/5.0
$13=(Linux;
$14=Android
$15=5.0;
$16=SM-N900T
$17=Build/LRX21V)
$18=AppleWebKit/537.36
$19=(KHTML,
$20=like
$21=Gecko)
$22=Chrome/43.0.2357.93
$23=Mobile
$24=Safari/537.36"
mtkime(2015 7 02 01 55 59): 1435816559
1435816559 not between 1425191027 and 1435816558
      1 3.1.20.15
      1 1.1.1.1

Using this code

$5 == "-0400]" || $5 == "-0500]" {

I get

./modified_gawk.sh "01 Mar 2015" 01:23:47 "02 Jul 2015" 01:55:58
Examining from Sun Mar  1 01:23:47 EST 2015 (1425191027)
            to Thu Jul  2 01:55:58 EDT 2015 (1435816558)

$1=3.1.20.15
$2=-
$3=-
$4=[01/Mar/2015:01:23:46
$5=-0500]
$6="Test
$7=in
$8=EST"
Processing /data/log/test2.log file
mtkime(2015 3 01 01 23 46): 1425191026
1425191026 not between 1425191027 and 1435816558
$1=3.1.20.15
$2=-
$3=-
$4=[01/Mar/2015:01:23:47
$5=-0500]
$6="Test
$7=in
$8=EST"
mtkime(2015 3 01 01 23 47): 1425191027
$1=1.1.1.1
$2=-
$3=-
$4=[02/Jul/2015:01:55:57
$5=-0400]
$6="HEAD
$7=/content/422-ahmunbelief
$8=HTTP/1.1"
$9=200
$10=-
$11="-"
$12="Sphider"
mtkime(2015 7 02 01 55 57): 1435816557
$1=2.2.2.2
$2=-
$3=-
$4=[02/Jul/2015:01:55:59
$5=-0400]
$6="GET
$7=/themes/warehouse/js/script.js
$8=HTTP/1.1"
$9=200
$10=1313
$11="https://www.google.com/"
$12="Mozilla/5.0
$13=(Linux;
$14=Android
$15=5.0;
$16=SM-N900T
$17=Build/LRX21V)
$18=AppleWebKit/537.36
$19=(KHTML,
$20=like
$21=Gecko)
$22=Chrome/43.0.2357.93
$23=Mobile
$24=Safari/537.36"
mtkime(2015 7 02 01 55 59): 1435816559
1435816559 not between 1425191027 and 1435816558
$1=2.2.2.2
$2=-
$3=-
$4=[02/Jul/2015:01:55:59
$5=-0400]
$6="GET
$7=/themes/warehouse/cache/50ca4d40aa6b13dfe15d7583bbe75eea.js
$8=HTTP/1.1"
$9=200
$10=69947
$11="https://www.google.com/"
$12="Mozilla/5.0
$13=(Linux;
$14=Android
$15=5.0;
$16=SM-N900T
$17=Build/LRX21V)
$18=AppleWebKit/537.36
$19=(KHTML,
$20=like
$21=Gecko)
$22=Chrome/43.0.2357.93
$23=Mobile
$24=Safari/537.36"
mtkime(2015 7 02 01 55 59): 1435816559
1435816559 not between 1425191027 and 1435816558
      1 3.1.20.15
      1 1.1.1.1

Thanks for sticking with me on this.

You're welcome. And, it looks like we're getting exactly what you want now. So, turn off debugging, take out the line:

{for(i=1;i<=NF;i++) printf "$%d=%s\n", i, $i }

(or, preferably, change it to:

debug{for(i=1;i<=6;i++) printf "%d=%s\n", i, $i }

in case you ever need to turn debugging back on in the future), and change the last line of the awk script from:

END {for(ip in C) printf "%7d %s\n", C[ip], ip} ' /data/log/test.log

back to:

END {for(ip in C) printf "%7d %s\n", C[ip], ip} ' $FILES

and you should get what you want from your real data without the debugging info. (And, it should continue working when we shift back to standard time on November 1st.)

This is great, thanks for your help. I also plan to find the top 5 entries in each log file during the range of time. So I'll use what you have given me and then figure out how to do that too.

Once again, thanks for all your help.

Hope it can help someone else too.

Sorry, I had some real life issues that have kept me away from this thread. I'm so glad and grateful that Don Cragun was able to assist with resolving this issue for you.

I'm a little red-faced about that out-by-one issue with the month decoding, but happy to see you have a working solution now. It's worth setting yourself a calendar reminder to check, when the daylight savings change kicks in, that we don't end up with a 1 hour error.

Hi Chubler,

I certainly understand when life situations change our schedules and priorities. Don was able to keep me going in grand fashion. I am just grateful you took the time to write such a complete set of code in the first place.

I will keep that in mind about checking for DST being on or off depending on the time of year.

I do wonder if you would mind pasting an explanation of your code on the thread. I need it to know where to modify the code to show only the top 5 entries based on the number of visits during the range I have specified. When I went to try and find the array that is creating the output I really couldn't understand the logic behind it.

Also, this is such a complete thread, to have the code explained, I am sure it will help others greatly that want to use or piggy back off of what has been accomplished.

if (( $# < 3 || $# > 4 ))
then
   printf "Usage: %s from_date from_time [to_date] to_time\n" "$0" >&2
   exit 2
fi

This just prints the usage string and terminates with exit code 2 if the number of passed arguments is not 3 or 4.

FDAY=$1
FTIME=$2

if (( $# == 3 ))
then
    TDAY=$FDAY
    TTIME=$3
else
    TDAY=$3
    TTIME=$4
fi

This sets FDAY, FTIME, TDAY and TTIME from the passed arguments. When only 3 arguments are passed, TDAY defaults to FDAY.

FROM=$(date -d "$FDAY $FTIME" +%s)
(($? != 0 )) && exit 3
TO=$(date -d "$TDAY $TTIME" +%s)
(($? != 0 )) && exit 4

Calculate FROM and TO as seconds since the epoch (midnight UTC, 1/1/1970). Here we allow any error messages from date to be displayed, and exit with 3 for an invalid from date/time or 4 for an invalid to date/time.
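
As a rough illustration of that conversion, here is a small sketch. The helper name to_epoch is made up for this example and is not in the original script. It assumes GNU date, and it pins TZ to UTC only so the result is reproducible on any machine (the real script uses your local timezone).

```shell
# to_epoch is a hypothetical helper, not part of the original script.
# Assumes GNU date; TZ=UTC is only there to make the result reproducible.
to_epoch() {
    TZ=UTC date -d "$1 $2" +%s
}

# 01:55:58 EDT (UTC-4) is 05:55:58 UTC.
FROM=$(to_epoch "02 Jul 2015" 05:55:58) || exit 3
echo "$FROM"
```

This should print 1435816558, the same value the script reported for 01:55:58 EDT. Writing `FROM=$(...) || exit 3` is just a shorter way of doing the same `$?` check.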

if (( $# == 3 && TO < FROM ))
then
   #FROM time later than TO time so add a day
   (( TO+=3600*24))
fi

Here, if there are 3 arguments and the TO time is earlier than the FROM time (eg 9pm to 1am), make the TO date the next day. 3600 is the number of seconds in 1 hour, and multiplying by 24 gives 1 day's worth of seconds. Remember these dates are seconds past the epoch date.
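
To make the arithmetic concrete, here is a minimal sketch using seconds after midnight in place of real epoch values. The function name wrap_to is hypothetical and only exists for this example.

```shell
# Hypothetical helper: echo TO, moved forward one day if it is earlier than FROM.
wrap_to() {
    local from=$1 to=$2
    if (( to < from )); then
        (( to += 3600 * 24 ))   # 86400 seconds = 1 day
    fi
    echo "$to"
}

# 9pm and 1am expressed as seconds after midnight of the same day.
wrap_to $(( 21 * 3600 )) $(( 1 * 3600 ))   # 1am is moved to the next day
wrap_to $(( 1 * 3600 )) $(( 21 * 3600 ))   # already in order, left unchanged
```

The first call prints 90000 (3600 + 86400) and the second prints 75600. Note that adding a fixed 86400 seconds assumes a 24 hour day, which does not hold across a daylight saving change.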

if (( TO < FROM ))
then
    echo "$0: FROM date must be before TO date" >&2
    exit 5
fi

Trap error where TO date is before FROM and exit with 5.

echo "Examining from $(date -d @$FROM) ($FROM)"
echo "            to $(date -d @$TO) ($TO)"
echo

Display confirmation that the calculated dates match what was requested. This is quite useful, as date can accept strings like "today" or "yesterday" and it's good to be specific about the range that is going to be checked.

gawk -v F=$FROM -v T=$TO -v debug=0 '

Using GNU awk; this is needed because the time/date functions are not supported in standard awk.
Pass shell $FROM in as variable F and $TO as variable T. The variable debug is set to 0 for false (non-zero is true).

debug{for(i=1;i<=NF;i++) printf "$%d=%s\n", i, $i }

When debug is non-zero, this outputs each field awk has split from the current input line.
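
If you want to see what that debug rule produces on its own, something like the sketch below should do. The wrapper show_fields is a made-up name, and the sample line is the first log line quoted earlier in this thread. Nothing here needs gawk, so plain awk is used.

```shell
# show_fields is a hypothetical wrapper; with debug=0 it prints nothing.
show_fields() {
    awk -v debug="$1" 'debug { for (i = 1; i <= NF; i++) printf "$%d=%s\n", i, $i }'
}

line='1.1.1.1 - - [02/Jul/2015:01:55:57 -0400] "GET /content/421-ahmtrust HTTP/1.0" 200 58071 "-" "Sphider"'
printf '%s\n' "$line" | show_fields 1
```

With debug set to 1 this prints 12 lines, including $4=[02/Jul/2015:01:55:57 and $5=-0400], which are the two fields the rest of the script relies on.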

FNR==1 {
    for(ip in C) printf "%7d %s\n", C[ip], ip
    delete C
    print "Processing " FILENAME " file"
}

If processing the first record of a file, output the contents of the C[] array from the previous file, then empty the array. The %7d format prints the count right justified in a field at least 7 characters wide.
Note: This is also done in the END block to get the counts for the last file processed.
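
Here is a cut-down sketch of that per-file pattern, with the date checking left out so only the FNR==1 and END handling remains. The function per_file_counts and the two tiny files are invented for the example.

```shell
# Two small hypothetical "log files" containing only an IP per line.
tmp=$(mktemp -d)
printf '1.1.1.1\n1.1.1.1\n2.2.2.2\n' > "$tmp/a.log"
printf '3.3.3.3\n' > "$tmp/b.log"

per_file_counts() {
    awk '
    FNR == 1 {                       # first record of each input file
        for (ip in C) printf "%7d %s\n", C[ip], ip   # flush previous file
        delete C                     # start counting afresh
        print "Processing " FILENAME " file"
    }
    { C[$1]++ }
    END { for (ip in C) printf "%7d %s\n", C[ip], ip }   # flush last file
    ' "$@"
}

per_file_counts "$tmp/a.log" "$tmp/b.log"
rm -rf "$tmp"
```

The counts for a.log come out just before the "Processing" line for b.log, and the counts for b.log come from the END block. The order of the lines within one file's counts is not defined, since for(ip in C) visits the array in no particular order.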

$5 ~ "-0[45]00]" {
Only process rows where field number 5 contains "-0400]" or "-0500]" (the timezone offset followed by the closing square bracket). This skips records from other timezones and invalid log lines (eg headers or other record types).

split($4,v,"[[/: ]")
Split field 4 into the array v, using left-square-bracket, slash, colon or space as separators, so:

[02/Jul/2015:01:55:59
gives
v[1]=
v[2]=02
v[3]=Jul
v[4]=2015
v[5]=01
v[6]=55
v[7]=59
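
You can check that split yourself with a one-off like the following. split_stamp is a hypothetical helper name, and plain awk is enough for this part.

```shell
# split_stamp is a hypothetical helper that prints each piece of the split.
split_stamp() {
    awk -v stamp="$1" 'BEGIN {
        n = split(stamp, v, "[[/: ]")
        for (i = 1; i <= n; i++) printf "v[%d]=%s\n", i, v[i]
    }'
}

split_stamp '[02/Jul/2015:01:55:59'
```

v[1] is empty because the string starts with a separator (the left square bracket), which is why the useful values start at v[2].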

mnum=index("xxJanFebMarAprMayJunJulAugSepOctNovDec", v[3])/3
Calculate the month number from the 3 character short month name. index() returns the position in the string where v[3] starts,
so Jul gives 21. Once we divide by 3 we get the correct month number (eg Jul=7 Sep=9 Dec=12).
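
A quick way to convince yourself of the arithmetic is the sketch below (month_num is a made-up helper name). The two leading "xx" characters are what push Jan to position 3, so that 3/3 gives 1; without them Jan would sit at position 1 and come out as a fraction.

```shell
# month_num is a hypothetical helper wrapping the same index()/3 expression.
month_num() {
    awk -v m="$1" 'BEGIN { print index("xxJanFebMarAprMayJunJulAugSepOctNovDec", m) / 3 }'
}

month_num Jan   # index 3  -> 1
month_num Jul   # index 21 -> 7
month_num Dec   # index 36 -> 12
month_num Xyz   # not found, index 0 -> 0
```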

tm=mktime(v[4] " " mnum " " v[2] " " v[5] " " v[6] " " v[7])
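mktime() requires a string in "YYYY MM DD HH MM SS" format. Note that if mnum has a value mktime() cannot parse, like 1.33333, mktime() returns -1, which will not be between the F and T values, so nothing will be counted. According to the gawk manual, values that are merely out of range (eg month 0, or 30 Feb 2015) are normalized to another date rather than rejected, so such a line is only counted if that other date happens to fall inside the requested range.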

if (tm >= F && tm <= T) C[$1]++
else if(debug) print tm " not between " F " and " T

If tm is between FROM and TO, increment the C[] array. This is the crux of the counting of IP addresses.
The C[] array will use the IP address (field $1) as the index and the count as the value, so it ends up like this:

C[192.168.0.20]=208
C[203.22.200.1]=15
C[215.215.215.215]=1051
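
To see the counting step without any date parsing in the way, here is a simplified sketch. It assumes a made-up input format of "IP epoch" per line, and count_in_range is a hypothetical name. The real script computes tm with mktime() instead of reading it from a column.

```shell
# Hypothetical simplified input: "IP epoch" per line, so no date parsing is needed.
count_in_range() {
    awk -v F="$1" -v T="$2" '
    { tm = $2 }
    tm >= F && tm <= T { C[$1]++ }      # same test as in the script
    END { for (ip in C) printf "%7d %s\n", C[ip], ip }
    '
}

printf '%s\n' \
    '1.1.1.1 1435816557' \
    '207.46.13.135 1435816558' \
    '2.2.2.2 1435816559' \
    '2.2.2.2 1435816559' \
    '2.2.2.2 1435816561' |
    count_in_range 1435816558 1435816560
```

Only three of the five lines fall inside the range, so the result is a count of 2 for 2.2.2.2 and 1 for 207.46.13.135 (in no guaranteed order). Both end points are included because the test uses >= and <=.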

To get the top 5 by count, you could use the GNU awk ordered arrays feature and only print the first 5 records. But in this case, as it's only done after each file is processed, it is much easier and still fairly efficient to use the external unix sort and head utilities like this:

for(ip in C) printf "%7d %s\n", C[ip], ip | "sort -k1,1rn | head -5"
close("sort -k1,1rn | head -5")

Sort using the first field (-k1,1) with reverse order (r) and numeric (n) sorting, then head -5 keeps the top 5.
Note these 2 lines need to be in both the END and FNR==1 blocks, as a replacement for the existing for(... line.
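
Here is a stand-alone sketch of that pipe-to-sort idea so you can test it away from the main script. top5 is a made-up wrapper and the sample addresses are invented. Keeping the command in a variable (cmd here) is just one way to make sure printf and close() are given exactly the same string.

```shell
# top5 is a hypothetical wrapper: it counts the IP in field 1 and prints the 5 largest counts.
top5() {
    awk '
    { C[$1]++ }
    END {
        cmd = "sort -k1,1rn | head -5"
        for (ip in C) printf "%7d %s\n", C[ip], ip | cmd
        close(cmd)     # wait for sort to finish and flush its output
    }'
}

# Build sample input: 10.0.0.N appears N times, for N = 1..7.
for n in 1 2 3 4 5 6 7; do
    for ((i = 0; i < n; i++)); do echo "10.0.0.$n"; done
done | top5
```

This prints five lines, from 7 hits for 10.0.0.7 down to 3 hits for 10.0.0.3. The close() matters in the real script because the same pipeline is started again for the next file.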

This is great! Thanks so much for doing that. This will be a great help to me and I am sure it will be a future help to readers also.

Note that you could also put the sort and tail pipeline in the shell instead of doing it inside awk by changing the last line of the gawk script from:

END {for(ip in C) printf "%7d %s\n", C[ip], ip}' $FILES

to:

END {for(ip in C) printf "%7d %s\n", C[ip], ip}' $FILES | sort -k1,1rn | tail -5

Doing it this way should be slightly faster since gawk won't have to spawn an additional shell to run the pipeline.

XXXXXXXXXXXXXXXXXXXX

Please ignore the above suggestion; I lost track of some of the changes that had been made since I last touched this problem and confused this thread with a problem I was working on for another thread.

I agree doing it outside of gawk should be much faster and easier to implement. However, I am getting less output when I change the code to what you suggested. This would be OK, except I used the same range I had before, so I know there were several IPs that matched the criteria before. Now it is only showing 1 that matches.

So, I will have to do some checking on the syntax to see what is going on.

Nevertheless, I appreciate you getting me pointed in the right direction.

I apologize. I was working on an awk problem in another thread and forgot how the code in this thread worked. Please ignore my suggestion in post #36 in this thread.

I sure appreciate you letting me know to not continue down that path for my answer.