Hello. I'm not nearly good enough with awk/perl to create the logfile scraping script that my boss is insisting we need immediately. Here is a brief 3-line excerpt from the access.log file in question (actual URL domain changed to 'aaa.com'):
209.253.130.36 - - [23/Sep/2009:12:55:44 -0700] "GET /images/products/en_us/pc/detail/273595_dt.jpg HTTP/1.1" 200 28520 "http://www.aaa.com/product/holiday+parties/halloween+party+supplies.do?" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; FunWebProducts; .NET CLR 1.1.4322)" 22134 "__utma=8470452.136497171.1253643073.1253655989.1253731688.3; __utmb=8470452.4.10.1253731688; __utmz=8470452.1253643073.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); s_cc=true"
99.60.55.157 - - [23/Sep/2009:12:55:45 -0700] "GET /mod/productquickview/includes/themes/default.css HTTP/1.1" 200 767 "http://www.aaa.com/home.do?" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.14) Gecko/2009082707 Firefox/3.0.14 (.NET CLR 3.5.30729)" 14097 "customer=none; basket=none; __utma=8470452.1058319807.1252542208.1252547047.1252713609.3; __utmz=8470452.1252542208.1.1.utmcsr=yahoo|utmccn=(organic)|utmcmd=organic|utmctr=aaa; JSESSIONID=j0d7VJsXNBv6ztnpOp"
198.7.255.226 - - [23/Sep/2009:12:55:46 -0700] "GET /images/products/en_us/gateways/costumes_R_01_C_01.jpg HTTP/1.1" 200 30097 "http://www.aaa.com/category/costumes+%26+accessories.do" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.14) Gecko/2009082707 Firefox/3.0.14 (.NET CLR 3.5.30729)" 12334 "s_cc=true"
So the lines start with an IP address, followed by the date and then the time. We want to search only the last 10 minutes of the file (say, if the current time is 11:40, we only want to look at lines going back to 11:30). I've got the code to convert the current time into a scalar, subtract 600 secs, and store the resulting time as single-digit variables (i.e., $a = 1, $b = 1, $c = 3, $d = 0).
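To pin down the comparison I have in mind, here is a rough sketch in Python rather than awk/perl. It pulls the bracketed timestamp out of a line and compares it as a full date-time instead of as separate digits, so a window that crosses an hour or midnight boundary should still behave. The names `parse_log_time` and `is_recent` are made up for illustration, and the format string assumes every line looks like the excerpt above and that the month abbreviations are parsed in an English locale.

```python
import re
from datetime import datetime, timedelta

# Matches the first bracketed field, e.g. [23/Sep/2009:12:55:44 -0700]
TIMESTAMP_RE = re.compile(r"\[([^\]]+)\]")

# Assumed layout of the timestamp, taken from the excerpt above
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_log_time(line):
    """Return the entry's timestamp as a timezone-aware datetime, or None."""
    match = TIMESTAMP_RE.search(line)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), LOG_TIME_FORMAT)
    except ValueError:
        # The bracketed text was not a timestamp in the assumed layout
        return None


def is_recent(line, now, window=timedelta(minutes=10)):
    """True if the entry's timestamp falls within `window` before `now`."""
    stamp = parse_log_time(line)
    return stamp is not None and stamp >= now - window
```

Here `now` would need to be a timezone-aware datetime (for example `datetime.now().astimezone()`), since the parsed timestamps carry the `-0700` offset.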
But I need help with a line of awk (or other?) code that will parse each entry in the log file, skip over the IP and the date, and match against the timestamp only. What's more, we'd like it to start from the bottom of the file (i.e., with the most recent entry) and work backwards, and then hopefully stop searching when it hits the first entry that does NOT fall within the past 10 minutes (because the log file is very, very large!).
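In case it helps show what we're after, here is a minimal sketch of the backwards scan, again in Python rather than awk/perl. It reads the file in blocks from the end, yields lines newest-first, and stops at the first entry older than the cutoff, so most of a very large file is never read. The function names `lines_from_end` and `recent_entries` are hypothetical, and the sketch assumes the log is written in chronological order and that the first bracketed field on each line is the timestamp, as in the excerpt above.

```python
import os
import re
from datetime import datetime, timedelta

STAMP_RE = re.compile(r"\[([^\]]+)\]")
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # assumed from the excerpt


def lines_from_end(path, block_size=64 * 1024):
    """Yield the file's non-empty lines last-to-first, reading blocks from the end."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        position = handle.tell()
        tail = b""  # start of a line whose beginning lies in an earlier block
        while position > 0:
            step = min(block_size, position)
            position -= step
            handle.seek(position)
            block = handle.read(step) + tail
            pieces = block.split(b"\n")
            tail = pieces[0]  # possibly incomplete; completed by the next read
            for piece in reversed(pieces[1:]):
                if piece:
                    yield piece.decode("utf-8", errors="replace")
        if tail:
            yield tail.decode("utf-8", errors="replace")


def recent_entries(path, now, window=timedelta(minutes=10)):
    """Return entries from the last `window`, oldest first, stopping the scan early."""
    cutoff = now - window
    recent = []
    for line in lines_from_end(path):
        match = STAMP_RE.search(line)
        if match is None:
            continue  # no timestamp on this line; skip it
        try:
            stamp = datetime.strptime(match.group(1), LOG_TIME_FORMAT)
        except ValueError:
            continue  # bracketed text was not a timestamp in the assumed layout
        if stamp < cutoff:
            break  # everything earlier in the file is older still
        recent.append(line)
    recent.reverse()
    return recent
```

A call might look like `recent_entries("access.log", datetime.now().astimezone())`. The early `break` is only safe if entries really are in time order; if the server can write them slightly out of order, the scan would need to continue a little past the first old entry.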
Any and all help or suggestions would be monumentally appreciated.