Epx998
October 17, 2011, 1:30pm
1
Hi,
I am trying to grep a date range out of an access log file. I defined the dates like so:
DATE1=$(date --date '1 hour ago' '+%m/%d/%y:%H:%M:%S')
DATE2=$(date '+%m/%d/%y:%H:%M:%S')
Then I just used cat and grep to collect the hits on the URL into results.txt:
touch /tmp/results.txt
cat /var/log/httpd/access_log | grep index.php >> /tmp/results.txt
How would I use sed/awk to get the exact entries for the date ranges that were defined?
Thanks for any help.
Cheers!
ctsgnb
October 17, 2011, 1:33pm
2
Epx998
October 17, 2011, 1:56pm
3
I saw that post as well, but when I try what is suggested, I just get an empty tmp.log. There should be at least a few lines.
Here is the script I wrote:
date1=$(date --date '1 hour ago' '+%m/%d/%y:%H:%M:%S')
date2=$(date '+%m/%d/%y:%H:%M:%S')
cat /var/log/httpd/access_log | grep index.php >> results.txt
awk -v d1="${date1}" -v d2="${date2}" '$0~d1{p=1} $0~d2{p=0} p' results.txt >> tmp.log
ctsgnb
October 17, 2011, 2:04pm
4
What does your results.txt look like?
Give us a cat ...
Epx998
October 17, 2011, 5:52pm
5
results.txt is just the grep'd Apache access_log on my proof-of-concept VM:
127.0.0.1 - - [17/Oct/2011:12:06:15 -0700] "GET /cacti/include/main.css HTTP/1.1" 304 - "http://localhost/cacti/index.php" "Opera/9.80 (X11; Linux x86_64; U; en) Presto/2.9.168 Version/11.51"
127.0.0.1 - - [17/Oct/2011:12:06:15 -0700] "GET /cacti/images/favicon.ico HTTP/1.1" 304 - "http://localhost/cacti/index.php" "Opera/9.80 (X11; Linux x86_64; U; en) Presto/2.9.168 Version/11.51"
127.0.0.1 - - [17/Oct/2011:12:06:15 -0700] "GET /cacti/include/layout.js HTTP/1.1" 304 - "http://localhost/cacti/index.php" "Opera/9.80 (X11; Linux x86_64; U; en) Presto/2.9.168 Version/11.51"
127.0.0.1 - - [17/Oct/2011:12:06:15 -0700] "GET /cacti/images/shadow_gray.gif HTTP/1.1" 304 - "http://localhost/cacti/index.php" "Opera/9.80 (X11; Linux x86_64; U; en) Presto/2.9.168 Version/11.51"
From there I need to just get the entries over the last hour. So, do I have to use awk and filter everything but the numbers out, then use egrep to get the correct range and get the line count from that?
---------- Post updated at 02:52 PM ---------- Previous update was at 01:08 PM ----------
From what I have been reading, I would have to convert the date to be fully numeric; then sed would work nicely to get a range. I am not sure how I can convert the log file, and adjusting the httpd.conf logging format isn't an option.
Suggestions?
Something to start with, working on your 'grep-ed' file sample:
nawk -f epx.awk myGreppedLogFile
epx.awk:
BEGIN {
  FS="[[ ]"
  mon="JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC"
  monN=split(mon, monA, "|");
  # turn the index->name array into a name->number lookup
  for(i=1; i<=monN; i++) {
    monA[monA[i]]=i;
    delete monA[i];
  }
}
{
  # $5 is the timestamp, e.g. 17/Oct/2011:12:06:15
  n=split($5,a, "[/:]")
  printf("%s ->[%s%02d%02d%s%s%s]\n", $5, a[3], monA[toupper(a[2])], a[1], a[4], a[5], a[6])
}
You don't need sed/grep - do it all natively in awk.
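To illustrate the "all in awk" point, here is a minimal sketch that folds the index.php filter and the date conversion into one awk call, so the cat and grep steps are not needed. The function name hits_with_key is made up for this example, and the field layout is assumed to match the sample lines posted above.

```shell
# hits_with_key: hypothetical helper, reads an access log on stdin and
# prints each line mentioning index.php, prefixed with a numeric
# YYYYmmddHHMMSS key built from the timestamp.
hits_with_key() {
  awk '
    BEGIN {
      FS = "[[ ]"
      n = split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", names, " ")
      for (i = 1; i <= n; i++) monA[names[i]] = i
    }
    /index\.php/ {
      # with this FS, field 5 is the timestamp, e.g. 17/Oct/2011:12:06:15
      split($5, a, "[/:]")
      printf("%s%02d%02d%s%s%s %s\n", a[3], monA[toupper(a[2])], a[1], a[4], a[5], a[6], $0)
    }
  '
}

# Possible usage (path taken from the first post):
# hits_with_key < /var/log/httpd/access_log > /tmp/results.txt
```

The key sorts and compares like a plain number, which is what makes a later range test straightforward.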
Epx998
October 17, 2011, 10:55pm
7
Thank you very much. That converted the dates nicely. What do you suggest for getting the entries of the last hour, that is, from the current time going back 60 minutes? I used date to mimic the format, going back 1 hour. I tried using sed, but it returns 0.
sed -n '/$DATE1/,/$DATE2/p' output.log | wc -l
Does that look right?
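A few things probably work against that sed line: inside single quotes the shell does not expand $DATE1 and $DATE2, so sed searches for the literal text; the slashes in the dates would clash with the / pattern delimiters; and a sed range only starts when some line matches the first pattern, which needs a hit in that exact second. Comparing numeric timestamps avoids all three. The following is only a sketch: the function name filter_range and its YYYYmmddHHMMSS arguments are invented for the example, and it assumes the log timestamps are in the same time zone that date reports.

```shell
# filter_range FROM TO: print log lines (read from stdin) whose timestamp
# lies between FROM and TO, both given as YYYYmmddHHMMSS.
# The function name and argument format are made up for this sketch.
filter_range() {
  awk -v from="$1" -v to="$2" '
    BEGIN {
      n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
      for (i = 1; i <= n; i++) mon[names[i]] = i
    }
    {
      # Assumes the first "[" on the line opens the timestamp,
      # e.g. [17/Oct/2011:12:06:15 -0700]; the stamp itself is 20 characters.
      start = index($0, "[")
      if (start == 0) next
      split(substr($0, start + 1, 20), a, "[/:]")
      key = sprintf("%04d%02d%02d%02d%02d%02d", a[3], mon[a[2]], a[1], a[4], a[5], a[6])
      # Adding 0 forces a numeric comparison.
      if (key + 0 >= from + 0 && key + 0 <= to + 0) print
    }
  '
}

# Possible usage for the last hour (GNU date, as in the first post):
# filter_range "$(date --date '1 hour ago' '+%Y%m%d%H%M%S')" "$(date '+%Y%m%d%H%M%S')" \
#   < /var/log/httpd/access_log | grep -c index.php
```

Because every line is tested on its own, it does not matter whether any entry falls exactly on the start or end second.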
@vgersh99
Clever code:
mon="JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC"
monN=split(mon, monA, "|");
for(i=1; i<=monN; i++) {
  monA[monA[i]]=i;
  delete monA[i];
}
But I can't help thinking that something like this is probably more readable, and not that much bigger (191 chars vs. 163 chars).
monA["JAN"]= 1; monA["FEB"]= 2; monA["MAR"] = 3
monA["APR"]= 4; monA["MAY"]= 5; monA["JUN"] = 6
monA["JUL"]= 7; monA["AUG"]= 8; monA["SEP"] = 9
monA["OCT"]=10; monA["NOV"]=11; monA["DEC"] =12