sed/awk date range?

Epx998 · October 17, 2011, 1:30pm

Hi,

I am trying to grep out a date range from an access log file. I defined the dates like so:

DATE1=$(date --date '1 hour ago' '+%m/%d/%y:%H:%M:%S')
DATE2=$(date '+%m/%d/%y:%H:%M:%S')

Then I used cat and grep to collect the hits on the URL into results.txt:

touch /tmp/results.txt
cat /var/log/httpd/access_log | grep index.php >> /tmp/results.txt

How would I use sed/awk to get the exact entries for the date ranges that were defined?

Thanks for any help.

Cheers!

ctsgnb · October 17, 2011, 1:33pm

see thread :

http://www.unix.com/shell-programming-scripting/169307-reading-lines-file-between-two-search-patterns.html#post302565304

Epx998 · October 17, 2011, 1:56pm

I saw that post as well, but when I try what is suggested, I just get an empty tmp.log. There should be at least a few lines.

Here is the script I wrote:

date1=$(date --date '1 hour ago' '+%m/%d/%y:%H:%M:%S')
date2=$(date '+%m/%d/%y:%H:%M:%S')

cat /var/log/httpd/access_log | grep index.php >> results.txt

awk -v d1="${date1}" -v d2="${date2}" '$0~d1{p=1} $0~d2{p=0} p' results.txt >> tmp.log

ctsgnb · October 17, 2011, 2:04pm

What does your results.txt look like?

Give us a cat ...

Epx998 · October 17, 2011, 5:52pm

results.txt is just the grep'd Apache access_log on my proof-of-concept VM:

127.0.0.1 - - [17/Oct/2011:12:06:15 -0700] "GET /cacti/include/main.css HTTP/1.1" 304 - "http://localhost/cacti/index.php" "Opera/9.80 (X11; Linux x86_64; U; en) Presto/2.9.168 Version/11.51"
127.0.0.1 - - [17/Oct/2011:12:06:15 -0700] "GET /cacti/images/favicon.ico HTTP/1.1" 304 - "http://localhost/cacti/index.php" "Opera/9.80 (X11; Linux x86_64; U; en) Presto/2.9.168 Version/11.51"
127.0.0.1 - - [17/Oct/2011:12:06:15 -0700] "GET /cacti/include/layout.js HTTP/1.1" 304 - "http://localhost/cacti/index.php" "Opera/9.80 (X11; Linux x86_64; U; en) Presto/2.9.168 Version/11.51"
127.0.0.1 - - [17/Oct/2011:12:06:15 -0700] "GET /cacti/images/shadow_gray.gif HTTP/1.1" 304 - "http://localhost/cacti/index.php" "Opera/9.80 (X11; Linux x86_64; U; en) Presto/2.9.168 Version/11.51"

From there I need to just get the entries over the last hour. So, do I have to use awk and filter everything but the numbers out, then use egrep to get the correct range and get the line count from that?
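Comparing the sample lines above with the date format used in the script suggests why the awk patterns never matched: the log writes the day first, an abbreviated month name, and a four-digit year, while the script builds a numeric month and a two-digit year. A minimal sketch of the mismatch, assuming GNU date (the source already relies on `--date`) and using a fixed timestamp and an abbreviated copy of the first sample line so the result is reproducible:

```shell
# Fixed timestamp for reproducibility; a real script would use the current time.
ts='2011-10-17 12:06:15'

# Format built by the script in this thread: numeric month, two-digit year.
script_fmt=$(LC_ALL=C date --date "$ts" '+%m/%d/%y:%H:%M:%S')

# Format seen in the access_log sample: day/abbreviated month/four-digit year.
log_fmt=$(LC_ALL=C date --date "$ts" '+%d/%b/%Y:%H:%M:%S')

# Abbreviated copy of the first sample line.
line='127.0.0.1 - - [17/Oct/2011:12:06:15 -0700] "GET /cacti/include/main.css HTTP/1.1" 304 -'

# grep -c exits non-zero when the count is 0, hence the "|| true".
hits_script=$(printf '%s\n' "$line" | grep -c -F "$script_fmt" || true)
hits_log=$(printf '%s\n' "$line" | grep -c -F "$log_fmt" || true)

echo "script format: $script_fmt -> $hits_script match(es)"
echo "log format:    $log_fmt -> $hits_log match(es)"
```

Even with the matching format, a pattern range only starts if some line carries exactly that second, so comparing timestamps numerically is likely to be more robust than matching them as text.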

---------- Post updated at 02:52 PM ---------- Previous update was at 01:08 PM ----------

From what I have been reading, I would have to convert the date to be fully numeric; then sed would work nicely to get a range. I'm not sure how I can convert the log file, and adjusting the httpd.conf logging format isn't an option.

Suggestions?

vgersh99 · October 17, 2011, 6:30pm

Something to start with, working on your grepped file sample:
nawk -f epx.awk myGreppedLogFile
epx.awk:

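BEGIN {
  # Split fields on "[" or a space, so the timestamp (without "[") lands in $5.
  FS="[[ ]"
  mon="JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC"
  monN=split(mon, monA, "|")
  # Turn monA[1]="JAN" ... into monA["JAN"]=1 ..., dropping the numeric keys.
  for(i=1; i<=monN; i++) {
    monA[monA[i]]=i
    delete monA[i]
  }
}
{
  # $5 looks like 17/Oct/2011:12:06:15; split it into day, month, year, h, m, s.
  n=split($5, a, "[/:]")
  # Print the original stamp and its numeric form YYYYMMDDHHMMSS.
  printf("%s ->[%s%02d%02d%s%s%s]\n", $5, a[3], monA[toupper(a[2])], a[1], a[4], a[5], a[6])
}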

You don't need sed/grep - do it all natively in awk.
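Building on the conversion above, one way to select a time window is to compare the YYYYMMDDHHMMSS key of each line against two bounds in the same format. The following is only a sketch: `filter_range` is a hypothetical helper name, it assumes the first "[" on each line opens the timestamp, and it assumes the bounds and the log are in the same time zone.

```shell
# Hypothetical helper: print the lines on stdin whose timestamp lies in
# [lo, hi], where lo and hi are given as YYYYMMDDHHMMSS.
filter_range() {
  awk -v lo="$1" -v hi="$2" '
    BEGIN {
      # Map upper-case month abbreviations to month numbers.
      n = split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", m, " ")
      for (i = 1; i <= n; i++) monA[m[i]] = i
    }
    {
      # The 20 characters after the first "[" are e.g. 17/Oct/2011:12:06:15
      start = index($0, "[")
      if (start == 0) next
      split(substr($0, start + 1, 20), a, "[/:]")
      key = sprintf("%04d%02d%02d%02d%02d%02d",
                    a[3], monA[toupper(a[2])], a[1], a[4], a[5], a[6])
      # Fixed-width digit strings order the same way as the times they encode;
      # appending "" forces a string comparison.
      if ((key "") >= (lo "") && (key "") <= (hi "")) print
    }
  '
}

# Possible usage for "the last hour" (GNU date assumed, as elsewhere in the thread):
#   lo=$(date --date '1 hour ago' '+%Y%m%d%H%M%S')
#   hi=$(date '+%Y%m%d%H%M%S')
#   filter_range "$lo" "$hi" < results.txt | wc -l
```

Because the bounds are compared rather than matched, this does not depend on any log line falling on the exact start or end second.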

Epx998 · October 17, 2011, 10:55pm

Thank you very much. That converted the dates nicely. What do you suggest for getting the entries of the last hour, that is, from the current time going back 60 minutes? I used date to mimic the format, going back 1 hour. I tried using sed, but it returns 0.

sed -n '/$DATE1/,/$DATE2/p' output.log | wc -l

That look right?
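Two things in that sed command are likely to produce the 0. Inside single quotes the shell does not expand `$DATE1`, so sed searches for the literal text, and once double quotes are used the slashes inside the dates clash with the `/` pattern delimiters. A small sketch of the difference, using made-up sample lines and fixed, hypothetical values for `DATE1` and `DATE2`:

```shell
# Made-up sample input in the access_log timestamp format (an assumption).
sample='[17/Oct/2011:12:06:14 -0700] one
[17/Oct/2011:12:06:15 -0700] two
[17/Oct/2011:12:06:16 -0700] three
[17/Oct/2011:12:06:17 -0700] four
[17/Oct/2011:12:06:18 -0700] five'

# Hypothetical fixed bounds; both occur in the sample.
DATE1='17/Oct/2011:12:06:15'
DATE2='17/Oct/2011:12:06:17'

# Single quotes: the variables are not expanded, so nothing matches.
single=$(printf '%s\n' "$sample" | sed -n '/$DATE1/,/$DATE2/p' | wc -l)

# Double quotes expand the variables; \#...# selects # as the pattern
# delimiter because the dates themselves contain slashes.
double=$(printf '%s\n' "$sample" | sed -n "\#$DATE1#,\#$DATE2#p" | wc -l)

echo "single quotes: $single line(s), double quotes: $double line(s)"
```

Note that a sed range still needs a line that matches the start pattern exactly, so it only helps when both boundary timestamps actually occur in the file.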

Chubler_XL · October 18, 2011, 12:53am

@vgersh99

Clever code:

 
mon="JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC"
   monN=split(mon, monA, "|");
   for(i=1; i<=monN; i++) {
     monA[monA[i]]=i;
     delete monA[i];
   }

But I can't help thinking that something like this is probably more readable, and not that much bigger (191 chars vs. 163 chars).

monA["JAN"]= 1; monA["FEB"]= 2; monA["MAR"] = 3
monA["APR"]= 4; monA["MAY"]= 5; monA["JUN"] = 6
monA["JUL"]= 7; monA["AUG"]= 8; monA["SEP"] = 9
monA["OCT"]=10; monA["NOV"]=11; monA["DEC"] =12