How can I use xargs as a loop to grab date strings?


  1. The problem statement, all variables and given/known data:
    My goal is to find how many requests the web server received in 14 days. I know I can cat a weblog file into wc -l to find the total number of requests sent to the server. From there, I could loop over 14 days: go through each line and compare its date string, and whenever the string changes, increment the date until it reaches the 14th date. I was thinking of using the seq and xargs commands to go through the 14-day cycle.
    I would like to know how to do a string comparison between dates.

Thanks in advance,

Scopiop

  2. Relevant commands, code, scripts, algorithms:
192.192.1.1 - - [10/June/2013...]
192.192.1.2 - - [10/June/2013...]
192.192.1.3 - - [11/June/2013..]
192.192.1.4 - - [12/June/2013..]
........

Output:

The last 14 days' requests are: 100
  3. The attempts at a solution (include all code and scripts):

  4. Complete Name of School (University), City (State), Country, Name of Professor, and Course Number (Link to Course):
    City College/Aaron Brick/160B


I don't think xargs is quite applicable here.

What do the lines really look like? Obscure what you must, but as is they're a little too generic.

79.114.31.152 - - [10/Jun/2013:07:43:07 -0700] "GET /~otangdec/cnit132/images/chic_dance.gif HTTP/1.1" 200 556572 "http://filelist.ro/details.php?id=23714" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36"
220.181.108.175 - - [10/Jun/2013:07:43:12 -0700] "GET / HTTP/1.1" 200 96 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
107.198.90.65 - - [10/Jun/2013:07:43:13 -0700] "GET /~otangdec/cnit132/images/chic_dance.gif HTTP/1.1" 200 556572 "http://us-mg6.mail.yahoo.com/neo/launch?.rand=39q9ldbqb22hh" "Mozilla/5.0 (Windows NT 6.0; rv:21.0) Gecko/20100101 Firefox/21.0"
93.104.214.107 - - [10/Jun/2013:07:43:16 -0700] "GET /~otangdec/cnit132/guestbook.html HTTP/1.1" 200 12054649 "-" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)"
93.104.214.107 - - [10/Jun/2013:07:43:19 -0700] "GET /~otangdec/cnit132/addguest.html HTTP/1.1" 200 3510 "-" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)"
93.104.214.107 - - [10/Jun/2013:07:43:19 -0700] "POST /~otangdec/cnit132/guestbook.pl HTTP/1.1" 200 979 "http://hills.ccsf.cc.ca.us/~otangdec/cnit132/addguest.html" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)"
66.249.73.225 - - [10/Jun/2013:07:43:22 -0700] "GET /robots.txt HTTP/1.1" 404 208 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.73.225 - - [10/Jun/2013:07:43:23 -0700] "GET /~cking1/cnit132/hw7.html HTTP/1.1" 200 7052 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
178.238.236.213 - - [10/Jun/2013:07:43:06 -0700] "GET /~otangdec/cnit132/guestbook.html HTTP/1.1" 200 12054649 "-" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)"
178.238.236.213 - - [10/Jun/2013:07:43:40 -0700] "GET /~otangdec/cnit132/addguest.html HTTP/1.1" 200 3510 "-" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)"

I want to compare the day strings (e.g. 10 vs. 11) to keep track of the 14-day cycle:

 [10/Jun/2013:23:59:09]
 [11/Jun/2013:00:00:18]
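Comparing each line's day string with the previous one is essentially what uniq does when it collapses adjacent identical lines. A minimal sketch, assuming the log is in chronological order and every line carries the bracketed timestamp shown above; requests_per_day is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# requests_per_day LOGFILE
# Prints a count and a date (e.g. "2 10/Jun/2013") for each run of identical dates.
requests_per_day() {
  # After the first "[" the line starts "10/Jun/2013:07:43:07 -0700] ...";
  # cutting at the first ":" leaves just the date, e.g. "10/Jun/2013".
  # uniq -c then counts each run of identical, adjacent dates.
  cut -d'[' -f2 "$1" | cut -d: -f1 | uniq -c
}
```

Summing the first column of the last 14 lines (e.g. requests_per_day access.log | tail -n 14 | awk '{ s += $1 } END { print s }') would give the requests on the last 14 days that had any traffic. Days with no requests produce no line at all, and an out-of-order entry near midnight splits a day into two runs, which is one reason a timestamp cutoff is more reliable.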

I want to do a string compare between [

Easier said than done!

I'm deliberately posting the following command for various reasons:

  1. To prove that OP's idea might work under certain ideal circumstances (accuracy only roughly to the day, the period falling within the second half of the month, etc.)
  2. To show the OP that the logic is flawed
  3. The professor won't accept this anyway
seq 24 -1 10 | xargs -I '{}' grep -e '\[{}/Nov/2014:' access.log | wc -l

This works: it counts the lines with timestamps from 24/Nov/2014 back to 10/Nov/2014 (14 days back from 24/Nov/2014).
Now imagine it's Nov 5 ...
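One way around the Nov 5 problem is to let GNU date do the calendar arithmetic, generate the 14 day stamps, and count matching lines with a single grep. A sketch, assuming GNU date and GNU grep (for -f - reading patterns from standard input); day_patterns and count_last_days are hypothetical names, and LC_ALL=C keeps %b as the English month abbreviation the log uses:

```shell
#!/usr/bin/env bash
# day_patterns END_DATE N
# Prints "[DD/Mon/YYYY:" for END_DATE and the N-1 calendar days before it.
day_patterns() {
  local end n i
  end=$1
  n=$2
  i=0
  while [ "$i" -lt "$n" ]; do
    # GNU date handles month and year boundaries ("2014-11-05 13 days ago").
    LC_ALL=C date -d "$end $i days ago" '+[%d/%b/%Y:'
    i=$((i + 1))
  done
}

# count_last_days LOGFILE END_DATE N
# Counts log lines whose timestamp falls on one of those N days.
count_last_days() {
  # -F: patterns are fixed strings, so "[" needs no escaping;
  # -f -: read the patterns from standard input.
  day_patterns "$2" "$3" | grep -cF -f - "$1"
}
```

For example, count_last_days access.log 2014-11-05 14 would cover 23/Oct/2014 through 05/Nov/2014. This is still only accurate to the day.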

The following hints should help you do it right and get accuracy to the second:

  1. Get the last line of access.log (the newest entry)
  2. Cut out the date-time-timezone stamp ( 10/Jun/2013:07:43:40 -0700 )
  3. Reformat the above string into something you can feed to GNU date, e.g. 2013-06-10 07:43:40 OR 2013/06/10 07:43:40 OR 20130610 07:43:40 OR whatever
  4. Using GNU date, convert the reformatted string to seconds since 1970-01-01
    4.1. Subtract 60*60*24*14 seconds (14 days) from the result and put the result in a variable, e.g. twoweeksago
  5. In a while loop, parse the access.log file line by line
    5.1. Repeat steps 2, 3 and 4 here
    5.2. Pseudo code: if seconds greater than/equal to twoweeksago then increment counter
  6. echo "$counter requests"
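The hints above could be sketched in shell roughly as follows. This assumes GNU date and the Common Log Format stamp shown earlier (10/Jun/2013:07:43:40 -0700); clf_to_epoch and count_recent are hypothetical names, and the "reformat" step uses the 10 Jun 2013 07:43:40 -0700 layout, which GNU date accepts and which keeps the timezone offset. Calling date once per line is slow on a big log, so read it as an illustration of the steps rather than an efficient solution:

```shell
#!/usr/bin/env bash
# clf_to_epoch "10/Jun/2013:07:43:40 -0700" -> seconds since 1970-01-01 UTC
# Steps 2-4: reshape the stamp into "10 Jun 2013 07:43:40 -0700" for GNU date.
clf_to_epoch() {
  local ts rest day mon year
  ts=${1#\[}                 # drop a leading "[" if present
  ts=${ts%]}                 # drop a trailing "]" if present
  day=${ts%%/*}              # 10
  rest=${ts#*/}              # Jun/2013:07:43:40 -0700
  mon=${rest%%/*}            # Jun
  rest=${rest#*/}            # 2013:07:43:40 -0700
  year=${rest%%:*}           # 2013
  rest=${rest#*:}            # 07:43:40 -0700
  LC_ALL=C date -d "$day $mon $year $rest" +%s
}

# count_recent LOGFILE -> "N requests" within 14 days of the newest entry
count_recent() {
  local log line stamp newest twoweeksago secs counter
  log=$1
  counter=0
  # Steps 1-4: newest entry, its stamp between "[" and "]", in seconds.
  line=$(tail -n1 "$log")
  stamp=${line#*\[}
  stamp=${stamp%%]*}
  newest=$(clf_to_epoch "$stamp") || return 1
  twoweeksago=$((newest - 60*60*24*14))   # step 4.1
  # Step 5: convert every line and compare it with the cutoff.
  while IFS= read -r line; do
    stamp=${line#*\[}
    stamp=${stamp%%]*}
    secs=$(clf_to_epoch "$stamp") || continue
    if [ "$secs" -ge "$twoweeksago" ]; then
      counter=$((counter + 1))
    fi
  done < "$log"
  echo "$counter requests"                # step 6
}
```

Usage would be along the lines of count_recent /etc/httpd/logs/access_log.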

Hope this helps.


If you rearrange the date into YYYY/MM/DD HH:MM:SS, it sorts alphabetically and can be compared with the simple >, >=, < and <= operators in awk.

$ cat dcomp.awk

BEGIN {
        FS="[ \t\\[\\]/:]+"
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", A);
        for(X in A)     MON[A[X]]=sprintf("%02d", X);
}

function ymd() {
        return(sprintf("%s/%s/%s %02d:%02d:%02d", $6, MON[$5], $4, $7, $8, $9));
}

(ymd() >= YYYYMMDD" 00:00:00") && (ymd() <= YYYYMMDD" 23:59:59") { C++; }

END {   print C+0       }

$ awk -f dcomp.awk YYYYMMDD="2013/06/10" datafile

10

$
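The same sortable key also handles an open-ended range, which is what a 14-day window needs. A sketch assuming the same log layout; count_since is a hypothetical name, SINCE takes the cutoff in the YYYY/MM/DD HH:MM:SS layout, and instead of escaping brackets inside FS it splits on blanks, slashes and colons and drops the leading [ from the day with substr():

```shell
#!/usr/bin/env bash
# count_since LOGFILE "YYYY/MM/DD HH:MM:SS"
# Counts entries whose sortable key is at or after the cutoff.
# The timezone offset in the log is ignored.
count_since() {
  awk -v SINCE="$2" '
    BEGIN {
      FS = "[ \t/:]+"     # split on blanks, slashes and colons
      split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", A)
      for (X in A) MON[A[X]] = sprintf("%02d", X)
    }
    {
      # $4 is "[10": substr() drops the bracket, %02d keeps two digits.
      key = sprintf("%s/%s/%02d %02d:%02d:%02d", $6, MON[$5], substr($4, 2), $7, $8, $9)
      if (key >= SINCE) C++
    }
    END { print C + 0 }
  ' "$1"
}
```

With GNU date the cutoff could be produced as count_since access.log "$(date -d '14 days ago' '+%Y/%m/%d %H:%M:%S')". Because the key ignores the -0700 style offset, this assumes the log and date use the same local timezone.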

I tried grepping the date from the last line and cutting out the date field, but I have trouble getting rid of the square bracket.

$cat /etc/httpd/logs/access_log | cut -d' ' -f4 
> [25/Nov/2014:12:00:01

How do I remove the square bracket on the far left? I tried grep but
could not get rid of the bracket. Once the bracket is removed, I think I can use
date -d "14 days ago" to set a stop point, so that I can count all requests between those
dates. I hope I have the right approach from here.

Thank you for your help,
Scopiop

You don't need to cat the whole file to get the last line. Try

tail -n1 /etc/httpd/logs/access_log | cut -d' ' -f4

It will retrieve only the very last line and apply the cut command to it.

You could simply add | cut -c2- to the above command. It will cut from the second character on until the end of the line, thus removing the square bracket.

In a loop, the above really should be done by making use of bash's built-ins for better performance.

       ${parameter#word}
       ${parameter##word}
              Remove matching prefix pattern.

       ${parameter%word}
       ${parameter%%word}
              Remove matching suffix pattern.

Example:

$ dat=$(tail -n1 access.log)
$ echo "$dat"
127.0.0.1 - - [24/Nov/2014:07:13:55 -0800] "GET /something.php HTTP/1.1" 200 315 "-" "Foo browser/1.0 (<some operating system>) <some user agent>"
$ #remove everything up to the first square bracket
$ dat="${dat#*\[}"
$ echo "$dat"
24/Nov/2014:07:13:55 -0800] "GET /something.php HTTP/1.1" 200 315 "-" "Foo browser/1.0 (<some operating system>) <some user agent>"
$ #remove everything from the first space until the end of string/line
$ dat="${dat%% *}"
$ echo "$dat"
24/Nov/2014:07:13:55
$

Yes, you can. Consult this wiki page for the exact strftime format (https://en.wikipedia.org/wiki/Common_Log_Format), but I'm not sure how you plan to do the (exact) comparison.
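For example, the cutoff can be printed in the log's own layout with the strftime format %d/%b/%Y:%H:%M:%S %z. A sketch assuming GNU date (for -d and @seconds); clf_days_ago is a hypothetical helper, and subtracting whole multiples of 86400 seconds ignores DST shifts:

```shell
#!/usr/bin/env bash
# clf_days_ago REFERENCE DAYS
# Prints the moment DAYS*86400 seconds before REFERENCE in the log's layout,
# e.g. "10/Nov/2014:12:00:01 +0000" (the offset follows the TZ setting).
clf_days_ago() {
  local ref_s
  ref_s=$(date -d "$1" +%s) || return 1
  LC_ALL=C date -d "@$((ref_s - $2 * 86400))" '+%d/%b/%Y:%H:%M:%S %z'
}
```

A call such as clf_days_ago now 14 prints the stop point. Two stamps in this layout still can't be compared as plain strings, because the day comes first and the month is a name, so the comparison itself needs the stamps converted first (to seconds, or to a YYYY/MM/DD key).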

We are not here to do your homework for you. Leaving section 3 in the homework template blank isn't a good starting point.

If you put in some effort and show us what you have tried, we can help you hone your solution to one that will get the job done. If you aren't willing to show us that you are making an effort, there isn't much more that we will be willing to do for you.

All that you have shown us is that you can use cut with space as a delimiter to select part of an input line. Could you use cut with a different delimiter to get rid of the [ ?

What tools are you allowed to use for this assignment? You have shown us cut, and we assume that you can use some shell (but we don't know which shell). You have been given suggestions to use GNU date and awk. Are these available for this assignment? Can you use a 1993 or later version of ksh (which can also perform simple calculations to get the date 14 days ago)?

Are you looking for a date 14 days ago to the second? Or are you looking for a date starting at midnight almost 14 days ago? Whether or not you can use the date utility depends on which operating system and shell you're using. Which are they?
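The two readings give different cutoffs. A sketch assuming GNU date; cutoffs is a hypothetical helper that prints both as seconds since 1970-01-01:

```shell
#!/usr/bin/env bash
# cutoffs NEWEST   (NEWEST: any date string GNU date accepts)
# Prints the two candidate cutoffs as seconds since 1970-01-01.
cutoffs() {
  local newest_s day
  newest_s=$(date -d "$1" +%s) || return 1
  # 1) Exactly 14*24 hours before NEWEST, to the second.
  echo "exact $((newest_s - 14 * 86400))"
  # 2) Local midnight starting the 14 calendar days that end on NEWEST's day.
  day=$(date -d "@$newest_s" +%F)
  echo "midnight $(date -d "$day 13 days ago" +%s)"
}
```

For a newest entry at 2014-11-24 12:00:01, the first is 2014-11-10 12:00:01 and the second is 2014-11-11 00:00:00 (local time, assuming no DST change in between).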

Thank you for your help.

I have thought about how to approach writing this program, but I did not know how the log system works. Therefore, I asked some questions here that may have offended the integrity of the forum.

First, I thought that if I knew how to manipulate the date string, then I could control the log count for every 14 days. So I wondered how to grep things and strip things off. I had a problem with the bracket attached to the date format, such as

[15/Nov/...

In the last lecture we learned the tr command, and I found it useful for removing the bracket.
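A sketch of that with tr, combined with the tail and cut commands suggested earlier in the thread; newest_stamp is a hypothetical name:

```shell
#!/usr/bin/env bash
# newest_stamp LOGFILE  ->  e.g. "25/Nov/2014:12:00:01"
newest_stamp() {
  # Field 4 of the last line is "[25/Nov/2014:12:00:01";
  # tr -d deletes every "[" character it reads.
  tail -n1 "$1" | cut -d' ' -f4 | tr -d '['
}
```

Since the input to tr here is only the fourth field, deleting every [ removes just the leading bracket.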

I realized the easiest way to get the total requests for every 14 days
is to find the line number of every 14-day break point: 0, 14, 28, 42, etc.

Pseudo:
Find the last line (the most recent time) in the log file
Find the line number of the request 14 days ago ("+%d/%b/%Y:%H:%M")
Then subtract the line numbers to get the total number of requests in the last 14 days.
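The line-number idea could be sketched like this, assuming the log is written in chronological order (the sample earlier already shows one entry logged slightly out of order, so the boundary is approximate) and reusing the sortable YYYY/MM/DD HH:MM:SS key from the awk answer above. count_by_line_number is a hypothetical name:

```shell
#!/usr/bin/env bash
# count_by_line_number LOGFILE "YYYY/MM/DD HH:MM:SS"
# Finds the first line at or after the cutoff and subtracts line numbers.
count_by_line_number() {
  local log since total start
  log=$1
  since=$2
  total=$(wc -l < "$log")
  # Line number of the first entry whose sortable key is >= the cutoff.
  start=$(awk -v SINCE="$since" '
    BEGIN {
      FS = "[ \t/:]+"
      split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", A)
      for (X in A) MON[A[X]] = sprintf("%02d", X)
    }
    {
      key = sprintf("%s/%s/%02d %02d:%02d:%02d", $6, MON[$5], substr($4, 2), $7, $8, $9)
      if (key >= SINCE) { print NR; exit }
    }
  ' "$log")
  if [ -z "$start" ]; then
    echo 0                          # no entry is recent enough
  else
    echo $((total - start + 1))     # lines from start through the end
  fi
}
```

The cutoff could come from GNU date, e.g. count_by_line_number access_log "$(date -d '14 days ago' '+%Y/%m/%d %H:%M:%S')".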

Thank you again,

Scopiop