Extract information from a log file (last X days)

I'm still new to bash scripting. I have a log file and I want to extract the entries from the last 5 days, and also the entries from the last 10 hours.

The log file looks like this. It has 14000 entries, running from March 2002 to January 2003:

[31/Mar/2002:19:30:41
.
.
.
[01/Jan/2003:12:55:15
[01/Jan/2003:12:55:16
[01/Jan/2003:12:55:16
[01/Jan/2003:12:55:16
[01/Jan/2003:12:55:17
[01/Jan/2003:12:55:17
[01/Jan/2003:12:55:18
[01/Jan/2003:12:55:18
[01/Jan/2003:12:55:18
[01/Jan/2003:12:55:19
[01/Jan/2003:12:55:19
[01/Jan/2003:12:55:20
[01/Jan/2003:12:55:20
[01/Jan/2003:12:55:20
[01/Jan/2003:12:55:21

Is it possible to write it like this?

awk '{print $4}' < *.log |uniq -c|sort -g|tail -10

But that still isn't what I want.

I am not sure what you are trying to get at here... Jan 2003 isn't even within the last 5 years, and you want the last 5 days?

I'm just practicing, and I'm using this log file as an example! How can bash show the last X days of this log file? I mean the last X days that are available in the file, not counting back from now!

Date math isn't trivial; many systems don't have easy ways to manipulate and compare dates from the shell. If you rearrange dates into YYYYMMDDHHMMSS order they can be compared alphabetically, but you can't do date arithmetic on them.
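To illustrate the YYYYMMDDHHMMSS idea, here is a minimal sketch in plain awk with no GNU extensions. The helper name sortable_key and the cutoff value are made up for the example, and it assumes the timestamp is the first field that starts with "[".

```shell
#!/bin/sh
# sortable_key: read log lines on stdin and print
# "YYYYMMDDHHMMSS<TAB>original line" for each one.
# Hypothetical helper; assumes the timestamp field starts with "[".
sortable_key() {
    awk '
    BEGIN {
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", mon, " ")
        for (i = 1; i <= n; i++) mnum[mon[i]] = sprintf("%02d", i)
    }
    {
        for (f = 1; f <= NF; f++) {
            if (substr($f, 1, 1) == "[") {
                # Drop the leading "[" and split DD/Mon/YYYY:HH:MM:SS
                split(substr($f, 2), d, "[/:]")
                print d[3] mnum[d[2]] d[1] d[4] d[5] d[6] "\t" $0
                break
            }
        }
    }'
}

# Keep only lines at or after a hand-picked cutoff key.
printf '%s\n' \
    '213.64.153.92 - - [26/Sep/2002:02:01:58 +0200]' \
    '213.46.27.204 - - [01/Jan/2003:12:55:21 +0100]' |
    sortable_key |
    awk -F '\t' '$1 >= "20021227000000" { print $2 }'
# prints only the 01/Jan/2003 line
```

The keys compare correctly as text because every part is zero-padded to a fixed width. Working out the cutoff key itself ("5 days before the last entry") is still date arithmetic, which is the hard part.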

If you have GNU awk (usually found on Linux), this may be a starting point:

gawk -v EDATE="26/Oct/2002:21:02:19" 'BEGIN {
        # Set up arrays for name-to-monthnumber
        split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec", MON, "|");
        for(N=1; N<=12; N++) MNUM[MON[N]]=sprintf("%02d", N);

        # Split DD/MON/YYYY:HH:MM:SS into DD MON YYYY HH MM SS, stored in D[1]-D[6].
        split(EDATE, D, "[:/]");
        # Convert "YYYY MM DD HH MM SS" into epoch time, i.e. seconds since 1970
        EDATE=mktime(D[3] " " MNUM[D[2]] " " D[1] " " D[4] " " D[5] " " D[6]);
        # Starting date is 5 days earlier
        SDATE=EDATE-(60*60*24*5);
}

{
        # The leading [ is a separator too, so D[1] is empty and the
        # date parts land in D[2]-D[7]
        split($1, D, "[\\[:/]");
        DATE=mktime(D[4] " " MNUM[D[3]] " " D[2] " " D[5] " " D[6] " " D[7]);
        # Print the line if it falls in the correct range
        if((DATE >= SDATE) && (DATE <= EDATE)) print;
}' < datafile

Try getting EDATE from the last line of the file, with tail -1.
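A rough sketch of that suggestion, assuming the log is in chronological order so the last line carries the newest timestamp. The function name last_stamp is invented for the example; tail -n 1 is the POSIX spelling of tail -1.

```shell
#!/bin/sh
# last_stamp FILE: print the timestamp of the last line of FILE,
# without the leading "[". Hypothetical helper; assumes the timestamp
# is the first field that starts with "[".
last_stamp() {
    tail -n 1 "$1" | awk '{
        for (f = 1; f <= NF; f++)
            if (substr($f, 1, 1) == "[") { print substr($f, 2); exit }
    }'
}

# Usage sketch (datafile is the file name used in this thread):
#   EDATE=$(last_stamp datafile)
#   gawk -v EDATE="$EDATE" '...' < datafile
```

The result has the same DD/Mon/YYYY:HH:MM:SS shape as the hard-coded EDATE value, so it can be passed with -v in its place.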

Can you post a sample of the logfile showing the exact format of the date and time stamp entries?

Here is an example of the log file:

172.16.0.3 - - [31/Mar/2002:19:30:41 +0200]
127.0.0.1 - stefan [01/Apr/2002:12:17:23 +0200]
213.64.153.92 - - [26/Sep/2002:02:01:58 +0200]
213.97.240.226 - - [28/Sep/2002:03:50:58 +0200] 
213.64.214.124 - - [29/Sep/2002:09:56:04 +0200]
.......
213.46.27.204 - - [01/Jan/2003:12:55:21 +0100]

In that case:

gawk -v EDATE="26/Oct/2002:21:02:19" 'BEGIN {
        # Set up arrays for name-to-monthnumber
        split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec", MON, "|");
        for(N=1; N<=12; N++) MNUM[MON[N]]=sprintf("%02d", N);

        # Split DD/MON/YYYY:HH:MM:SS into DD MON YYYY HH MM SS, stored in D[1]-D[6].
        split(EDATE, D, "[:/]");
        # Convert "YYYY MM DD HH MM SS" into epoch time, i.e. seconds since 1970
        EDATE=mktime(D[3] " " MNUM[D[2]] " " D[1] " " D[4] " " D[5] " " D[6]);
        # Starting date is 5 days earlier
        SDATE=EDATE-(60*60*24*5);
}

{
        # The timestamp is the second-to-last field. Its leading [ is a
        # separator too, so D[1] is empty and the date parts land in D[2]-D[7]
        split($(NF-1), D, "[\\[:/]");
        DATE=mktime(D[4] " " MNUM[D[3]] " " D[2] " " D[5] " " D[6] " " D[7]);
        # Print the line if it falls in the correct range
        if((DATE >= SDATE) && (DATE <= EDATE)) print;
}' < datafile

gawk does not work !

line 1: gawk: command not found

I did wonder what your system was, but you never said. What is it?

Do you have perl on your system?

It's Mac OS X...
I don't want to use perl! Just a simple shell script, not perl or python or anything else!

I searched a lot but I could not find anything on the web related to what I want!
It's actually part of an assignment (a small part)!

So I need to extract the last X hours/days from the log file.

I repeat: What you're asking for isn't easy.

1) Date math is not easy. The only time it's easy is when someone else has done all the work for you. How does one subtract dates if they're not numbers? If the language can't do it for you, convert them the hard way into something the language can subtract. That means worrying about things like calendars and leap-years.

2) This isn't a database. There's no "query". There's no "SELECT X from Y WHERE ..." to select the data you want from known datatypes. This is a text file with no datatypes except columns, maybe, if you're lucky. To get text out of it, you match text against text. The closest thing there is to 'select' for text is awk, the flat-file power tool, which organizes text into records and columns for you if you tell it how the text file is laid out, and understands numbers.
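To make point 1 concrete, here is a small sketch of subtracting two dates "the hard way" in plain awk: turn each date into a day count since 1970-01-01, leap years included, and subtract the counts. The names days_between, daynum and isleap are invented for the example.

```shell
#!/bin/sh
# days_between "YYYY MM DD" "YYYY MM DD": print the number of days from
# the first date to the second. Hypothetical helper; years from 1970 on.
days_between() {
    awk -v from="$1" -v to="$2" '
    function isleap(y) {
        return (y % 4 == 0 && y % 100 != 0) || (y % 400 == 0)
    }
    # Days elapsed since 1970-01-01 for the given year, month and day
    function daynum(y, m, d,    i, n) {
        n = d - 1
        for (i = 1970; i < y; i++) n += 365 + isleap(i)
        for (i = 1; i < m; i++) n += mdays[i] + (i == 2 && isleap(y))
        return n
    }
    BEGIN {
        split("31 28 31 30 31 30 31 31 30 31 30 31", mdays, " ")
        split(from, A, " ")
        split(to, B, " ")
        # The + 0 forces the split pieces to be treated as numbers
        print daynum(B[1] + 0, B[2] + 0, B[3] + 0) - daynum(A[1] + 0, A[2] + 0, A[3] + 0)
    }'
}

days_between "2002 12 27" "2003 01 01"    # prints 5
```

Once dates are plain numbers like this, "within the last 5 days" is an ordinary numeric comparison.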

If you have no nice, clean tools which do date math for you, you have to do it the hard way. Fortunately, you might not have been the first person in the world to do so. OSX's awk is not GNU awk, but it does at least support functions, so there's an alternative mktime() you can try:

awk -v EDATE="26/Oct/2002:21:02:19" '
function _tm_isleap(year,    ret)
{
    ret = (year % 4 == 0 && year % 100 != 0) ||
            (year % 400 == 0)

    return ret
}

function _tm_addup(a,    total, yearsecs, daysecs,
                         hoursecs, i, j)
{
    hoursecs = 60 * 60
    daysecs = 24 * hoursecs
    yearsecs = 365 * daysecs

    total = (a[1] - 1970) * yearsecs

    # extra day for leap years
    for (i = 1970; i < a[1]; i++)
        if (_tm_isleap(i))
            total += daysecs

    j = _tm_isleap(a[1])
    for (i = 1; i < a[2]; i++)
        total += _tm_months[j, i] * daysecs

    total += (a[3] - 1) * daysecs
    total += a[4] * hoursecs
    total += a[5] * 60
    total += a[6]

    return total
}

function mktime(str,    res1, res2, a, b, i, j, t, diff)
{
    i = split(str, a, " ")    # don't rely on FS

    if (i != 6)
        return -1

    # force numeric
    for (j in a)
        a[j] += 0

    # validate
    if (a[1] < 1970 ||
        a[2] < 1 || a[2] > 12 ||
        a[3] < 1 || a[3] > 31 ||
        a[4] < 0 || a[4] > 23 ||
        a[5] < 0 || a[5] > 59 ||
        a[6] < 0 || a[6] > 61 )
            return -1

    res1 = _tm_addup(a)
    t = strftime("%Y %m %d %H %M %S", res1)

    if (_tm_debug)
        printf("(%s) -> (%s)\n", str, t) > "/dev/stderr"

    split(t, b, " ")
    res2 = _tm_addup(b)

    diff = res1 - res2

    if (_tm_debug)
        printf("diff = %d seconds\n", diff) > "/dev/stderr"

    res1 += diff

    return res1
}

BEGIN {
    # Initialize data for mktime(): table of month lengths,
    # indexed by [isleap, month]
    _tm_months[0,1] = _tm_months[1,1] = 31
    _tm_months[0,2] = 28; _tm_months[1,2] = 29
    _tm_months[0,3] = _tm_months[1,3] = 31
    _tm_months[0,4] = _tm_months[1,4] = 30
    _tm_months[0,5] = _tm_months[1,5] = 31
    _tm_months[0,6] = _tm_months[1,6] = 30
    _tm_months[0,7] = _tm_months[1,7] = 31
    _tm_months[0,8] = _tm_months[1,8] = 31
    _tm_months[0,9] = _tm_months[1,9] = 30
    _tm_months[0,10] = _tm_months[1,10] = 31
    _tm_months[0,11] = _tm_months[1,11] = 30
    _tm_months[0,12] = _tm_months[1,12] = 31

    # Set up arrays for name-to-monthnumber
    split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec", MON, "|");
    for(N=1; N<=12; N++) MNUM[MON[N]]=sprintf("%02d", N);

    # Split DD/MON/YYYY:HH:MM:SS into DD MON YYYY HH MM SS, stored in D[1]-D[6].
    split(EDATE, D, "[:/]");
    # Convert "YYYY MM DD HH MM SS" into epoch time, i.e. seconds since 1970
    EDATE=mktime(D[3] " " MNUM[D[2]] " " D[1] " " D[4] " " D[5] " " D[6]);
    # Starting date is 5 days earlier
    SDATE=EDATE-(60*60*24*5);
}

{
    # The timestamp is the second-to-last field. Its leading [ is a
    # separator too, so D[1] is empty and the date parts land in D[2]-D[7]
    split($(NF-1), D, "[\\[:/]");
    DATE=mktime(D[4] " " MNUM[D[3]] " " D[2] " " D[5] " " D[6] " " D[7]);
    # Print the line if it falls in the correct range
    if((DATE >= SDATE) && (DATE <= EDATE)) print;
}' < datafile


Wait, what? This is homework?

Actually, I wrote the assignment in Python without any problem! But I have to write the whole thing as a shell script too, and I've done that except for this part. None of the other parts was this complicated, so I suppose there should be another way.

I repeat for the third time: Date arithmetic is only easy when the language does it for you.

Date math is one of shell programming's blind spots. The GNU/Linux enhancements mostly fill it in, but you only get them with the GNU utilities.
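As an illustration of how much depends on which date utility is installed, here is a hedged sketch that converts one log timestamp to epoch seconds. It tries the GNU form (date -d) first and falls back to the BSD/macOS form (date -j -f). The function name to_epoch is invented for the example, and the %b month-name parsing assumes an English locale.

```shell
#!/bin/sh
# to_epoch STAMP: convert DD/Mon/YYYY:HH:MM:SS to seconds since 1970,
# interpreted in the current time zone. Hypothetical helper.
to_epoch() {
    # GNU date wants "DD Mon YYYY HH:MM:SS", so turn the slashes and
    # the first colon into spaces.
    spaced=$(printf '%s\n' "$1" | sed 's|/| |g; s|:| |')
    date -d "$spaced" +%s 2>/dev/null ||
        date -j -f '%d/%b/%Y:%H:%M:%S' "$1" +%s
}

end=$(to_epoch '01/Jan/2003:12:55:21')
start=$((end - 5 * 24 * 60 * 60))    # 5 days earlier, in seconds
echo "window: $start .. $end"
```

Calling date once per log line would be slow for 14000 entries, so this is better suited to computing the two window boundaries than to testing every line.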

I also ask again: Is this homework?

Yeah, it's homework!

And it still gives an error:

awk: syntax error at source line 38 in function mktime
context is
>>> <<<
awk: illegal statement at source line 38 in function mktime
missing }
./21: line 44: syntax error near unexpected token `('
./21: line 44: ` for (j in a)'
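
That pair of messages has a recognisable cause. The whole awk program sits inside single quotes, and source line 38 is the split() line whose comment reads "don't rely on FS". The apostrophe in "don't" closes the quoted string, so awk only receives the program up to that point (hence "missing }"), and the shell then tries to run the remainder itself and trips over "for (j in a)". Rewording the comment without an apostrophe should clear this particular error. Separately, that mktime() replacement calls strftime(), which is a gawk extension and is not guaranteed to exist in other awks.

Here is a minimal sketch that avoids both problems by doing the arithmetic in plain awk. It is not the method from the posts above: it ignores the +0200/+0100 zone field and daylight saving, and it assumes the log is in chronological order and has the timestamp in the field that starts with "[". The function name last_window is invented for the example.

```shell
#!/bin/sh
# last_window FILE SECONDS
# Print the lines of FILE whose timestamp is within SECONDS of the
# timestamp on the last line. Hypothetical helper; time zones are ignored.
last_window() {
    awk -v span="$2" '
    function isleap(y) {
        return (y % 4 == 0 && y % 100 != 0) || (y % 400 == 0)
    }
    # Seconds since 1970-01-01 00:00:00 for a DD/Mon/YYYY:HH:MM:SS stamp
    function secs(stamp,    d, i, n, y, m) {
        split(stamp, d, "[/:]")
        y = d[3] + 0
        m = mnum[d[2]]
        n = d[1] - 1
        for (i = 1970; i < y; i++) n += 365 + isleap(i)
        for (i = 1; i < m; i++) n += mdays[i] + (i == 2 && isleap(y))
        return ((n * 24 + d[4]) * 60 + d[5]) * 60 + d[6]
    }
    # Return the timestamp field of a line without its leading "["
    function stamp_of(line,    f, i, k) {
        k = split(line, f, " ")
        for (i = 1; i <= k; i++)
            if (substr(f[i], 1, 1) == "[") return substr(f[i], 2)
        return ""
    }
    BEGIN {
        split("31 28 31 30 31 30 31 31 30 31 30 31", mdays, " ")
        k = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", mon, " ")
        for (i = 1; i <= k; i++) mnum[mon[i]] = i
    }
    # First pass over the file: remember the last timestamp seen
    NR == FNR { s = stamp_of($0); if (s != "") newest = s; next }
    # Start of the second pass: convert that timestamp to seconds once
    FNR == 1 { last = secs(newest) }
    # Second pass: print the lines that fall inside the window
    {
        s = stamp_of($0)
        if (s == "") next
        t = secs(s)
        if (t >= last - span && t <= last) print
    }' "$1" "$1"
}

# Usage sketch (datafile is the file name used in this thread):
#   last_window datafile $((5 * 24 * 60 * 60))    # last 5 days
#   last_window datafile $((10 * 60 * 60))        # last 10 hours
```

The file is named twice on purpose so that awk reads it in two passes: the first pass finds the end of the window, the second prints the matching lines.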