Extract data from a log file within a specified time range

So, we have a script that is supposed to offer a couple of functions, such as showing the number of failed connections, received bytes per IP address, and so on. We are supposed to be able to limit the results to either the last 0-24 hours or the last X days, counting back from the last entry in the log file.

Everything is working, except we don't know how to limit searches to a given time span.

Our code looks like this:

#!/bin/sh

#-n: Limit the number of results to N
#-h: Limit the query to the last number of hours (< 24)
#-d: Limit the query to the last number of days (counting from
#midnight)
#-c: Which IP address makes the most number of connection attempts?
#-2: Which address makes the most number of successful attempts?
#-r: What are the most common results codes and where do they come
#from?
#-F: What are the most common result codes that indicate failure (no
#auth, not found etc) and where do they come from?
#-t: Which IP number get the most bytes sent to them?

#<filename> refers to the logfile. If '-' is given as a filename, or
#no filename is given, then standard input should be read. This
#enables the script to be used in a pipeline.

FILENAME=*.log
MAXSHOW=99999
LIMITHOURS=0
LIMITDAYS=0
h=1
c=0
b=0
r=0
F=0
t=0
while getopts :n:h:d:c2rFt option
do    
    case $option in
    n)
        MAXSHOW=$OPTARG
        ;;
    h)
        LIMITHOURS=$OPTARG
        ;;
    d)
        LIMITDAYS=$OPTARG
        ;;
    c)
        c=1
        ;;
    2)
        b=1
        ;;
    r)
        r=1
        ;;
    F)
        F=1
        ;;
    t)
        t=1
        ;;
    esac
done
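if [ "$LIMITHOURS" -gt 0 ]; then
    : # ????? (limit the query to the last $LIMITHOURS hours)
fi

if [ "$LIMITDAYS" -gt 0 ]; then
    : # ?????? (limit the query to the last $LIMITDAYS days)
fi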

if [ "$c" -eq "1" ]; then
    cat $FILENAME|awk '{print $1}' |sort|uniq -c|sort -k 1 -n -r|head -$MAXSHOW
fi

if [ "$b" -eq "1" ]; then
    grep -Eo "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}.* 200" $FILENAME|awk '{print $1}'|sort|uniq -c|sort -nr|head -$MAXSHOW
fi

if [ "$r" -eq "1" ]; then
    cat $FILENAME|awk '{print $1" "$9}'|sort|uniq -c|sort -nr|head -$MAXSHOW
fi

if [ "$F" -eq "1" ]; then
    # Status codes of 400 and above indicate failure (no auth, not found etc)
    cat $FILENAME|awk '$9 >= 400 {print $1" "$9}'|sort|uniq -c|sort -nr|head -$MAXSHOW
fi

if [ "$t" -eq "1" ]; then
    cat $FILENAME |awk '{print $1" "$10}'|awk '{ x[$1]+=$2 } END{for(data in x) print data, x[data]}' | sort -k2,2 -nr|head -$MAXSHOW
fi
    

And our log file is full of data like this:

213.46.27.204 - - [01/Jan/2003:12:55:20 +0100] "GET /scripts/..%%35%63../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 404 - "" "" 
213.46.27.204 - - [01/Jan/2003:12:55:20 +0100] "GET /scripts/..%%35c../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 404 - "" "" 
213.46.27.204 - - [01/Jan/2003:12:55:20 +0100] "GET /scripts/..%25%35%63../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 404 - "" "" 
213.46.27.204 - - [01/Jan/2003:12:55:21 +0100] "GET /scripts/..%252f../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 404 - "" ""

If anyone knows how we can fix this problem, we would be very thankful!

All those cats are completely useless (see the "Useless Use of Cat Award"): awk and grep can read the file themselves when given its name as an argument. The extra processes are particularly harmful on Windows, which deals badly with too many short-lived programs.
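As an illustration, the -c report could be written without cat along these lines. This is only a sketch: the function name top_ips is made up for the example, and it assumes the IP address is the first whitespace-separated field, as in the sample log.

```shell
# Count connection attempts per IP address, most frequent first.
# $1: log file, $2: maximum number of rows to show
top_ips() {
    # awk opens the file itself, so no cat is needed
    awk '{ print $1 }' "$1" | sort | uniq -c | sort -k1,1nr | head -n "$2"
}
```

It would be called as top_ips "$FILENAME" "$MAXSHOW" in place of the cat pipeline.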

If you rearrange your dates into YYYY/MM/DD HH:MM:SS order, they compare correctly as plain strings, which is why the rest of the world has converted to this order. You can do this rearranging inside awk, though a lookup table of month names is needed.

awk -F"[ \t/:\\\[\\\]]+" 'BEGIN { split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", A); for(X in A) M[A[X]]=sprintf("%02d", X) }
# Create timestamp in YYYY/MM/DD:HH:MM:SS so it can be compared to TSTART and TEND, which controls printing of lines
{ T=$6"/"M[$5]"/"$4":"$7":"$8":"$9 } (T >= TSTART) && (T <= TEND)' TSTART="2003/01/01:12:55:20" TEND="2003/01/01:12:55:20" inputfile

If awk doesn't work, try gawk or nawk.
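To use the filter together with the other flags, one option is to wrap it in a shell function and pipe its output into the existing reports. The following is only a sketch under a few assumptions: the name in_range is made up, every log line is assumed to carry its timestamp in the fourth whitespace-separated field as [DD/Mon/YYYY:HH:MM:SS, and the timestamp is taken apart with substr rather than with a custom field separator, which avoids the heavy escaping.

```shell
# Pass through only the log lines whose timestamp lies in [start, end].
# $1: start, $2: end (both as YYYY/MM/DD:HH:MM:SS), $3: log file
in_range() {
    awk -v TSTART="$1" -v TEND="$2" 'BEGIN {
        # Lookup table: month name -> two-digit month number
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", A)
        for (X in A) M[A[X]] = sprintf("%02d", X)
    }
    {
        # $4 looks like "[01/Jan/2003:12:55:20"
        T = substr($4, 9, 4) "/" M[substr($4, 5, 3)] "/" substr($4, 2, 2) ":" substr($4, 14)
    }
    (T >= TSTART) && (T <= TEND)' "$3"
}
```

Because whole lines are printed unchanged, the output can feed any of the reports, for example in_range "$TSTART" "$TEND" "$FILENAME" | awk '{print $1}' | sort | uniq -c | sort -nr | head -n "$MAXSHOW".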

The thing is, we are supposed to run it as, for example, ./script.sh -h 5 -c or ./script.sh -d 4 -c, and the output is then supposed to be the IP addresses with the most connections during the past 5 hours or past 4 days.

We should be able to use -h or -d together with the other flags.

Thanks for telling me that now.

Do you have GNU date? It can calculate time offsets and print date stamps like I do with awk there.

awk ... TSTART=$(date -d "- 5 day" +"%Y/%m/%d:%H:%M:%S") TEND=$(date +"%Y/%m/%d:%H:%M:%S") filename
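Note that the date calls above count back from the current time, while the question asks to count back from the last entry in the log. A rough sketch of the latter follows. It assumes GNU date (for -d and the @seconds form), assumes the last line of the file is the newest entry, and treats the logged times as local time, ignoring the +0100 offset. The function names last_stamp and window_start are made up for the example.

```shell
# Print the timestamp of the last log line as "YYYY-MM-DD HH:MM:SS".
# $1: log file
last_stamp() {
    tail -n 1 "$1" | awk 'BEGIN {
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", A)
        for (X in A) M[A[X]] = sprintf("%02d", X)
    }
    { print substr($4, 9, 4) "-" M[substr($4, 5, 3)] "-" substr($4, 2, 2) " " substr($4, 14) }'
}

# Print the start of a window that ends at the last log line,
# as YYYY/MM/DD:HH:MM:SS. $1: log file, $2: window length in hours
window_start() {
    end=$(date -d "$(last_stamp "$1")" +%s)      # seconds since the epoch
    date -d "@$((end - $2 * 3600))" +"%Y/%m/%d:%H:%M:%S"
}
```

With -h 5 the script could then set TSTART=$(window_start "$FILENAME" "$LIMITHOURS") and derive TEND from last_stamp in the same way. For -d the same idea applies with 86400 seconds per day, though counting from midnight would need the time of day reset to 00:00:00 first.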