Remove lines older than 30 days

Hi Experts/Gurus,

Is there a way to remove lines from a file that are older than x days (e.g. 30 days), based on the date stamp in the first column?

Example:

$ date
Sat Jan 11 14:12:06 EDT 2014
$ cat sample.txt
10-10-2013 09:00:01 AM|Line test 1234567
16-10-2013 08:30:00 AM|Line test 2345567
25-10-2013 07:30:00 AM|Line test 3456678
26-10-2013 06:00:00 AM|Line test 4567890
28-10-2013 07:30:00 AM|Line test 5678910
15-12-2013 10:00:00 PM|Line test 7891234
10-01-2014 03:00:00 PM|Line test 8901234

Expected output:

$ cat sample1.txt
15-12-2013 10:00:00 PM|Line test 7891234
10-01-2014 03:00:00 PM|Line test 8901234

Thank you in advance!
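The expected output follows from a single cutoff: drop any line stamped earlier than "now minus 30 days". A minimal Python sketch of that rule, assuming "now" is the time printed by `date` above and ignoring the time zone:

```python
from datetime import datetime, timedelta

# Assumption: "now" is the time printed by `date` in the question; time zone ignored
now = datetime(2014, 1, 11, 14, 12, 6)
cutoff = now - timedelta(days=30)  # 2013-12-12 14:12:06

sample = [
    "10-10-2013 09:00:01 AM|Line test 1234567",
    "16-10-2013 08:30:00 AM|Line test 2345567",
    "25-10-2013 07:30:00 AM|Line test 3456678",
    "26-10-2013 06:00:00 AM|Line test 4567890",
    "28-10-2013 07:30:00 AM|Line test 5678910",
    "15-12-2013 10:00:00 PM|Line test 7891234",
    "10-01-2014 03:00:00 PM|Line test 8901234",
]

def stamp(line):
    """Parse the 'dd-mm-yyyy hh:mm:ss AM/PM' field before the first '|'."""
    return datetime.strptime(line.split("|", 1)[0], "%d-%m-%Y %I:%M:%S %p")

# keep only lines stamped at or after the cutoff
kept = [line for line in sample if stamp(line) >= cutoff]
for line in kept:
    print(line)
```

This prints exactly the two lines shown in the expected output.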

What is your OS? Post your system information:

uname -a

Hi Yoda, the OS is AIX 6.1.

I wrote and tested this on an AIX 6.1 box and it works; however, you need to have Perl installed. Hope this helps.

#!/usr/bin/ksh
#
# -- script to remove lines from a file
# -- older than the given number of days
# -- based on the date in column 1
#

# check the command line for a file name and number of days
if [ $# -ne 2 ]
then
    echo "Usage: $0 </path/to/file> <number of days>"
    exit 1
fi

# store file and number of days
f=$1
d=$2

# get the current date in seconds past epoch
c=$(date +%s)

# new file (emptied first in case one is left over from an earlier run)
n="$f.new"
: > "$n"

# read file line by line, convert the date in
# column one to seconds past epoch and check if
# older than the given number of days
while IFS= read -r line
do
    c1=$(echo "$line" | awk '{print $1}')
    f1=$(echo "$c1" | awk '{gsub(/-/," ",$0)}{print $3" "$2" "$1}')
    secs=$(perl -e 'use Time::Local; print timelocal(0,0,0,$ARGV[2], $ARGV[1]-1, $ARGV[0]);' $f1)
    dif=$(echo "scale=0; ($c - $secs) / (24*3600)" | bc)
    if [ "$dif" -le "$d" ]
    then
        printf '%s\n' "$line" >> "$n"
    fi
done < "$f"

# back up the original file and rename the new file
mv "$f" "$f.ORIG"
mv "$n" "$f"
cat "$f"

# done
exit 0

./remoldlines.ksh
Usage: ./remoldlines.ksh </path/to/file> <number of days>

./remoldlines.ksh /tmp/file.txt 30
15-12-2013 10:00:00 PM|Line test 7891234
10-01-2014 03:00:00 PM|Line test 8901234
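One detail of the ksh script worth knowing: it calls `timelocal(0,0,0,...)`, so the time of day on each line is ignored and the age is counted in whole days from midnight of that date (truncated by `bc`). The keep/drop rule can be restated in Python as below; `age_in_days` is a hypothetical helper written only for illustration, and "now" is again assumed to be the question's `date` output.

```python
from datetime import datetime

def age_in_days(date_field, now):
    """Whole days from midnight of date_field (dd-mm-yyyy) to now,
    like timelocal(0,0,0,...) followed by integer division in bc.
    Only past dates (non-negative ages) are handled, where floor and
    bc's truncation agree."""
    midnight = datetime.strptime(date_field, "%d-%m-%Y")
    secs = int((now - midnight).total_seconds())
    return secs // 86400

# Assumption: the time printed by `date` in the question
now = datetime(2014, 1, 11, 14, 12, 6)
for field in ("10-12-2013", "11-12-2013", "12-12-2013", "15-12-2013"):
    age = age_in_days(field, now)
    print(field, age, "keep" if age <= 30 else "drop")  # script keeps dif <= d
```

So with 30 days, a line stamped 12-12-2013 10:00:00 AM is still kept at 14:12 on 11 January although it is slightly more than 30 days old, whereas a comparison on the full timestamp would drop it.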


OK, take a look at this thread in the FAQ section: Date Arithmetic.

Calling perl and bc once, and awk twice, for every line of the data file can be quite slow, especially with large files.

Most of the processing can be done within Perl, like this:

$ cat remoldlines.pl
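#!/usr/bin/env perl
use strict;
use warnings;
use Time::Local;

my $days = int($ARGV[0]);
while (my $ln = <STDIN>) {
    # dd-mm-yyyy hh:mm:ss AM/PM at the start of the line; skip lines without one
    my ($d, $mt, $y, $h, $m, $s, $ampm) =
        $ln =~ /^(\d+)-(\d+)-(\d+) (\d+):(\d+):(\d+) ([AP]M)/ or next;
    $h = $h % 12;               # 12 AM -> 0
    $h += 12 if $ampm eq "PM";  # 1 PM -> 13, 12 PM -> 12
    my $age = time() - timelocal($s, $m, $h, $d, $mt-1, $y);
    print $ln unless ($age > $days*3600*24);
}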

$ remoldlines.pl 30 < sample.txt
20-12-2013 10:00:00 PM|Line test 7891234
15-01-2014 03:00:00 PM|Line test 8901234
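The AM/PM step is the easiest part of the Perl version to get wrong: strings must be compared with `eq`/`ne` rather than the numeric `!=`, 12 AM has to become hour 0, and 12 PM must stay 12. The rule is sketched below in Python (`to_24h` is a hypothetical helper) and cross-checked against `strptime`'s own `%I`/`%p` handling, assuming the default C locale so that `%p` means AM/PM.

```python
from datetime import datetime

def to_24h(hour, ampm):
    """Convert a 12-hour clock hour (1-12) plus AM/PM to 0-23."""
    if ampm not in ("AM", "PM"):
        raise ValueError("expected AM or PM, got %r" % ampm)
    hour = hour % 12  # maps 12 to 0, leaves 1-11 unchanged
    return hour + 12 if ampm == "PM" else hour  # PM adds 12, so 12 PM -> 12

# cross-check every hour against strptime's %I/%p parsing
for h in range(1, 13):
    for ampm in ("AM", "PM"):
        parsed = datetime.strptime("%02d:00 %s" % (h, ampm), "%I:%M %p")
        assert parsed.hour == to_24h(h, ampm)

print(to_24h(12, "AM"), to_24h(12, "PM"), to_24h(3, "PM"))  # 0 12 15
```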

A Python version:

#!/usr/bin/env python

import sys
from datetime import datetime

now = datetime.now()
with open('sample.txt') as f:
    for line in f:
        # the timestamp is the first '|'-separated field
        stamp = line.split('|', 1)[0]
        before = datetime.strptime(stamp, '%d-%m-%Y %I:%M:%S %p')
        old = now - before
        if old.days < 30:
            sys.stdout.write(line)
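The Python version above only prints the surviving lines. To rewrite the file itself and keep the original as `.ORIG`, as the ksh script does, something along these lines should work. `remove_old_lines` is a hypothetical helper, it assumes every line starts with a `dd-mm-yyyy hh:mm:ss AM/PM` stamp, and `os.replace` needs Python 3.3 or later.

```python
import os
from datetime import datetime, timedelta

def remove_old_lines(path, days, now=None):
    """Keep only lines of `path` stamped within the last `days` days.
    The original file is renamed to path + '.ORIG', like the ksh script."""
    if now is None:
        now = datetime.now()
    cutoff = now - timedelta(days=days)
    new_path = path + ".new"
    # "w" truncates any .new file left over from an earlier run
    with open(path) as src, open(new_path, "w") as dst:
        for line in src:
            stamp = datetime.strptime(line.split("|", 1)[0],
                                      "%d-%m-%Y %I:%M:%S %p")
            if stamp >= cutoff:
                dst.write(line)
    os.replace(path, path + ".ORIG")  # back up the original
    os.replace(new_path, path)        # move the filtered copy into place
```

Called as `remove_old_lines('/tmp/file.txt', 30)`, it leaves the filtered lines in `/tmp/file.txt` and the untouched original in `/tmp/file.txt.ORIG`.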