Filtering log file with lines older than 10 days.

Hi,
I am trying to compare the epoch time in a huge log file (2 million lines) with today's date. I have to create two files: one with the lines older than 10 days and another with the lines less than 10 days old. I am using a while/do loop, but the script takes forever to complete. It would be helpful if you could help me convert this to awk to make it faster.

Field 2 of the input file holds the epoch time with nanosecond resolution.

INPUT

/kshdmjv/1/1/        1508426244789202297     101434  407896064       409265492  48817   BB      408811357       214089175
/PCD0289/0lshdmk6/1/1        1508426298102701719     101435  3744464896      3756961760 445770  KK      3756575209      1934910527
/PCD0289/0mshdml0/1/1        1508426323091171565     101436  3749707776      3762220364 446331  LL      3761465177      1943293261

Current code

#!/bin/bash
NOWT=`expr $(date +%s%3N) / 1000`
echo -e " Time now is $NOWT"
while read line ; do
FST=$(echo $line |awk '{print $1}');
SND=$(echo $line |awk '{print $2}');
RST=$(echo $line |awk '{ print substr($0, index($0,$3)) }');
OLDT=$((SND/1000000000))
DIFT=$(expr ${NOWT} - ${OLDT});

   T=$DIFT
   D=$((T/60/60/24))
   H=$((T/60/60%24))
   M=$((T/60%60))
   S=$((T%60))
   if [[ ${D} -gt 10 ]]
   then
       printf '%d days %02d:%02d:%02d %s,%d,%s\n' $D $H $M $S $FST $SND $RST >> olderthn10days.txt
   else
       printf '%d dayless %s,%d,%s\n' $D $FST $SND $RST >> lessthn10days.txt
   fi
done < 0108.txt

Please advise if there is any way to make this faster and better.

With 2 million lines, this might help by avoiding roughly 6 million awk subprocesses (three per input line).

#!/bin/bash
NOWT=$(expr $(date +%s%3N) / 1000)
echo -e " Time now is $NOWT"
while read FST SND RST ; do
OLDT=$((SND/1000000000))
DIFT=$(expr ${NOWT} - ${OLDT});

   T=$DIFT
   D=$((T/60/60/24))
   H=$((T/60/60%24))
   M=$((T/60%60))
   S=$((T%60))
   if [[ ${D} -gt 10 ]]
   then
       printf '%d days %02d:%02d:%02d %s,%d,%s\n' $D $H $M $S $FST $SND $RST >> olderthn10days.txt
   else
       printf '%d dayless %s,%d,%s\n' $D $FST $SND $RST >> lessthn10days.txt
   fi
done < 0108.txt

However, I cannot see how the printf calls can work correctly, as $RST contains embedded spaces and is expanded unquoted.
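To illustrate the concern about $RST, here is a minimal sketch with hypothetical sample values standing in for one input line. It only counts output lines: when the variables are unquoted, the shell splits $RST into several words, printf receives more arguments than the format has specifiers, and it reuses the format string for the leftovers.

```shell
#!/bin/bash
# Hypothetical values standing in for one line of the log.
FST=/kshdmjv/1/1/
SND=1508426244789202297
RST='101434  407896064       409265492  48817   BB'

# Unquoted: $RST is word-split into 5 arguments, so printf gets 7 arguments
# for a 3-specifier format and repeats the format until they are used up.
unquoted=$(printf '%s,%s,%s\n' $FST $SND $RST | wc -l)

# Quoted: exactly 3 arguments, so exactly one output line.
quoted=$(printf '%s,%s,%s\n' "$FST" "$SND" "$RST" | wc -l)

echo "unquoted lines: $unquoted, quoted lines: $quoted"
```

Quoting the expansions ("$FST" "$SND" "$RST") in the script's printf lines should be enough to keep one output line per input line.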

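If you are using bash 4.2 or later, you could optionally replace the line

NOWT=$(expr $(date +%s%3N) / 1000)

with

NOWT=$(printf '%(%s)T' -1)

The %(...)T format is passed to strftime, which has no %N, and the script only needs whole seconds anyway. The argument -1 means the current time (-2 would be the time the shell was started).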

Andrew

Ten days to the nanosecond: please allow for a reasonable, coarser granularity. Also, be aware that the result depends on the time of day at which you run the script, unless you fix the reference point for the comparison, e.g. "today midnight"...
Try

awk 'BEGIN {D10 = (srand()-864000); split ("OLDER YOUNGER", FN)} {print $0 > (FN[1+($2/1E9 > D10)])} ' file

You may want to add the desired conversions of the fields as given above.
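As a rough sketch of what "adding the conversions" could look like, the function below does the day/hour/minute/second arithmetic from the original bash loop inside a single awk process. The function name split_by_age and its optional second argument (a fixed "now" in epoch seconds, handy for testing) are hypothetical; the output file names and formats are copied from the script in the first post. The current time is passed in with -v rather than taken from srand(). The timestamp is printed with %s because awk numbers are doubles and cannot reproduce all 19 digits.

```shell
#!/bin/bash
# split_by_age FILE [NOW]
# Hypothetical helper: NOW defaults to the current epoch time in seconds.
split_by_age() {
    local file=$1 now=${2:-$(date +%s)}
    awk -v now="$now" '
    {
        age = now - int($2 / 1e9)       # field 2 is epoch time in nanoseconds
        d = int(age / 86400)
        h = int(age / 3600) % 24
        m = int(age / 60) % 60
        s = age % 60

        rest = $0                       # everything after the first two fields
        sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+/, "", rest)

        if (d > 10)
            printf "%d days %02d:%02d:%02d %s,%s,%s\n", d, h, m, s, $1, $2, rest > "olderthn10days.txt"
        else
            printf "%d dayless %s,%s,%s\n", d, $1, $2, rest > "lessthn10days.txt"
    }' "$file"
}
```

Usage would be something like `split_by_age 0108.txt`. Unlike the `>>` in the bash version, awk's `>` truncates each output file once when it is first opened and then keeps writing to it for the rest of the run.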

Hi RudiC,

I tried your awk but I don't get any output.

panz:/tmp> awk 'BEGIN {D10 = (srand()-864000); split ("OLDER YOUNGER", FN)} {print $0 > (FN[1+($2/1E9 > D10)])} ' 0108.txt

We can compare it to Today 12:00AM if that makes it easier.
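A cutoff anchored at midnight could be computed along these lines. This is only a sketch and assumes GNU date (the original script already relies on its %3N extension); the variable names are made up for illustration.

```shell
#!/bin/bash
# Assumes GNU date for the -d option.
midnight=$(date -d 'today 00:00' +%s)   # epoch seconds at local midnight today
cutoff=$(( midnight - 10 * 86400 ))     # ten 24-hour days earlier (ignores DST shifts)
echo "cutoff: $cutoff"
```

The value can then be handed to awk with `-v`, so the result no longer depends on the time of day at which the script runs.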

Sure? Didn't it create the two desired output files?

Somehow it creates only one file, YOUNGER, and dumps everything into it.

Here is a sample file if you like:

/ECD0303/12s3tbqe/1       1494450414791890180     4566    34359738368     34475924324     4144773 4136493 116784870       115898019
/ECD0303/10s3tbqd/1       1494450429920112522     4567    34338766848     34454780244     4138613 4130341 116576831       115660526
/ECD0303/11s3tbqe/1       1494450399980138278     4568    34359738368     34476107528     4151316 4142956 117616816       116138618
/ECD0303/13s3tbqv/1       1494450274931912967     4569    4194304 4209372 503     AA      3986663 1398647
/ECD0303/14s3tbqv/1       1494450274985871012     4570    7340032 7365436 871     CC      6839625 2791870
/ECD0303/15s3tbqv/1       1494450274997690312     4571    3145728 3157044 369     FC      3041330 1124246
/ECD0303/16s3tbr1/1       1494450283497270749     4572    458227712       459733468       53679   3D      459428840       182225614
/ECD0303/17s3tbr2/1       1494450278445573393     4573    3145728 3156848 362     2C      2983292 1113543
/ECD0303/18s3tbr4/1       1494450288802435061     4574    731906048       734338440       86736   3E      733796743       322044566
/ECD0303/1as3tbuv/1       1494450584139209737     4576    34359738368     34476011096     4147872 3528559 5261036982      1944503212
/exd/doc/3/45 1509129209233091543     72      31958192128     32064917336     3807181 A       32064917336     19422111111
/exd/doc/3/46 1509129209229341353     73      31516553216     31621961332     3760202 B       31621961332     16479692952
/exd/doc/3/47 1509129209242579870     74      30883958784     30995567000     3687020 0FA     30987341948     21703509926
/exd/doc/3/48 1509129209238274317     75      30905245696     31008440092     3681224 0       31008440092     20189779623
/exd/doc/3/49 1509129209246929594     76      28803964928     28906429400     3440802 A0B     28900446928     22970462639
/exd/doc/3/50 1509129209230745672     77      28351578112     28446458816     3384655 0       28446458816     18756473322
/exd/doc/3/51 1509129209240203414     78      786698240       791016112       94114   EDC     789340848       390977698

In your sample data in post#1, NO timestamp is younger than 10 days ago. For testing purposes, I modified one of them, and it worked as desired.
You may want to reverse the comparison operator to test the result...

EDIT: For your new sample, it works brilliantly: the lines from 10 May go to OLDER, the ones from 27 Oct go to YOUNGER.

EDIT2: What's your awk version? Please post result of

awk 'BEGIN {print srand()}'
1509649848
panz:/tmp> awk 'BEGIN {print srand()}'
1
You have new mail in /var/spool/mail/root
panz:/tmp>

OK, there's the error. Try

awk -vDT=$(date +%s) 'BEGIN {D10 = (DT-864000); split ("OLDER YOUNGER", FN)} {print $0 ">" (FN[1+($2/1E9 > D10)])} ' file

Hi RudiC,

This prints the output to the terminal but does not create the files as expected. This one is awesome, but would you be able to put this output into files, please? Thank you for all your help.

Sorry, I forgot to remove the "test setup" - remove the double quotes around the redirection operator.
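For reference, here is the command with the quotes around the redirection operator removed, wrapped in a small function so it can be tried with a fixed "now". The function name and its optional second argument are hypothetical; the awk program is the one from this thread. It passes the current time in with -v because srand() returns the previous seed, and on the awk that printed 1 this made D10 negative, so every timestamp compared as newer and landed in YOUNGER.

```shell
#!/bin/bash
# split_older_younger FILE [NOW]
# Lines whose field 2 (epoch nanoseconds) is more than 10 days before NOW
# go to OLDER, all others to YOUNGER. NOW defaults to the current time.
split_older_younger() {
    local file=$1 now=${2:-$(date +%s)}
    awk -v DT="$now" '
        BEGIN { D10 = DT - 864000; split("OLDER YOUNGER", FN) }
        { print $0 > (FN[1 + ($2 / 1E9 > D10)]) }
    ' "$file"
}
```

Calling `split_older_younger 0108.txt` should then write the two files into the current directory.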

That works like a champ. Thank you, RudiC, for your help!!

Also, I would like to learn more about the FN and split part of your script. Please suggest any documentation, if available.

FN is a variable to hold the file name array used in the "conditional" redirection... For the split command, see man awk and references therein.
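A small stand-alone illustration of that part, using made-up numbers in place of the timestamp comparison: split() fills the array FN with the whitespace-separated words of the string, starting at index 1, and a comparison in awk evaluates to 0 or 1, so 1 + (comparison) selects the first or second file name.

```shell
#!/bin/bash
out=$(awk 'BEGIN {
    n = split("OLDER YOUNGER", FN)   # FN[1] = "OLDER", FN[2] = "YOUNGER", n = 2
    print n, FN[1], FN[2]
    print FN[1 + (5 > 10)]           # comparison is false (0), so index 1
    print FN[1 + (15 > 10)]          # comparison is true (1), so index 2
}')
echo "$out"
```

In the real command the comparison is `$2/1E9 > D10`, so lines newer than the cutoff pick FN[2] (YOUNGER) and the rest pick FN[1] (OLDER).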