shunya
November 2, 2017, 10:01am
1
Hi,
I am trying to compare the epoch times in a huge log file (2 million lines) with today's date. I have to create two files: one with the lines older than 10 days and another with the lines less than 10 days old. I am using a while/do loop, but the script takes forever to complete. It would be helpful if you could help me convert this into awk to make it faster.
Field 2 of the input file holds the epoch time in nanoseconds.
INPUT
/kshdmjv/1/1/ 1508426244789202297 101434 407896064 409265492 48817 BB 408811357 214089175
/PCD0289/0lshdmk6/1/1 1508426298102701719 101435 3744464896 3756961760 445770 KK 3756575209 1934910527
/PCD0289/0mshdml0/1/1 1508426323091171565 101436 3749707776 3762220364 446331 LL 3761465177 1943293261
Current code
#!/bin/bash
NOWT=`expr $(date +%s%3N) / 1000`
echo -e " Time now is $NOWT"
while read line ; do
FST=$(echo $line |awk '{print $1}');
SND=$(echo $line |awk '{print $2}');
RST=$(echo $line |awk '{ print substr($0, index($0,$3)) }');
OLDT=$((SND/1000000000))
DIFT=$(expr ${NOWT} - ${OLDT});
T=$DIFT
D=$((T/60/60/24))
H=$((T/60/60%24))
M=$((T/60%60))
S=$((T%60))
if [[ ${D} -gt 10 ]]
then
printf '%d days %02d:%02d:%02d %s,%d,%s\n' $D $H $M $S $FST $SND $RST >> olderthn10days.txt
else
printf '%d dayless %s,%d,%s\n' $D $FST $SND $RST >> lessthn10days.txt
fi
done < 0108.txt
Please advise if there is any way to make this faster and better.
apmcd47
November 2, 2017, 10:35am
2
If there are 2 million lines, this might help by avoiding 6 million subprocesses (the three awk calls per line).
#!/bin/bash
NOWT=$(expr $(date +%s%3N) / 1000)
echo -e " Time now is $NOWT"
while read FST SND RST ; do
OLDT=$((SND/1000000000))
DIFT=$(expr ${NOWT} - ${OLDT});
T=$DIFT
D=$((T/60/60/24))
H=$((T/60/60%24))
M=$((T/60%60))
S=$((T%60))
if [[ ${D} -gt 10 ]]
then
printf '%d days %02d:%02d:%02d %s,%d,%s\n' $D $H $M $S $FST $SND $RST >> olderthn10days.txt
else
printf '%d dayless %s,%d,%s\n' $D $FST $SND $RST >> lessthn10days.txt
fi
done < 0108.txt
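But I cannot see that the printfs are working correctly, as $RST will have embedded spaces in it (unquoted, it is split into several printf arguments and the format string gets reused for the extras).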
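If you are using bash 4.2 or later, you could optionally replace the line
NOWT=$(expr $(date +%s%3N) / 1000)
with
printf -v NOWT '%(%s)T' -1
which gets the current time without starting any subprocess. (The %3N milliseconds are divided away anyway, and strftime, which %(...)T uses, does not support %N.)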
Andrew
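Combining Andrew's two points (split each line once with read, and quote $RST), the loop could look roughly like the sketch below. It assumes bash 4+ and keeps the original file names and the original "more than 10 whole days" test; split_by_age is a hypothetical helper name. It replaces expr with $(( )) arithmetic and opens each output file once for the whole loop instead of once per line.

```bash
#!/bin/bash
# Sketch of the loop with far fewer subprocesses (bash 4+ assumed).
# split_by_age is a hypothetical name: split_by_age INPUT NOW_EPOCH_SECONDS
split_by_age() {
    local input=$1 nowt=$2
    local fst snd rst t d h m s
    # read -r splits each line once: field 1, field 2, and the rest of the line
    while read -r fst snd rst; do
        t=$(( nowt - snd / 1000000000 ))    # age in seconds (snd is in nanoseconds)
        d=$(( t / 86400 ))
        h=$(( t / 3600 % 24 ))
        m=$(( t / 60 % 60 ))
        s=$(( t % 60 ))
        if (( d > 10 )); then
            # quoting "$rst" keeps the rest of the line as ONE printf argument
            printf '%d days %02d:%02d:%02d %s,%s,%s\n' "$d" "$h" "$m" "$s" "$fst" "$snd" "$rst" >&3
        else
            printf '%d dayless %s,%s,%s\n' "$d" "$fst" "$snd" "$rst" >&4
        fi
    done < "$input" 3>>olderthn10days.txt 4>>lessthn10days.txt   # each file opened once
}
```

Usage: split_by_age 0108.txt "$(date +%s)". This should be much faster than the original, though a bash loop over 2 million lines is still likely to be slower than a single awk process.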
RudiC
November 2, 2017, 10:49am
3
Ten days to the nanosecond? Please allow for a reasonably coarse granularity. Also, be aware that the result depends on the time of day you run the script, unless you define a fixed reference point for the comparison, e.g. "today midnight"...
Try
awk 'BEGIN {D10 = (srand()-864000); split ("OLDER YOUNGER", FN)} {print $0 > (FN[1+($2/1E9 > D10)])} ' file
You may want to add the desired conversions of the fields as given above.
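Adding the conversions from the first post, a single awk process could produce the same "N days HH:MM:SS" output. This is a minimal sketch assuming any POSIX awk, the current time passed in with -v (rather than taken from srand()), and the same "more than 10 whole days" test as the original script; split_by_age_awk is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch: the first post's output format, produced by one awk process.
# split_by_age_awk is a hypothetical name: split_by_age_awk INPUT NOW_EPOCH_SECONDS
split_by_age_awk() {
    awk -v now="$2" '
    {
        t = now - int($2 / 1e9)      # age in seconds; $2 is in nanoseconds
        d = int(t / 86400)
        rst = $0                     # rest of the line after field 2 (like $RST)
        sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]*/, "", rst)
        # ">" truncates each file on its first use in this run, then keeps appending
        if (d > 10)
            printf "%d days %02d:%02d:%02d %s,%s,%s\n", d, int(t / 3600) % 24, int(t / 60) % 60, t % 60, $1, $2, rst > "olderthn10days.txt"
        else
            printf "%d dayless %s,%s,%s\n", d, $1, $2, rst > "lessthn10days.txt"
    }' "$1"
}
```

Usage: split_by_age_awk 0108.txt "$(date +%s)". Note that awk does the division in floating point, which loses a few hundred nanoseconds of precision on these 19-digit values; that is irrelevant at day granularity, and $2 itself is printed unchanged.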
shunya
November 2, 2017, 11:12am
4
Hi RudiC,
I tried your awk but I don't get any O/P.
panz:/tmp> awk 'BEGIN {D10 = (srand()-864000); split ("OLDER YOUNGER", FN)} {print $0 > (FN[1+($2/1E9 > D10)])} ' 0108.txt
---------- Post updated at 10:12 AM ---------- Previous update was at 10:10 AM ----------
We can compare it to today at 12:00 AM if that makes it easier.
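Comparing against today's midnight makes the result independent of when the script runs. A minimal sketch, assuming GNU date (for -d); cutoff_epoch is a hypothetical helper name, and it uses fixed 86400-second days, so the cutoff can be off by an hour when a DST change falls inside the window.

```bash
#!/bin/bash
# Sketch (GNU date assumed for -d): epoch seconds of local midnight today,
# minus N days (default 10). cutoff_epoch is a hypothetical helper name.
cutoff_epoch() {
    local days=${1:-10} midnight
    midnight=$(date -d 'today 00:00' +%s)
    # Fixed 86400-second days: may be off by an hour across a DST change.
    echo $(( midnight - days * 86400 ))
}
```

The cutoff can then be handed to awk, for example: awk -v cutoff="$(cutoff_epoch 10)" '{ print > ($2 / 1e9 < cutoff ? "OLDER" : "YOUNGER") }' 0108.txt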
RudiC
November 2, 2017, 11:26am
5
Sure? Didn't it create the two desired output files?
shunya
November 2, 2017, 2:48pm
6
Somehow it creates only one file, YOUNGER, and dumps everything into it.
---------- Post updated at 01:48 PM ---------- Previous update was at 01:38 PM ----------
Here is a sample file, if you like:
/ECD0303/12s3tbqe/1 1494450414791890180 4566 34359738368 34475924324 4144773 4136493 116784870 115898019
/ECD0303/10s3tbqd/1 1494450429920112522 4567 34338766848 34454780244 4138613 4130341 116576831 115660526
/ECD0303/11s3tbqe/1 1494450399980138278 4568 34359738368 34476107528 4151316 4142956 117616816 116138618
/ECD0303/13s3tbqv/1 1494450274931912967 4569 4194304 4209372 503 AA 3986663 1398647
/ECD0303/14s3tbqv/1 1494450274985871012 4570 7340032 7365436 871 CC 6839625 2791870
/ECD0303/15s3tbqv/1 1494450274997690312 4571 3145728 3157044 369 FC 3041330 1124246
/ECD0303/16s3tbr1/1 1494450283497270749 4572 458227712 459733468 53679 3D 459428840 182225614
/ECD0303/17s3tbr2/1 1494450278445573393 4573 3145728 3156848 362 2C 2983292 1113543
/ECD0303/18s3tbr4/1 1494450288802435061 4574 731906048 734338440 86736 3E 733796743 322044566
/ECD0303/1as3tbuv/1 1494450584139209737 4576 34359738368 34476011096 4147872 3528559 5261036982 1944503212
/exd/doc/3/45 1509129209233091543 72 31958192128 32064917336 3807181 A 32064917336 19422111111
/exd/doc/3/46 1509129209229341353 73 31516553216 31621961332 3760202 B 31621961332 16479692952
/exd/doc/3/47 1509129209242579870 74 30883958784 30995567000 3687020 0FA 30987341948 21703509926
/exd/doc/3/48 1509129209238274317 75 30905245696 31008440092 3681224 0 31008440092 20189779623
/exd/doc/3/49 1509129209246929594 76 28803964928 28906429400 3440802 A0B 28900446928 22970462639
/exd/doc/3/50 1509129209230745672 77 28351578112 28446458816 3384655 0 28446458816 18756473322
/exd/doc/3/51 1509129209240203414 78 786698240 791016112 94114 EDC 789340848 390977698
RudiC
November 2, 2017, 3:05pm
7
In your sample data in post #1, NO timestamp is less than 10 days old. For testing purposes, I modified one of them, and it worked as desired.
You may want to reverse the comparison operator to test the result...
EDIT: With your new sample it works brilliantly: the lines from 10 May go to OLDER, the ones from 27 Oct go to YOUNGER.
EDIT2: What's your awk version? Please post the result of
awk 'BEGIN {print srand()}'
Mine prints:
1509649848
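To check the 10 May / 27 Oct split by eye, the nanosecond stamps in field 2 can be shown as dates. A minimal sketch, assuming GNU date (for -d @SECONDS); show_dates is a hypothetical helper name. It starts one date process per line, so it is meant for spot-checking a sample, not for the full 2-million-line file.

```bash
#!/bin/bash
# Sketch (GNU date assumed): print field 2 of each line as a UTC date,
# followed by field 1. show_dates is a hypothetical helper name.
show_dates() {
    local fst snd rst
    while read -r fst snd rst; do
        # ${snd%?????????} drops the last 9 digits: nanoseconds -> whole seconds
        # LC_ALL=C gives English month names regardless of the locale
        printf '%s  %s\n' "$(LC_ALL=C date -u -d "@${snd%?????????}" '+%d %b %Y %H:%M:%S')" "$fst"
    done < "$1"
}
```

For the first and the eleventh lines of the sample, this gives 10 May 2017 21:06:54 and 27 Oct 2017 18:33:29 (UTC).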
shunya
November 2, 2017, 3:31pm
8
panz:/tmp> awk 'BEGIN {print srand()}'
1
panz:/tmp>
RudiC
November 2, 2017, 3:43pm
9
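OK, there's the error: srand() returns the previous seed, and on your awk that is 1 rather than the current time, so the cutoff comes out negative and every line counts as YOUNGER. Pass the time in from the shell instead. Try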
awk -vDT=$(date +%s) 'BEGIN {D10 = (DT-864000); split ("OLDER YOUNGER", FN)} {print $0 ">" (FN[1+($2/1E9 > D10)])} ' file
shunya
November 2, 2017, 3:54pm
10
Hi RudiC,
This one prints output but does not create the files as expected. It's awesome, but would you be able to put this output into files, please? Thank you for all your help.
RudiC
November 2, 2017, 3:56pm
11
Sorry, I forgot to remove the "test setup": remove the double quotes around the redirection operator.
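With the quotes removed, the final command can be spread out with comments to show how it works. The names DT, D10 and FN are from the thread; the wrapper function split_older_younger is hypothetical, added only so the command can be called with a file name.

```bash
#!/bin/bash
# RudiC's final command (test quotes removed), spread out with comments.
# split_older_younger is a hypothetical wrapper name.
split_older_younger() {
    awk -v DT="$(date +%s)" '
    BEGIN {
        D10 = DT - 864000              # cutoff: 10 days (10 * 86400 s) before now
        split("OLDER YOUNGER", FN)     # FN[1] = "OLDER", FN[2] = "YOUNGER"
    }
    {
        # ($2 / 1E9 > D10) is 1 if the line is newer than the cutoff, else 0,
        # so each line goes to FN[2] (YOUNGER) or FN[1] (OLDER)
        print $0 > (FN[1 + ($2 / 1E9 > D10)])
    }' "$1"
}
```

Usage: split_older_younger 0108.txt creates OLDER and YOUNGER in the current directory.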
shunya
November 3, 2017, 10:25am
12
That works like a champ. Thank you, RudiC, for your help!!
---------- Post updated at 09:25 AM ---------- Previous update was at 09:23 AM ----------
Also, I would like to learn more about the FN and split parts of your script. Can you suggest any documentation?
RudiC
November 3, 2017, 10:29am
13
FN is an array holding the file names used in the "conditional" redirection; the comparison inside the index evaluates to 1 or 0 and so picks FN[2] or FN[1]. For the split function, see man awk and the references therein.
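Here is a small sketch that shows split() and the 1 + (comparison) index trick in isolation, assuming any POSIX awk; demo_split is a hypothetical name.

```bash
#!/bin/bash
# Sketch: split() and the 1 + (comparison) index trick in isolation.
# demo_split is a hypothetical name.
demo_split() {
    awk 'BEGIN {
        n = split("OLDER YOUNGER", FN)   # split on FS (whitespace by default); returns the element count
        print n, FN[1], FN[2]
        i = 1 + (5 > 3)                  # a true comparison is 1, so i = 2
        print FN[i]
        i = 1 + (2 > 3)                  # a false comparison is 0, so i = 1
        print FN[i]
    }'
}
```

Running demo_split prints "2 OLDER YOUNGER", then "YOUNGER", then "OLDER".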