I have a file that contains chunks similar to the above.
I want to parse this file and skip any chunk whose entry time is older than 60 days. Meaning, DO NOT output those chunks.
Please note that some of the chunks in the data file do not have "entry_time". For those, I want the script to output the chunk and move on to the next one.
In other words, I want to do the date subtraction only on chunks that have "entry_time".
This is the code I'm using:
NOW=$(date +%s)
FILE=${1}
awk -v NOWTIME="$NOW" -v mac="=" '
BEGIN {
    RS = "{";
    FS = " ";
}
FNR == 1 {
    record_sep = RT;
}
{
    for (i = 1; i <= NF; i++) {
        if (match($i, mac) > 0) {
            print record_sep, $0;
            break;
        }
    }
}
' "${FILE}"
How do I subtract the entry time from the current time? Also, is this script good enough to do what I need?
IMO you are better off using perl for this problem. The "entry_time" is given in seconds since the *nix Epoch, so if your awk can figure out the current Epoch, you can subtract the "entry_time" Epoch from the "now" Epoch and dole out the desired chunks to an output file.
perl comes with date and time routines built in, so if I were you I'd do this in perl. Otherwise feel free to ignore this post...
That depends on what system you are on, i.e. your OS. If you are using gawk then you certainly can: just look up the systime function in the gawk man page.
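As a rough sketch of the systime suggestion: systime() is a gawk extension that returns the current time in Epoch seconds, so the 60-day cutoff can be computed inside awk itself. The variable name cutoff and the fallback to date +%s (for systems without gawk) are assumptions made for this illustration only.

```shell
#!/bin/sh
# 60 days expressed in seconds.
# systime() exists in gawk only, so fall back to date(1) when gawk is missing.
if command -v gawk >/dev/null 2>&1; then
    cutoff=$(gawk 'BEGIN { print systime() - 60 * 24 * 60 * 60 }')
else
    cutoff=$(( $(date +%s) - 60 * 24 * 60 * 60 ))
fi

# Any entry_time smaller than this value is older than 60 days.
echo "cutoff: $cutoff"
```

A chunk would then be kept only when its entry_time is greater than this cutoff.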
Subtract 60 days from the current time and store the result in $LEAST.
Scan the data file.
If a chunk does not contain both "entry_time" and "comment_data", output it!
If a chunk contains both "entry_time" and "comment_data", output it ONLY if the entry time is greater than $LEAST.
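The steps above could be sketched roughly as follows. This is not the script from this thread: it assumes each chunk starts with a "name {" line, holds one key=value pair per line, and ends with a line containing only "}". The function name filter_chunks and the sample data are made up for the illustration.

```shell
#!/bin/sh
# Hypothetical helper: $1 = data file, $2 = cutoff in Epoch seconds (LEAST).
# Assumed chunk layout: "name {", then key=value lines, then "}" on its own line.
filter_chunks() {
    awk -v least="$2" '
        # Collect every line of the current chunk.
        { buf = buf $0 "\n" }
        # Remember the entry time and whether comment_data is present.
        /^[ \t]*entry_time=/   { split($0, kv, "="); entry = kv[2] + 0; has_time = 1 }
        /^[ \t]*comment_data=/ { has_comment = 1 }
        # A line holding only "}" closes the chunk: decide, then reset.
        /^[ \t]*[}][ \t]*$/ {
            if (!(has_time && has_comment) || entry > least)
                printf "%s", buf
            buf = ""; has_time = 0; has_comment = 0; entry = 0
        }
        # Pass through anything left over after the last closing brace.
        END { printf "%s", buf }
    ' "$1"
}

NOW=$(date +%s)
LEAST=$(( NOW - 60 * 24 * 60 * 60 ))   # 60 days ago, in seconds

# Made-up sample data: one old chunk, one fresh chunk, one without entry_time.
sample=$(mktemp)
cat > "$sample" <<EOF
comment {
host=a
entry_time=1000000000
comment_data=old
}
comment {
host=b
entry_time=$NOW
comment_data=fresh
}
info {
host=c
}
EOF

filter_chunks "$sample" "$LEAST"
rm -f "$sample"
```

Reading line by line and buffering one chunk at a time avoids relying on RT, which only gawk provides, and keeps memory use small even on a 200MB file.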
The script below is my attempt, but it's not working. Can someone please help me modify it? I feel like I'm very close. I just need to be able to grab the value of entry_time and do the comparison:
That "hanging" is typical of awk reading from your terminal and waiting for you to type input, which it does when no file name is given. Try setting the shell's -v and/or -x options (set -vx) and run it again.
Using your input file from post#1 with a modified first "chunk", I get
Sorry, my mistake. It is actually working. It's just that the file I'm giving it is almost 200MB, so it takes a while to process.
However, it appears it's not properly subtracting the entry_time from the current time, as I still see the same data in the results returned from the file.
Yes, you are right. It's not hanging. That was my mistake. The file is just so big that it took a while to get a response.
However, it appears I'm still getting the same data.
The goal is to trim down the 200MB file by finding the chunks that have a date older than 60 days and eliminating them so they are no longer in the file.
I updated the entry_time of a couple of the chunks to today's date.
So the awk code should only be outputting the chunks that have a date newer than 60 days (which would be the two chunks whose date I updated to be recent).
Also, the code should output the chunk I bolded, because this chunk does not have both "comment_data" AND "entry_time".
Frankly, I didn't know this, and after your post I looked up the srand function in the awk man page. It says that srand sets the seed value for rand and returns the previous seed value. When srand is called with no argument it uses the time of day as the seed, so the first call installs the time of day and the second call returns it, which explains why we need to call it twice...
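A small sketch of that behaviour, for illustration only. The seeds 42 and 7 are arbitrary values chosen to make the "returns the previous seed" rule visible. How the clock-based seed is derived can differ between awk implementations, so it is worth comparing the second value against date +%s on your system before relying on it.

```shell
#!/bin/sh
# srand(x) installs x as the new seed and returns the seed that was in
# effect before the call, so the second call reports the first call's seed.
prev=$(awk 'BEGIN { srand(42); print srand(7) }')
echo "previous seed: $prev"

# Same idea with no argument: the first srand() seeds from the time of day,
# the second srand() returns that seed.
now=$(awk 'BEGIN { srand(); print srand() }')
echo "seed taken from the clock: $now"
echo "date +%s for comparison:   $(date +%s)"
```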