awk to calculate date and show data

data:

hostcomment {
host_name=myhost01
entry_type=1
comment_id=1
source=0
persistent=1
entry_time=1415723753
expires=0
expire_time=0
author=hpsm
comment_data=IM0837437472
}

program {
modified_host_attributes=1
modified_service_attributes=1
enable_notifications=1
active_service_checks_enabled=1
passive_service_checks_enabled=1
active_host_checks_enabled=1
passive_host_checks_enabled=1
enable_event_handlers=1
obsess_over_services=0
obsess_over_hosts=0
check_service_freshness=1
check_host_freshness=0
enable_flap_detection=1
enable_failure_prediction=1
process_performance_data=0
global_host_event_handler=
global_service_event_handler=
next_comment_id=177361
next_downtime_id=1039
next_event_id=418881
next_problem_id=194777
next_notification_id=59919
}

hostcomment {
host_name=myhost01
entry_type=1
comment_id=1
source=0
persistent=1
entry_time=1415723753
expires=0
expire_time=0
author=hpsm
comment_data=IM023434343
}

I have a file that contains chunks similar to the above.

I want to parse this file and skip over chunks whose entry_time is older than 60 days. Meaning, DO NOT output those chunks.

Please note, some of the chunks in the data file do not have "entry_time". For those chunks, I want the script to output the chunk and move on to the next one.

So in other words, I want to do the date subtraction only on chunks that have "entry_time".

This is the code I'm using:

NOW=$(date +%s)

FILE=${1}

awk -v NOWTIME=$NOW -v mac="=" '
    BEGIN { 
        RS = "{"; 
        FS = " "; 
    } 
    FNR == 1 {
        record_sep = RT;
    }
    { 

        for (i = 1; i <= NF; i++ ) { 
            if ( match( $i, mac ) > 0 ) {
                print record_sep , $0;
                break;
            }
        } 
    }
' ${FILE}

How do I subtract the entry time from the current time? And is this script good enough to do what I need?

IMO you are better off using perl for this problem... The "entry_time" is given in seconds since the *nix Epoch, and if your awk can figure out the current Epoch, then you'd be able to subtract the "entry_time" Epoch from the "now" Epoch and dole out the desired chunks to an output file.

perl comes with the date and time routines built in, so if I were you I'd do this in perl; otherwise feel free to ignore this post...

Unfortunately, I can't write in a language that I'm unfamiliar with. If this can be done in perl, please supply the perl code if you can.

I'm pretty sure there's got to be a way around this with awk.

That'd depend on what system you are on, i.e. your OS... because if you are using gawk then you certainly can; just look up the systime function in the gawk man page...

It's a Red Hat Linux host.

Then you probably have gawk so "man gawk" and search for "systime"...
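
For example, a minimal sketch using gawk's systime() (this assumes gawk; systime() is a gawk extension, not standard awk):

gawk 'BEGIN { now = systime(); least = now - 60 * 86400; print now, least }'

That prints the current epoch time and the 60-day cut-off you would compare entry_time against.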

Although it is not a well-known fact, every awk release can retrieve the current epoch time using only standard features:

$ awk 'BEGIN {srand();time=srand();print time}'
1416532245

I've decided to:

remove 60 days from the current time;
scan the data file;
if a chunk does not contain "entry_time" and "comment_data", output it;
if a chunk does contain "entry_time" and "comment_data", output it ONLY if the entry time is greater than $LEAST.

The script below is my attempt, but it's not working. Can someone please help me modify it? I feel like I'm very close; I just need to be able to grab the value for entry_time and do the comparison:

NOW=$(date +%s)

LEAST=$(awk "BEGIN{ print $NOW - 5443200}")

FILE=${1}

awk -v TPASSED=$LEAST -v NOWTIME=$NOW -v pattern1="entry_time" -v pattern2="comment_data" '
    BEGIN {
        RS = "{";
        FS = "{";
    }
    FNR == 1 {
        record_sep = RT;
    }
    {
        for (i = 1; i <= NF; i++ ) {
            if (( match( $i, pattern1 ) > 0 ) && ( match( $i, pattern2 ) > 0 ))
                    { print record_sep , $0; break; }
            S[$record_sep]

#  for(X in S) delete S[X];
#
#  for(N=2; N<=NF; N++)
#  {
#       gsub(/^[ \t]+/, "", $N);
#       split($N, A, "=");
#       D[A[1]] = A[2]
#       i = 3;
#       while (i in A)
#          D[A[1]] = D[A[1]] "=" A[i++];
#  }

            else
                    { print record_sep , $0; break; }
        }
    }
' ${FILE}

Try this:

awk -v pattern1="entry_time" -v pattern2="comment_data" '
    BEGIN {
        srand(); now=srand();   # portable trick: the second srand() returns the time-of-day seed
        least=now-5443200;      # cut-off: "now" minus 5443200 seconds
        RS=""                   # paragraph mode: each blank-line-separated chunk is one record
        FS="="                  # split fields on "=" (newlines also separate fields in this mode)
        ORS="\n\n"              # keep a blank line between printed chunks
    }
    {
      p=t=0
      for(i=1;i<=NF;i++)  {
        j=i+1
        if($i==pattern2) p=1    # chunk contains a comment_data line
        if($i==pattern1) t=$j   # remember its entry_time value
      }
      if((t==0 && p==0) || (t>least && p==1))
        print                   # keep chunks lacking both fields, and recent-enough chunks
    }
' "$1"

Not sure if this is too simplistic:

awk     'BEGIN                          {NOW=srand()}
         /comment_data/ &&
         match ($0,/entry_time=[0-9]*/) {if (NOW-substr ($0, RSTART+11,  RLENGTH-11) > 5184000) next}
         1
        ' RS= ORS="\n\n" file

BTW - 5443200 seconds is 63 days (63 * 86400); 60 days would be 5184000 seconds (60 * 86400).

This is hanging when I run it. I get no output at all; it just hangs.

Same thing with the other suggestion; it also seems to just hang. Looks like we're getting closer :)

Did you supply YOUR file name?

Yes, I did. I replaced the "$1" with my file.

And for the code from jilla, I replaced 'file' with my data file.

That "hanging" is typical for awk reading from your terminal, waiting for you to type input, which it does when no file name is given. Try setting the shell's -v and/or -x options and run it again.
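
For example, if the awk call is wrapped in a shell script (called filter.sh here purely for illustration), tracing it will show what actually gets executed and whether a file name reaches awk:

sh -xv filter.sh yourdatafile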

Using your input file from post#1 with a modified first "chunk", I get

awk     'BEGIN                          {LIMDAT=srand() - 5184000}
         /comment_data/ &&
         match ($0,/entry_time=[0-9]*/) {if (substr ($0, RSTART+11,  RLENGTH-11) < LIMDAT) next}
         1
        ' RS= ORS="\n\n" file
program {
modified_host_attributes=1
modified_service_attributes=1
enable_notifications=1
active_service_checks_enabled=1
passive_service_checks_enabled=1
active_host_checks_enabled=1
passive_host_checks_enabled=1
enable_event_handlers=1
obsess_over_services=0
obsess_over_hosts=0
check_service_freshness=1
check_host_freshness=0
enable_flap_detection=1
enable_failure_prediction=1
process_performance_data=0
global_host_event_handler=
global_service_event_handler=
next_comment_id=177361
next_downtime_id=1039
next_event_id=418881
next_problem_id=194777
next_notification_id=59919
}

hostcomment {
host_name=myhost01
entry_type=1
comment_id=1
source=0
persistent=1
entry_time=1415723753
expires=0
expire_time=0
author=hpsm
comment_data=IM023434343
}

Sorry, my mistake. It is actually working; it's just that the file I'm giving it is almost 200MB, so it takes a while to process.

However, it appears it's not properly subtracting the entry_time from the current time, as I still see the same data in the results returned from the file.

My script cannot hang unless you didn't supply a filename. Please double check.


Yes, you are right. It's not hanging; that was my mistake. The file was just so big that it took a while to get a response.

However, it appears I'm still getting the same data.

The goal of this is to trim down the 200MB file by finding the chunks that have a date older than 60 days and eliminating them so they are no longer in the file.

I apologize if I wasn't clear before.
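
Once the filter produces the right output, one way to actually shrink the file is to write to a temporary file and replace the original only on success. A sketch, assuming the data file is called status.dat and the working awk command is saved in a script called trim_chunks.sh (both names made up for illustration):

./trim_chunks.sh status.dat > status.dat.new && mv status.dat.new status.dat

Keep a backup of the original until you trust the result.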

Why don't you try it with a few "chunks" of which some are older and some newer than those 60 days?


These are the chunks in my test file:

servicecomment {
host_name=myhost01
service_description=Load Average
entry_type=4
comment_id=70711
source=0
persistent=1
entry_time=1408251613
expires=0
expire_time=0
author=hpsm
comment_data=IM02015654
}

servicecomment {
host_name=myhost01
service_description=Load Average
entry_type=4
comment_id=70711
source=0
persistent=1
entry_time=1416593731
expires=0
expire_time=0
author=hpsm
comment_data=IM02015654
}

servicecomment {
host_name=myhost01
service_description=Load Average
entry_type=4
comment_id=70711
source=0
persistent=1
entry_time=1408251613
expires=0
expire_time=0
author=hpsm
comment_data=IM02015654
}

servicecomment {
host_name=myhost01
service_description=Load Average
entry_type=4
comment_id=70711
source=0
persistent=1
entry_time=1416593664
expires=0
expire_time=0
author=hpsm
comment_data=IM02015654
}

servicecomment {
host_name=myhost01
service_description=Load Average
entry_type=4
comment_id=70711
source=0
persistent=1
entiiiry_time=1408251613
expires=0
expire_time=0
author=hpsm
comment_data=IM02015654
}

I updated the entry_time for a couple of the chunks to have a date of today.

So the awk code should only be outputting the chunks whose date is newer than 60 days (which would be the two chunks whose entry_time I updated to be recent).

Also, the code should output the last chunk (the one with the misspelled "entiiiry_time"), because it does not have both "comment_data" AND "entry_time".

Any ideas?

Thank you so much for this tip!

Frankly, I didn't know this fact. After your post I looked up the srand function in the awk man page: it sets the seed value for rand (using the time of day when no argument is supplied) and returns the previous seed value. That explains why we need to call it twice: the first call seeds with the current time, and the second call returns that time as its "previous seed".
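
A quick way to see this in action, assuming GNU date is available for comparison:

awk 'BEGIN { srand(); t = srand(); print "awk epoch:", t }'
date +%s

Both should print the same epoch second, give or take one.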