awk to calculate date and show data

data:

hostcomment {
host_name=myhost01
entry_type=1
comment_id=1
source=0
persistent=1
entry_time=1415723753
expires=0
expire_time=0
author=hpsm
comment_data=IM0837437472
}

program {
modified_host_attributes=1
modified_service_attributes=1
enable_notifications=1
active_service_checks_enabled=1
passive_service_checks_enabled=1
active_host_checks_enabled=1
passive_host_checks_enabled=1
enable_event_handlers=1
obsess_over_services=0
obsess_over_hosts=0
check_service_freshness=1
check_host_freshness=0
enable_flap_detection=1
enable_failure_prediction=1
process_performance_data=0
global_host_event_handler=
global_service_event_handler=
next_comment_id=177361
next_downtime_id=1039
next_event_id=418881
next_problem_id=194777
next_notification_id=59919
}

hostcomment {
host_name=myhost01
entry_type=1
comment_id=1
source=0
persistent=1
entry_time=1415723753
expires=0
expire_time=0
author=hpsm
comment_data=IM023434343
}

I have a file that contains chunks similar to the above.

I want to parse this file and skip over chunks whose entry_time is older than 60 days. Meaning, DO NOT output those chunks.

Please note, some of the chunks in the data file do not have "entry_time". For those chunks, I want the script to output the chunk and move on to the next one.

So in other words, I want to do the date subtraction only on chunks that have "entry_time".

This is the code I'm using:

NOW=$(date +%s)

FILE=${1}

awk -v NOWTIME=$NOW -v mac="=" '
    BEGIN { 
        RS = "{"; 
        FS = " "; 
    } 
    FNR == 1 {
        record_sep = RT;
    }
    { 

        for (i = 1; i <= NF; i++ ) { 
            if ( match( $i, mac ) > 0 ) {
                print record_sep , $0;
                break;
            }
        } 
    }
' ${FILE}

How do I subtract the entry time from the current time? And is this script good enough to do what I need?

IMO you are better off using perl for this problem... The "entry_time" is given in seconds since the *nix Epoch, and if your awk can figure out the current Epoch, then you'd be able to subtract the "entry_time" Epoch from the "now" Epoch and dole out the desired chunks to an output file.

perl comes with the date and time routines built in, so if I were you I'd do this in perl; otherwise feel free to ignore this post...

Unfortunately, I can't write in a language that I'm unfamiliar with. If this can be done in perl, please supply the perl code if you can.

I'm pretty sure there's got to be a way around this with awk.

That'd depend on what system you are on, i.e. your OS... because if you are using gawk then you certainly can; just look up the systime function in the gawk man page...

It's a Red Hat Linux host.

Then you probably have gawk so "man gawk" and search for "systime"...
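
For example, a minimal sketch using gawk's systime() (this assumes gawk; systime() is a gawk extension, not standard awk):

gawk 'BEGIN { now = systime(); least = now - 60 * 86400; print now, least }'

That prints the current epoch time and the 60-day cut-off you would compare entry_time against.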

Although it is not a well-known fact, every awk release can retrieve the current epoch time using only standard features:

$ awk 'BEGIN {srand();time=srand();print time}'
1416532245

I've decided to:

remove 60 days from the current time;
scan the data file;
if a chunk does not contain "entry_time" and "comment_data", output it;
if a chunk does contain "entry_time" and "comment_data", output it ONLY if the entry time is greater than $LEAST.

The script below is my attempt, but it's not working. Can someone please help me modify it? I feel like I'm very close; I just need to be able to grab the value for entry_time and do the comparison:

NOW=$(date +%s)

LEAST=$(awk "BEGIN{ print $NOW - 5443200}")

FILE=${1}

awk -v TPASSED=$LEAST -v NOWTIME=$NOW -v pattern1="entry_time" -v pattern2="comment_data" '
    BEGIN {
        RS = "{";
        FS = "{";
    }
    FNR == 1 {
        record_sep = RT;
    }
    {
        for (i = 1; i <= NF; i++ ) {
            if (( match( $i, pattern1 ) > 0 ) && ( match( $i, pattern2 ) > 0 ))
                    { print record_sep , $0; break; }
            S[$record_sep]

#  for(X in S) delete S[X];
#
#  for(N=2; N<=NF; N++)
#  {
#       gsub(/^[ \t]+/, "", $N);
#       split($N, A, "=");
#       D[A[1]] = A[2]
#       i = 3;
#       while (i in A)
#          D[A[1]] = D[A[1]] "=" A[i++];
#  }

            else
                    { print record_sep , $0; break; }
        }
    }
' ${FILE}

Try this:

awk -v pattern1="entry_time" -v pattern2="comment_data" '
    BEGIN {
        srand(); now=srand();   # portable trick: the second srand() returns the time-of-day seed
        least=now-5443200;      # cut-off: "now" minus 5443200 seconds
        RS=""                   # paragraph mode: each blank-line-separated chunk is one record
        FS="="                  # split fields on "=" (newlines also separate fields in this mode)
        ORS="\n\n"              # keep a blank line between printed chunks
    }
    {
      p=t=0
      for(i=1;i<=NF;i++)  {
        j=i+1
        if($i==pattern2) p=1    # chunk contains a comment_data line
        if($i==pattern1) t=$j   # remember its entry_time value
      }
      if((t==0 && p==0) || (t>least && p==1))
        print                   # keep chunks lacking both fields, and recent-enough chunks
    }
' "$1"

Not sure if this is too simplistic:

awk     'BEGIN                          {NOW=srand()}
         /comment_data/ &&
         match ($0,/entry_time=[0-9]*/) {if (NOW-substr ($0, RSTART+11,  RLENGTH-11) > 5184000) next}
         1
        ' RS= ORS="\n\n" file

BTW - 5443200 seconds is 63 days (63 * 86400); 60 days would be 5184000 seconds (60 * 86400).

This is hanging when I run it. I get no output at all; it just hangs.

Same thing with the other suggestion; it also seems to just hang. Looks like we're getting closer :)

Did you supply YOUR file name?

Yes, I did. I replaced the "$1" with my file.

And for the code from jilla, I replaced 'file' with my data file.

That "hanging" is typical for awk reading from your terminal, waiting for you to type input, which it does when no file name is given. Try setting the shell's -v and/or -x options and run it again.
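
For example, if the awk call is wrapped in a shell script (called filter.sh here purely for illustration), tracing it will show what actually gets executed and whether a file name reaches awk:

sh -xv filter.sh yourdatafile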

Using your input file from post#1 with a modified first "chunk", I get

awk     'BEGIN                          {LIMDAT=srand() - 5184000}
         /comment_data/ &&
         match ($0,/entry_time=[0-9]*/) {if (substr ($0, RSTART+11,  RLENGTH-11) < LIMDAT) next}
         1
        ' RS= ORS="\n\n" file
program {
modified_host_attributes=1
modified_service_attributes=1
enable_notifications=1
active_service_checks_enabled=1
passive_service_checks_enabled=1
active_host_checks_enabled=1
passive_host_checks_enabled=1
enable_event_handlers=1
obsess_over_services=0
obsess_over_hosts=0
check_service_freshness=1
check_host_freshness=0
enable_flap_detection=1
enable_failure_prediction=1
process_performance_data=0
global_host_event_handler=
global_service_event_handler=
next_comment_id=177361
next_downtime_id=1039
next_event_id=418881
next_problem_id=194777
next_notification_id=59919
}

hostcomment {
host_name=myhost01
entry_type=1
comment_id=1
source=0
persistent=1
entry_time=1415723753
expires=0
expire_time=0
author=hpsm
comment_data=IM023434343
}

Sorry, my mistake. It is actually working; it's just that the file I'm giving it is almost 200MB, so it takes a while to process.

However, it appears it's not properly subtracting the entry_time from the current time, as I still see the same data in the results returned from the file.

My script cannot hang unless you didn't supply a filename. Please double check.


Yes, you are right. It's not hanging; that was my mistake. The file was just so big that it took a while to get a response.

However, it appears I'm still getting the same data.

The goal of this is to trim down the 200MB file by finding the chunks that have a date older than 60 days and eliminating them so they are no longer in the file.

I apologize if I wasn't clear before.
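
Once the filter produces the right output, one way to actually shrink the file is to write to a temporary file and replace the original only on success. A sketch, assuming the data file is called status.dat and the working awk command is saved in a script called trim_chunks.sh (both names made up for illustration):

./trim_chunks.sh status.dat > status.dat.new && mv status.dat.new status.dat

Keep a backup of the original until you trust the result.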

Why don't you try it with a few "chunks" of which some are older and some newer than those 60 days?


These are the chunks in my test file:

servicecomment {
host_name=myhost01
service_description=Load Average
entry_type=4
comment_id=70711
source=0
persistent=1
entry_time=1408251613
expires=0
expire_time=0
author=hpsm
comment_data=IM02015654
}

servicecomment {
host_name=myhost01
service_description=Load Average
entry_type=4
comment_id=70711
source=0
persistent=1
entry_time=1416593731
expires=0
expire_time=0
author=hpsm
comment_data=IM02015654
}

servicecomment {
host_name=myhost01
service_description=Load Average
entry_type=4
comment_id=70711
source=0
persistent=1
entry_time=1408251613
expires=0
expire_time=0
author=hpsm
comment_data=IM02015654
}

servicecomment {
host_name=myhost01
service_description=Load Average
entry_type=4
comment_id=70711
source=0
persistent=1
entry_time=1416593664
expires=0
expire_time=0
author=hpsm
comment_data=IM02015654
}

servicecomment {
host_name=myhost01
service_description=Load Average
entry_type=4
comment_id=70711
source=0
persistent=1
entiiiry_time=1408251613
expires=0
expire_time=0
author=hpsm
comment_data=IM02015654
}

I updated the entry_time for a couple of the chunks to have a date of today.

So the awk code should only be outputting the chunks whose date is newer than 60 days (which would be the two chunks whose entry_time I updated to be recent).

Also, the code should output the last chunk (the one with the misspelled "entiiiry_time"), because it does not have both "comment_data" AND "entry_time".

Any ideas?

Thank you so much for this tip!

Frankly, I didn't know this fact. After your post I looked up the srand function in the awk man page: it sets the seed value for rand (using the time of day when no argument is supplied) and returns the previous seed value. That explains why we need to call it twice: the first call seeds with the current time, and the second call returns that time as its "previous seed".
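
A quick way to see this in action, assuming GNU date is available for comparison:

awk 'BEGIN { srand(); t = srand(); print "awk epoch:", t }'
date +%s

Both should print the same epoch second, give or take one.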