Optimize shell script to run faster

SkySmart · January 6, 2015, 11:57am

data.file:

contact {
contact_name=royce-rolls
modified_attributes=0
modified_host_attributes=0
modified_service_attributes=0
host_notification_period=24x7
service_notification_period=24x7
last_host_notification=0
last_service_notification=0
host_notifications_enabled=1
service_notifications_enabled=1
}
servicecomment {
host_name=pgphhram01
service_description=Free Space All Disks
entry_type=4
comment_id=32
source=0
persistent=0
entry_time=1416610977
expires=0
expire_time=0
author=hpsm
comment_data=DM02782504
}
servicecomment {
host_name=pgphhram02
service_description=Free Space All Disks
entry_type=4
comment_id=32
source=0
persistent=0
entry_time=1420561982
expires=0
expire_time=0
author=hpsm
comment_data=DM02902504
}

My data.file is about 60MB, so I need to trim it. To do that, I need to identify which chunks are comment chunks and, for each of those, check its entry time. If the entry time is more than 60 minutes older than right now, I drop that chunk and move on to the next one. All other chunks are kept. The data.file above has 3 chunks in it. A chunk begins with a line ending in

" {"

and ends with a line starting with

"}"

Here is the code I'm using:

FILE=data.file
AMINUTES=3600   # maximum comment age in seconds (60 minutes)

FFNUM=$(wc -l < ${FILE})

awk '{print NR","$0}' ${FILE} | egrep " {" | awk -F"," '{print $1}' | while read CLNUM
do
        NTIME=${CLNUM}

        LINENUMS=$(while [ $NTIME -le $FFNUM ]
        do
                ENDY=$(sed -n ${NTIME}p ${FILE} | egrep "^}")

                if [ ! -z "${ENDY}" ] ; then
                        echo "${CLNUM},${NTIME}"
                        break
                fi
                NTIME=$((${NTIME} + 1))
        done)

        FOUND=$(sed -n ${LINENUMS}p ${FILE})
        ISITCOMMENT=$(echo "${FOUND}"  | egrep "comment {")
        DNOW=$(date +%s)

        if [ ! -z "${ISITCOMMENT}" ] ; then

                ENTRYTIME=$(echo "${FOUND}" | egrep "entry_time" | awk -F"=" '{print $2}')

                ELAPSEDTIME=$(awk "BEGIN{print $DNOW - $ENTRYTIME}")

                if [ ${ELAPSEDTIME} -lt ${AMINUTES} ] ; then
                        echo "${FOUND}"
                fi
        else
                echo "${FOUND}"
        fi
done

This code works and does exactly what I need. However, it runs very slowly. Can anyone think of a way to change this script so it runs faster?

vgersh99 · January 6, 2015, 12:22pm

What's the expected output for the sample input?

SkySmart · January 6, 2015, 1:24pm

If I updated the entry time of the third chunk to the current timestamp (date +%s) and then ran the script, it should print the following:

contact {
contact_name=royce-rolls
modified_attributes=0
modified_host_attributes=0
modified_service_attributes=0
host_notification_period=24x7
service_notification_period=24x7
last_host_notification=0
last_service_notification=0
host_notifications_enabled=1
service_notifications_enabled=1
}
servicecomment {
host_name=pgphhram02
service_description=Free Space All Disks
entry_type=4
comment_id=32
source=0
persistent=0
entry_time=1420567480
expires=0
expire_time=0
author=hpsm
comment_data=DM02902504
}

The chunk I updated with the current timestamp is:

servicecomment {
host_name=pgphhram02
service_description=Free Space All Disks
entry_type=4
comment_id=32
source=0
persistent=0
entry_time=1420567480
expires=0
expire_time=0
author=hpsm
comment_data=DM02902504
}

vgersh99 · January 6, 2015, 2:00pm

try this: awk -v now="$(date +%s)" -f sky.awk myFile.txt
where sky.awk is:

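BEGIN {
  FS="="
  if (!now) now=systime()   # fallback when -v now=... is not given (systime() is not in every awk)

  tDiff=60*60               # maximum age in seconds
  p=1
}
/{/ {rec=$0;p=1;next}                     # chunk header: start a new buffer
/}/ && rec && p {print rec ORS $0;next}   # chunk end: print the buffer unless flagged
$1=="entry_time" {
  if (now-$2>tDiff)p=0                    # too old: flag the chunk as dropped
}
{rec=rec ORS $0}                          # any other line: append to the buffer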

DGPickett · January 6, 2015, 2:15pm

The final while read loop should be replaced by more advanced sed. You can read a whole xxx{ ... } into the buffer using 'N' and test it.
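
A minimal sketch of the 'N' idea, for illustration only. sed cannot compare numbers, so it cannot do the entry_time age check by itself; this example only shows how a whole chunk is collected into the pattern space and then tested as one unit. The function name drop_comments_for_host and the host_name filter are hypothetical, not something asked for in this thread.

```shell
#!/bin/sh
# drop_comments_for_host HOST FILE   (hypothetical helper, illustration only)
# Prints FILE, minus every "...comment {" chunk whose host_name equals HOST.
# HOST is pasted into a regular expression, so it must not contain
# regex metacharacters or slashes.
drop_comments_for_host() {
    # 1st -e: lines that do not end in "{" are not chunk headers; print as-is
    # 2nd/3rd: append lines with N until the buffer ends in a "}" line
    # 4th: in comment chunks only, delete the buffer if host_name matches
    sed -e '/{$/!b' \
        -e ':a' \
        -e '/\n}$/!{' -e 'N' -e 'ba' -e '}' \
        -e '/comment {/{' -e "/\\nhost_name=$1\\n/d" -e '}' \
        "$2"
}
```

For example, drop_comments_for_host pgphhram01 data.file would print everything except the servicecomment chunk for pgphhram01. For the time-based filter, awk remains the simpler tool because it can do the arithmetic.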

SkySmart · January 6, 2015, 2:27pm

Thanks a million!
Is there any way to put this into one script so I can run it like this:

./script  data.file  60m

vgersh99 · January 6, 2015, 2:33pm

Why not:
myScript.sh data.file 3600
where myScript.sh is:

#!/bin/sh
#
awk -v now="$(date +%s)" -v tDiff="${2}" '
   BEGIN {
      FS="="
      if (!now) now=systime()
      if (!tDiff) tDiff=60*60
      p=1
   }
   /{/ {rec=$0;p=1;next}
   /}/ && rec && p {print rec ORS $0;next}
   $1=="entry_time" { if (now-$2>tDiff)p=0 }
   {rec=rec ORS $0}' "${1}"

you can elaborate on the parameter passing on your own....
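
One possible way to handle an argument such as 60m, sketched under assumptions: the helper name to_seconds and the suffix letters s, m, h and d are made up for illustration, and a bare number is taken as seconds. Numbers with a leading zero are not handled.

```shell
#!/bin/sh
# to_seconds: convert "60m", "2h", "1d", "90s" or a plain "3600" to seconds.
# Hypothetical helper; the suffix letters are an assumption for illustration.
to_seconds() {
    num=${1%[smhd]}                 # strip one trailing unit letter, if any
    case $num in
        ''|*[!0-9]*) return 1 ;;    # reject empty or non-numeric input
    esac
    case $1 in
        *d) echo $((num * 86400)) ;;
        *h) echo $((num * 3600)) ;;
        *m) echo $((num * 60)) ;;
        *)  echo $((num)) ;;        # "s" suffix or no suffix: already seconds
    esac
}
```

Inside myScript.sh this could be used as tDiff=$(to_seconds "${2:-60m}") || exit 2, with the result passed to awk as -v tDiff="$tDiff".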

RudiC · January 6, 2015, 2:36pm

Try also

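awk     'BEGIN          {srand(); NOW = srand()}   # 2nd srand() returns the seed set by the 1st (epoch seconds in most awks)
         $1~"comment"   {match ($0, "entry_time=[0-9]*")
                         FT = substr($0, RSTART+11, RLENGTH-11)   # digits after "entry_time="
                         if ((NOW - FT) > 3600)  next             # older than 60 minutes: skip record
                        }
         1
        ' RS="}\n" ORS="}\n" file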

vgersh99 · January 6, 2015, 2:53pm

Great idea, Rudi. Just an FYI: some awks only honor a single character in RS (a multi-character RS is an extension).
As the OP didn't state his OS/awk version, the more verbose approach was suggested.
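
For awks without multi-character RS, here is a line-by-line sketch that also restricts the age check to comment chunks, as the original question describes. The function name filter_comments and its third argument (a fixed "now", handy for testing) are assumptions for illustration. It relies on date +%s, as the rest of the thread does.

```shell
#!/bin/sh
# filter_comments FILE [MAXAGE_SECONDS] [NOW]   (hypothetical helper)
# Prints FILE, dropping "...comment {" chunks whose entry_time is more
# than MAXAGE_SECONDS (default 3600) before NOW (default: current time).
filter_comments() {
    awk -v now="${3:-$(date +%s)}" -v maxage="${2:-3600}" '
        /\{[ \t]*$/ {                 # header line such as "servicecomment {"
            buf = $0
            inblock = 1
            iscomment = ($0 ~ /comment[ \t]*\{/)
            keep = 1
            next
        }
        inblock {
            buf = buf "\n" $0
            # "entry_time=" is 11 characters, so the value starts at 12
            if (iscomment && index($0, "entry_time=") == 1 &&
                now - substr($0, 12) > maxage)
                keep = 0
            if ($0 ~ /^[ \t]*\}/) {   # closing brace: emit or discard
                if (keep) print buf
                inblock = 0
            }
            next
        }
        { print }                     # lines outside any chunk pass through
    ' "$1"
}
```

Usage would be something like filter_comments data.file 3600 > data.file.new. The file is read once and only one chunk is held in memory at a time.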