My data.file is about 60 MB, so I need to trim it. To trim it, I need to identify which chunks are comment chunks, and once those are identified, check their entry time. If a chunk's entry time is more than 60 minutes older than right now, I ignore that chunk and move on to the next one. The data.file above contains 3 chunks. A chunk begins with a line containing
" {"
and ends with a line starting with
}
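For illustration only, a comment chunk might look something like this (the exact layout is my guess from the description; only the " {" opener, the "}" closer, the "comment" keyword, and the entry_time=<epoch> field are taken from the thread):

```
comment {
entry_time=1700000000
}
```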
Here is the code I'm using:
FILE=data.file
AMINUTES=3600                        # 60 minutes, expressed in seconds
FFNUM=$(wc -l < "${FILE}")
awk '{print NR","$0}' "${FILE}" | egrep " \{" | awk -F"," '{print $1}' | while read CLNUM
do
    NTIME=${CLNUM}
    # scan forward from the chunk opener to find its closing brace
    LINENUMS=$(while [ "${NTIME}" -le "${FFNUM}" ]
    do
        ENDY=$(sed -n "${NTIME}p" "${FILE}" | egrep "^}")
        if [ -n "${ENDY}" ] ; then
            echo "${CLNUM},${NTIME}"
            break
        fi
        NTIME=$((NTIME + 1))
    done)
    FOUND=$(sed -n "${LINENUMS}p" "${FILE}")
    ISITCOMMENT=$(echo "${FOUND}" | egrep "comment \{")
    DNOW=$(date +%s)
    if [ -n "${ISITCOMMENT}" ] ; then
        ENTRYTIME=$(echo "${FOUND}" | egrep "entry_time" | awk -F"=" '{print $2}')
        ELAPSEDTIME=$(awk "BEGIN{print $DNOW - $ENTRYTIME}")
        if [ "${ELAPSEDTIME}" -lt "${AMINUTES}" ] ; then
            echo "${FOUND}"
        fi
    else
        echo "${FOUND}"
    fi
done
This code works and does exactly what I need; however, it runs very slowly, since it re-reads the file with sed for every single line. Can anyone think of a way I can change this script so it runs faster?
If I update the entry time of the third chunk to the current timestamp (date +%s), then when I run this script it should print the following:
Great idea, Rudi. Just an FYI: some awks can only use a single character for RS/ORS.
As the OP didn't state his OS/awk version, the more verbose/lengthy approach was suggested.
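For reference, the whole job can be done in one awk pass, which avoids the per-line sed calls that make the original slow, and avoids RS/ORS tricks entirely (so single-character-RS awks are fine). This is only a sketch under assumptions taken from the thread: chunk openers contain " {", closers start with "}", comment chunks match "comment {", and each comment chunk carries an entry_time=<epoch> line. The demo file below is made up for illustration.

```shell
#!/bin/sh
# Single-pass sketch: buffer each chunk, print it unless it is a comment
# chunk whose entry_time is more than AMINUTES seconds old.
AMINUTES=3600                 # 60 minutes, in seconds
DNOW=$(date +%s)

# hypothetical demo file mirroring the described chunk layout
DEMO=/tmp/demo.$$.file
cat > "$DEMO" <<EOF
alpha {
entry_time=1
}
comment {
entry_time=1
}
comment {
entry_time=$DNOW
}
EOF

RESULT=$(awk -v now="$DNOW" -v maxage="$AMINUTES" '
/ \{/   { inchunk = 1; buf = ""; iscomment = 0; entry = 0 }   # chunk opener
inchunk {
        buf = buf $0 "\n"
        if ($0 ~ /comment \{/)   iscomment = 1
        if ($0 ~ /entry_time=/)  { split($0, a, "="); entry = a[2] }
}
/^\}/ && inchunk {
        inchunk = 0
        # keep every non-comment chunk, and comment chunks younger than maxage
        if (!iscomment || now - entry < maxage) printf "%s", buf
}
' "$DEMO")
printf '%s\n' "$RESULT"
rm -f "$DEMO"
```

With this demo data, the non-comment chunk and the fresh comment chunk survive, and the stale comment chunk is dropped.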