Hello experts,
we have input files with 700K lines each (one generated for every hour). and we need to convert them as below and move them to another directory once.
Sample INPUT:-
[root@tst01 INPUT]# cat test1
1559205600000,8474,NormalizedPortInfo,PctDiscards,0.0,Interface,BG-CTA-AX1.test.com,Vl111
1559205600000,8474,NormalizedPortInfo,HistoricalInterfaceSpeed,1000000000,Interface,BG-CTA-AX1.test.com,Vl111
1559205600000,8474,NormalizedPortInfo,SpeedIn,1000000000,Interface,BG-CTA-AX1.test.com,Vl111
1559205600000,8474,NormalizedPortInfo,FrameSizeIn,209.65929490852145,Interface,BG-CTA-AX1.test.com,Vl111
1559205600000,8474,NormalizedPortInfo,PctDiscardsIn,0.0,Interface,BG-CTA-AX1.test.com,Vl111
1559205600000,8474,NormalizedPortInfo,NonunicastIn,124,Interface,BG-CTA-AX1.test.com,Vl111
[root@tst01 INPUT]#
Sample output:-
[root@tst01 INPUT]# cat ../OUTPUT/test1
TS;DURATION;SYSNM;DS_SYSNM;SYSTYPENM;OBJNM;SUBOBJNM;VALUE
2019-05-30 09:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;PctDiscards;0.0;Interface;Vl111
2019-05-30 09:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;HistoricalInterfaceSpeed;1000000000;Interface;Vl111
2019-05-30 09:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;SpeedIn;1000000000;Interface;Vl111
2019-05-30 09:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;FrameSizeIn;209.65929490852145;Interface;Vl111
2019-05-30 09:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;PctDiscardsIn;0.0;Interface;Vl111
2019-05-30 09:40:00;86400;BG-CTA-AX1.test.com;NormalizedPortInfo;NonunicastIn;124;Interface;Vl111
[root@tst01 INPUT]#
I wrote a script which is working - what does is to convert epoch time to normal date in the 1st column, replace 2nd column with fixed values (86400) and remap the remaining columns as they are into different columns
The problem here is that my script processing ~ 40 lines / second resulting only 144K lines are done in an hour. we need to finish all 700K in <1 hour. CPU usage is just 12% of 1 core where it has 12-cores in single CPU. How could I improve its speed (in terms of script) and how could I let my script use all CPU cores to do parellel processing?
THANKS
My working script but processes only 40 lines per second
cat /usr/local/bin/script.sh
#!/bin/bash
BASEDIR=/tmp/tsight
INPUTDIR=${BASEDIR}/INPUT
OUTPUTDIR=${BASEDIR}/OUTPUT
DONEDIR=${BASEDIR}/DONE
mkdir -p ${BASEDIR}/INPUT
mkdir -p ${BASEDIR}/OUTPUT
mkdir -p ${BASEDIR}/DONE
cd ${INPUTDIR}
for inp in *
do
tail -n +2 ${inp} >/tmp/tempp
\mv /tmp/tempp ${inp}
echo "TS;DURATION;SYSNM;DS_SYSNM;SYSTYPENM;OBJNM;SUBOBJNM;VALUE" >${OUTPUTDIR}/${inp}
cat ${inp} | while read line
do
TIMES=`echo "${line}" |awk -F, '{print $1}'`
DURA="86400"
INTMF=`echo "${line}" |awk -F, '{print $3}'`
METRIC=`echo "${line}" |awk -F, '{print $4}'`
VALU=`echo "${line}" |awk -F, '{print $5}'`
MFDISP=`echo "${line}" |awk -F, '{print $6}'`
DEVC=`echo "${line}" |awk -F, '{print $7}'`
CNAME=`echo "${line}" |awk -F, '{print $8}'`
#TIMES=`echo "${line}" |awk -F, '{print $1}'`
NON_MIL=`expr "${TIMES}" / 1000`
EPO2DT=`date -d @${NON_MIL} '+%Y-%m-%d %H:%M:%S'`
echo "${EPO2DT};${DURA};${DEVC};${INTMF};${METRIC};${VALU};${MFDISP};${CNAME}" >>${OUTPUTDIR}/${inp}
done
\mv ${BASEDIR}/INPUT/${inp} ${DONEDIR}
done
[root@tst01 INPUT]#