Optimizing bash script

SkySmart · June 14, 2016, 3:02pm

Is there any way the following code can be optimized?

FIRSTIN=$(
        HKIPP=$(echo ${TMFR} | egrep -v "mo|MO|Mo" | egrep "m |M ")
        HRAMH=$(echo ${TMFR} | egrep "h|H")
        HRAMD=$(echo ${TMFR} | egrep "d|D")
        HRAMW=$(echo ${TMFR} | egrep "w|W")
        HKIPPO=$(echo ${TMFR} | egrep "mo|MO|Mo")
        if [ -z "${HRAMH}" ] && [ -z "${HRAMD}" ] && [ -z "${HRAMW}" ] && [ -z "${HKIPP}" ] && [ -z "${HKIPPO}" ] ; then
                echo $TMFR | sed 's~[hHmM]~~g' | gawk '{print $1 * 60}'
        elif [ ! -z "${HKIPP}" ] ; then
                echo $TMFR | sed 's~[hHmM]~~g' | gawk '{print $1 * 60}'
        elif [ ! -z "${HRAMH}" ] ; then
                echo $TMFR | sed 's~[hHmM]~~g' | gawk '{print $1 * 60 * 60}'
        elif [ ! -z "${HRAMD}" ] ; then
                echo $TMFR | sed 's~[dD]~~g' | gawk '{print $1 * 1440 * 60}'
        elif [ ! -z "${HRAMW}" ] ; then
                echo $TMFR | sed 's~[wW]~~g' | gawk '{print $1 * 10080 * 60}'
        elif [ ! -z "${HKIPPO}" ] ; then
                echo $TMFR | sed 's~[mo]~~g' | sed 's~[MO]~~g' | sed 's~[Mo]~~g' | gawk '{print $1 * 43200 * 60}'
        else
                echo $TMFR | gawk '{print $1 * 60}'
fi)

preferably in awk?

joker · June 14, 2016, 4:31pm

Hi,

Can you post sample input and output files? That will make it easier.

stomp();

Here's some bash code that should be a lot faster, because it uses only shell builtins. It could be improved further if you post sample input and output and explain them.

#!/bin/bash

shopt -s nocasematch 

NUMBER=${TMFR//[wdhmoWDHMO]/}

# default multiplier for MINUTES
MULTIPLIER=60

# order of the checks matters!
[[ "$TMFR" =~ mo ]] && MULTIPLIER=2592000 # MONTH
[[ "$TMFR" =~ w  ]] && MULTIPLIER=604800  # WEEK
[[ "$TMFR" =~ d  ]] && MULTIPLIER=86400   # DAY
[[ "$TMFR" =~ h  ]] && MULTIPLIER=3600    # HOUR

((FIRSTIN= $NUMBER * $MULTIPLIER))

echo $FIRSTIN $NUMBER $MULTIPLIER

SkySmart · June 14, 2016, 6:08pm

thank you so much for this.
One question: this uses newer bash features, and I don't think it would work on older systems where the shell is plain sh. Do you know how to make this more portable?

RavinderSingh13 · June 15, 2016, 1:55am

Hello SkySmart,

A function makes the code reusable here. As stomp pointed out, you haven't shown us the Input_file, so I can't predict your exact requirement. Based on the code shown in post #1, could you please try the following and let me know if it helps?
You could basically change the following commands (shown by you in post #1):

echo $TMFR | sed 's~[hHmM]~~g' | gawk '{print $1 * 60}'
echo $TMFR | sed 's~[hHmM]~~g' | gawk '{print $1 * 60}'
echo $TMFR | sed 's~[hHmM]~~g' | gawk '{print $1 * 60 * 60}'
echo $TMFR | sed 's~[dD]~~g' | gawk '{print $1 * 1440 * 60}'
echo $TMFR | sed 's~[wW]~~g' | gawk '{print $1 * 10080 * 60}'
echo $TMFR | sed 's~[mo]~~g' | sed 's~[MO]~~g' | sed 's~[Mo]~~g' | gawk '{print $1 * 43200 * 60}'

To a single command:

echo $TMFR | awk 'function valuecal(Q,tmfr){gsub(Q,"",tmfr);val=tmfr * 60;return val} {VAL=valuecal("[hHmM]",$0);print "Minutes to seconds= " OFS  VAL ORS "hours to seconds= " OFS VAL * 60;DATE_VAL=valuecal("[dD]",$0);print DATE_VAL * 1440;wW_VAL=valuecal("[wW]",$0);print wW_VAL * 10080;mo_VAL=valuecal("[mMoO]",$0);print mo_VAL * 43200}'

As mentioned above, this code should do all the work, but I couldn't test it for lack of an Input_file.
If you want to handle the hHmM, dD, wW, and mMoO values separately, you could try the following.

For hHmM:
echo $TMFR | awk 'function valuecal(tmfr){gsub(/[hHmM]/,"",tmfr);val=tmfr * 60;return val} {VAL=valuecal($0);print "Minutes to seconds= " OFS  VAL ORS "hours to seconds= " OFS VAL * 60}'
For dD:
echo $TMFR | awk 'function valuecal(tmfr){gsub(/[dD]/,"",tmfr);val=tmfr * 60;return val} {VAL=valuecal($0);print VAL * 1440}'
For wW:
echo $TMFR | awk 'function valuecal(tmfr){gsub(/[wW]/,"",tmfr);val=tmfr * 60;return val} {VAL=valuecal($0);print VAL * 10080}'
For mMoO:
echo $TMFR | awk 'function valuecal(tmfr){gsub(/[mMoO]/,"",tmfr);val=tmfr * 60;return val} {VAL=valuecal($0);print VAL * 43200}'

Please try the above code against your requirements and let us know if you have any questions. Hope this helps.
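
If only one value per call is needed, as in the if/elif chain of post #1, a minimal sketch could look like the following. The function name tmfr_to_seconds_awk is hypothetical, and since no Input_file was posted it assumes values such as 5h, 2d or 1mo (a number followed by an optional unit), with a bare number meaning minutes.

```shell
# Hypothetical helper: convert a value like "5h" or "2d" to seconds.
# Assumes the number comes first and the unit letters follow it.
tmfr_to_seconds_awk() {
    printf '%s\n' "$1" | awk '{
        n = $0 + 0           # awk keeps the leading numeric part: "5h" -> 5
        u = tolower($0)      # compare units case-insensitively
        if      (u ~ /mo/) f = 2592000   # month (30 days)
        else if (u ~ /w/)  f = 604800    # week
        else if (u ~ /d/)  f = 86400     # day
        else if (u ~ /h/)  f = 3600      # hour
        else               f = 60        # minutes (default)
        print n * f
    }'
}

FIRSTIN=$(tmfr_to_seconds_awk "$TMFR")
```

The month test comes first so that a value like 1mo is not mistaken for minutes. This runs one awk process instead of several egrep, sed and gawk processes.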

Thanks,
R. Singh

RudiC · June 15, 2016, 2:01am

Without digging deeper into the logic, wouldn't it make sense to use the case ... esac construct? It should be available in sh as well...

MadeInGermany · June 15, 2016, 4:41am

Yes, a case is ideal here, and it is portable to other shells.
For extracting numbers there are some POSIX shell builtins, but they need more assumptions, like "the digits are always at the end of the string".
For old Bourne shells, and without such assumptions, one needs an external helper, here expr.

case $TMFR in
*[Mm][Oo]*) factor=2592000;;
*[Ww]*) factor=604800;;
*[Dd]*) factor=86400;;
*[Hh]*) factor=3600;;
*) factor=60;;
esac
num=`expr x"$TMFR" : x"[^0-9]*\([0-9]*\)"`
FIRSTIN=`expr 0$num \* $factor`

Since expr bails out if the first character in $TMFR is a dash, the usual workaround is to prepend a character (here an x) to the string and repeat it on the right side. The x on the right is there for clarity only; without it, the [^0-9]* would absorb the leading x.
The 0 is prepended to $num so that the result is 0 if $num is empty.
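
For a POSIX shell (not an old Bourne shell), the builtin route mentioned above might be sketched as follows. It is only a sketch under one assumption: the digits are at the start of the string, as in 5h or 2mo, and have no leading zeros. The function name tmfr_to_seconds_posix is hypothetical.

```shell
# Hypothetical helper using only POSIX shell builtins (no expr, sed or awk).
# Assumption: the digits lead the string and carry no leading zeros,
# because $(( )) may read a number like 08 as invalid octal.
tmfr_to_seconds_posix() {
    tmfr=$1
    case $tmfr in
        *[Mm][Oo]*) factor=2592000 ;;   # month (30 days)
        *[Ww]*)     factor=604800 ;;    # week
        *[Dd]*)     factor=86400 ;;     # day
        *[Hh]*)     factor=3600 ;;      # hour
        *)          factor=60 ;;        # minutes (default)
    esac
    # Cut everything from the first non-digit onward: "5h" -> "5"
    num=${tmfr%%[!0-9]*}
    # Fall back to 0 when no leading digits were found
    echo $(( ${num:-0} * factor ))
}

FIRSTIN=$(tmfr_to_seconds_posix "$TMFR")
```

If the digits were at the end of the string instead, ${tmfr##*[!0-9]} would be the matching expansion.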