Help me improve a script to restart services

Good morning, I need your help please:

In a live production Online Charging System (OCS) there is a weekly task to restart services. It's a manual process and I want to do it with this script, but I can't test it in production because executing the shell script restart_remote.sh affects outgoing and incoming calls, and I don't want to get fired, hehe. This is the script:

#!/usr/bin/ksh

RUTA = /export/COLOMBIA/UTIL/Tools/1_bin
T_DELAY=3

cd $RUTA

while read line
do
  ./restart_remote.sh $LINE $T_DELAY
done < lista_componentes.txt >> Restart_Services_YYYYMMDD.log

This is the file to be read or processed:

lista_componentes.txt

--Voice
TARIF_RO
DIAM_RO
DIAMETER_RO
DIAM_RO_PAS
POSTTARIF_RO
--Data
DIAMETAR
DIAMETER_DATOS
DIAM_DATOS
DIAM_DATOS_PAS
POSTDIAMETAR
--SMS
SERTAMSJT
DIAMETER_SMS
DIAM_SMS
DIAM_SMS_PAS

1. Is there a way to simulate the run before executing it in production?
2. The file has different services categorized by voice, data and SMS, but I don't know how to print out something like "Restarting voice services:" at the moment of restarting them.
3. Is there any way to improve the above script?

This is the original restart_remote.sh script, which I reuse but did not write:

#!/bin/ksh                                                                                   
##################################################################################################
#                                                                                                #
# Restarts all <COMPONENT> processes (instances) in all servers, leaving a <DELAY> between       #
# restarts                                                                                       #
#                                                                                                #       
# Syntax: restart_remote.sh <COMPONENT> <DELAY> [<SERVER1> ... <SERVERN>]                        #       
#                                                                                                #
# Author: OCS RE support team                                                                    #
#                                                                                                #
# Last revision: May 2019                                                                        #
#                                                                                                #
##################################################################################################


#### IMPORTED LIBRARIES
. ../2_config/config_particular

. ../3_lib/lib_bbdd.sh
. ../3_lib/lib_aplica.sh
. ../3_lib/lib_goenv.sh 


#### PARAMETERS
cmd="$0 $@"

if [[ $# -eq 2  ]]; then            
        PROCESO=$1                  
        DELAY=$2

        # SERVERS DATA RETRIEVAL
        for tipomaq in $tipomaquinas; do
                maquinas=${maquinas}" "`obten_maquinas $tipomaq $iniciomaq`
        done
        echo $maquinas | grep "ORA" > /dev/null && echo "ERROR: Servers names could not be retrieved from the OCS_RE database through the $SID_EG_EXTRAER SID; Please check the obten_maquinas function within the 3_lib folder" && exit

elif [[ $# -ge 3  ]]; then                                                 
        PROCESO=$1                                                       
        DELAY=$2
        shift 2
        maquinas=$*
else
        echo "Incorrect number of parameters: $0 <COMPONENT> <DELAY> [<SERVER1> ... <SERVERN>]"; echo
        exit
fi


#### BEGIN
# Checking we are not in the middle of a maintenance window.
comprueba_si_es_ventana || exit; echo

auxi2="N"
for maquina in $maquinas ; do
        auxi2=`Proceso_corriendo $maquina POST${PROCESO}`
        if [[ $auxi2 == "S" ]]; then
                break
        fi
done

if [[ $auxi2 == "S" ]]; then
        echo -n "Do you want to restart also POST${PROCESO} (y/n)? "; read tmp; echo
        if [[ $tmp == "y" ]]; then
                auxi2="y"
        fi
fi

msg_on_log "$cmd" "START"
for maquina in $maquinas ; do
        auxi=`Proceso_corriendo $maquina $PROCESO`
        if [[ $auxi == "S" ]]; then
                msg_on_screen_and_log 36 "Restarting $PROCESO on $maquina...";
                pids=`saca_pids $maquina $PROCESO`
                para9_remoto $maquina $DELAY $pids; echo
                if [[ $auxi2 == "y" ]]; then
                        msg_on_screen_and_log 36 "Restarting POST${PROCESO} on $maquina..."
                        pids=`saca_pids $maquina POST${PROCESO}`
                        para9_remoto $maquina $DELAY $pids; echo
                fi
        fi
done
echo; msg_on_log "$cmd" "FINISH\n"

I appreciate your help in advance.

Don't you have a second environment for dev and testing?

When it comes to e.g. system patching, security patching, etc., don't tell me you do that on production without having tested it for side effects...


In addition to the questions posed by @vbe ...

Run your scripts through the shellcheck utility (ideally install it; it's also available online at shellcheck.net) and address any warnings/errors it throws.

Here's the output it produces for your first script:

shellcheck -s ksh restart.sh 

In restart.sh line 3:
RUTA = /export/COLOMBIA/UTIL/Tools/1_bin
     ^-- SC1068: Don't put spaces around the = in assignments (or quote to make it literal).


In restart.sh line 6:
cd $RUTA
^------^ SC2164: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

Did you mean: 
cd $RUTA || exit


In restart.sh line 8:
while read line
      ^--^ SC2162: read without -r will mangle backslashes.
           ^--^ SC2034: line appears unused. Verify use (or export if used externally).


In restart.sh line 10:
  ./restart_remote.sh $LINE $T_DELAY
                      ^---^ SC2153: Possible misspelling: LINE may not be assigned, but line is.
                      ^---^ SC2086: Double quote to prevent globbing and word splitting.

Did you mean: 
  ./restart_remote.sh "$LINE" $T_DELAY

For more information:
  https://www.shellcheck.net/wiki/SC1068 -- Don't put spaces around the = in ...
  https://www.shellcheck.net/wiki/SC2034 -- line appears unused. Verify use (...
  https://www.shellcheck.net/wiki/SC2153 -- Possible misspelling: LINE may no...

Some possible options

  • +1 for having a testing environment that reflects the production system as closely as possible.
  • Check every command call and abort if anything fails.
  • Check every generated variable for a plausible value, in terms of the expected values.
  • Identify critical sections where you can abort the process before crossing the point of no return if anything looks even slightly fishy.
  • Test parts of your script with realistic test data and validate the resulting output thoroughly.
  • If possible, disable and block inbound data transfer, so that requests get rejected rather than lost, if that is desirable or at least acceptable. Maybe you can disable new connections and wait until the current connections have finished. You may need to find out how to get that information.
  • You may also develop your own automated code tests, which check your code regularly and at least before any change is applied to production.
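A minimal sketch of the second and third points (abort on failure, check values for plausibility), in ksh like the rest of the thread. The helper names die, valid_component and run_checked are hypothetical, and the naming rule for a valid component (upper-case letters, digits and underscores, starting with a letter) is an assumption based on the names in lista_componentes.txt.

```shell
#!/bin/ksh
# Sketch: stop at the first failure and reject implausible input.
# die, valid_component and run_checked are hypothetical helper names.

die() {
    # Print a message to stderr and abort the whole script.
    echo "ERROR: $*" >&2
    exit 1
}

valid_component() {
    # Assumed naming rule: starts with an upper-case letter and contains
    # only upper-case letters, digits and underscores (e.g. DIAM_DATOS_PAS).
    case $1 in
        [[:upper:]]*) ;;
        *) return 1 ;;
    esac
    case $1 in
        *[![:upper:][:digit:]_]*) return 1 ;;
    esac
    return 0
}

run_checked() {
    # Run the given command; abort if it returns a non-zero exit status.
    "$@"
    rc=$?
    [ "$rc" -eq 0 ] || die "command failed (rc=$rc): $*"
}

# Example use inside the restart loop (hypothetical):
#   valid_component "$LINE" || die "unexpected component name: $LINE"
#   run_checked ./restart_remote.sh "$LINE" "$T_DELAY"
```

Be aware that restart_remote.sh as posted ends some error branches with a bare exit (for example after "Incorrect number of parameters"), which returns the status of the preceding echo, i.e. 0. An exit-status check alone will not catch those cases, so checking the log for ERROR lines may be needed as well.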

Thank you very much, all of you, for your time and support.

Maybe the company has a test environment, but I'm not allowed access because I'm not on the sysadmin team; that's why I'm asking whether there is a way to simulate without executing.

Furthermore, I'd be pleased to hear about any improved version of the first script I posted, for instance one that restarts voice services first, then data services, and SMS services last.

Thanks a lot in advance.

Shellcheck suggestions implemented, echo lines added, action de-activated:

#!/bin/ksh

YYYYMMDD=$(date +'%Y%m%d')
RUTA=/export/COLOMBIA/UTIL/Tools/1_bin
T_DELAY=3

cd "$RUTA" || exit

while read -r LINE
do
  if [[ $LINE == [A-Za-z]* ]]
  then
    echo ./restart_remote.sh "$LINE" $T_DELAY
#   ./restart_remote.sh "$LINE" $T_DELAY
  else
     echo "$LINE..."
     sleep 5
  fi
done < lista_componentes.txt >> Restart_Services_"$YYYYMMDD".log
echo "Done."

The echo output goes to the .log file.
The action is deactivated by a #, so you can run this without any risk.
If the log looks promising, remove the comment sign before the action, i.e. the # before ./restart_remote.sh.
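A sketch of the same idea with a DRY_RUN switch instead of commenting the action out, which also prints a readable header for the --Voice/--Data/--SMS lines (question 2 of the original post). DRY_RUN, restart_component and process_list are hypothetical names. The list file is read on file descriptor 3 so that nothing inside the loop can consume its lines, and the script is assumed to be started from $RUTA, as in the original.

```shell
#!/bin/ksh
# Sketch: simulate by default, act only when DRY_RUN=no is set.
# DRY_RUN, restart_component and process_list are hypothetical names.

DRY_RUN=${DRY_RUN:-yes}
T_DELAY=3

restart_component() {
    if [ "$DRY_RUN" = no ]; then
        ./restart_remote.sh "$1" "$T_DELAY"
    else
        echo "DRY RUN: ./restart_remote.sh $1 $T_DELAY"
    fi
}

process_list() {
    # $1 is the list file; it is read on fd 3, leaving stdin untouched.
    while IFS= read -r LINE <&3; do
        case $LINE in
            --*)                      # category header, e.g. --Voice
                echo "Restarting ${LINE#--} services:"
                if [ "$DRY_RUN" = no ]; then
                    sleep 5           # pause between groups only for real runs
                fi
                ;;
            [A-Za-z]*)                # component name
                restart_component "$LINE"
                ;;
            *)                        # blank or unexpected line: skip it
                ;;
        esac
    done 3< "$1"
}

if [ $# -gt 0 ]; then
    process_list "$1"
fi
```

Saved as, say, reinicios.sh, a plain `./reinicios.sh lista_componentes.txt >> Restart_Services_$(date +%Y%m%d).log 2>&1` only simulates; `DRY_RUN=no ./reinicios.sh ...` performs the restarts. Defaulting to the safe mode means a forgotten flag costs a rerun, not an outage.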


Thank you very much, MadeInGermany, for your time and support. I will take your suggestions into account; I will probably run this script during night hours.

Good evening, I'm testing this script. What does this command with double brackets do?

if [[ $LINE == [A-Za-z]* ]]

I suppose it checks every LINE for capital/lower-case letters?

lejandro@alejandro-VirtualBox:~/ejemplos$ ./reinicios.sh
++ date +%Y%m%d
+ YYYYMMDD=20231202
+ RUTA=/home/alejandro/export/COLOMBIA/UTIL/Tools/1_bin
+ T_DELAY=3
+ cd /home/alejandro/export/COLOMBIA/UTIL/Tools/1_bin
+ read -r LINE
+ [[ TARIF_RO == [A-Za-z]* ]]
+ echo ./restart_remote.sh TARIF_RO 3
+ read -r LINE
+ [[ DIAM_RO == [A-Za-z]* ]]
+ echo ./restart_remote.sh DIAM_RO 3
+ read -r LINE
+ [[ DIAMETER_RO == [A-Za-z]* ]]
+ echo ./restart_remote.sh DIAMETER_RO 3
+ read -r LINE
+ [[ DIAM_RO_PAS == [A-Za-z]* ]]
+ echo ./restart_remote.sh DIAM_RO_PAS 3
+ read -r LINE
+ [[ POSTTARIF_RO == [A-Za-z]* ]]
+ echo ./restart_remote.sh POSTTARIF_RO 3
+ read -r LINE
+ echo Done.
Done.


I appreciate your help in advance.

This is a glob match. [A-Za-z] is one character from the given ranges, i.e. a letter. The * is a wildcard meaning "any characters".
The [[ ]] test is true if $LINE is a letter followed by anything.
If $LINE begins with a letter, the following code block is run; the block ends at the next else, elif or fi. Lines such as --Voice or --Data do not begin with a letter, so they go to the else branch, which only echoes them and sleeps.
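To see the test in isolation, here is a small demo that works in ksh and bash (classify is a hypothetical helper). Note that the right-hand side of == inside [[ ]] must stay unquoted to be treated as a pattern; quoting it would turn it into a literal string comparison.

```shell
#!/bin/ksh
# classify is a hypothetical helper wrapping the test used in the script.
classify() {
    if [[ $1 == [A-Za-z]* ]]; then   # pattern: one letter, then anything
        echo run
    else
        echo skip
    fi
}

for LINE in TARIF_RO --Voice 2THiS-should-be-skipped; do
    echo "$(classify "$LINE"): $LINE"
done
```

This prints "run: TARIF_RO", "skip: --Voice" and "skip: 2THiS-should-be-skipped", one per line.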

Thank you very much, MadeInGermany, for your time and support.

Good evening, after a couple of months I ran the script in production, but it only read the first line and restarted only the DIAMETAR service; the 4 remaining services did nothing:

med2egsdp1:/OT/ReiniProcesos_SDPS > ./ReiniProcesos_sdps.sh
date
bc
med2egsdp1:/OT/ReiniProcesos_SDPS > date
jue mar  7 00:35:28 COT 2024

The script: ReiniProcesos_sdps.sh

#!/bin/ksh
YYYYMMDD=$(date +'%Y%m%d')
RUTA=/export/COLOMBIA/UTIL/Tools/1_bin
T_DELAY=3

cd "$RUTA" || exit

while read -r LINE
do
  if [[ $LINE == [A-Za-z]* ]]
  then
    ./restart_remote.sh "$LINE" $T_DELAY
  else
     echo "$LINE..."
     sleep 5
  fi
done < lista_componentes.txt >> Restart_Services_"$YYYYMMDD".log

The Input file: lista_componentes.txt

DIAMETAR
DIAMETER_DATOS
DIAM_DATOS
DIAM_DATOS_PAS

The only odd thing I found is that as soon as I ran the command it asked me:
Do you want to restart also POSTDIAMETAR (y/n)? n
Obviously I said no, because the POSTDIAMETAR service is on the last line to be read from the file.

Log:

Do you want to restart also POSTDIAMETAR (y/n)? n

Restarting DIAMETAR on cel8besdp1...
DIAMETAR_C2_I27 restarted
DIAMETAR_C2_I26 restarted
DIAMETAR_C2_I25 restarted
DIAMETAR_C2_I24 restarted
DIAMETAR_C2_I23 restarted
DIAMETAR_C2_I22 restarted
DIAMETAR_C2_I21 restarted

Restarting DIAMETAR on cel8besdp2...
DIAMETAR_C1_I27 restarted
DIAMETAR_C1_I26 restarted
DIAMETAR_C1_I25 restarted
DIAMETAR_C1_I24 restarted
DIAMETAR_C1_I23 restarted
DIAMETAR_C1_I22 restarted

Restarting DIAMETAR on cel8besdp3...
DIAMETAR_C2_I27 restarted
DIAMETAR_C2_I26 restarted
DIAMETAR_C1_I27 restarted
DIAMETAR_C2_I25 restarted
DIAMETAR_C1_I26 restarted
DIAMETAR_C2_I24 restarted
DIAMETAR_C1_I25 restarted

Restarting DIAMETAR on cel8besdp4...
DIAMETAR_C1_I27 restarted
DIAMETAR_C1_I26 restarted
DIAMETAR_C1_I25 restarted
DIAMETAR_C1_I24 restarted
DIAMETAR_C2_I27 restarted
DIAMETAR_C1_I23 restarted
DIAMETAR_C2_I26 restarted
DIAMETAR_C1_I22 restarted

I appreciate your help in advance.

"Everyone has a test environment, some of us are lucky enough to have a production environment too ;)"

@alexcol, modify your script to capture stderr as well:

...done < lista_componentes.txt >> Restart_Services_"$YYYYMMDD".log 2>&1

This statement doesn't match what you've shown, so something is missing, maybe error messages that are not being captured. Add more logging.
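Redirections are processed left to right, which is why 2>&1 has to come after the >> to the log file. A small demo, where noisy is a hypothetical stand-in for a command that writes to both streams:

```shell
#!/bin/ksh
# noisy is a hypothetical stand-in for ./restart_remote.sh.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

log=$(mktemp)
trap 'rm -f "$log"' EXIT

noisy >> "$log" 2>&1   # stdout -> log, then stderr -> where stdout points now: both logged
noisy 2>&1 >> "$log"   # stderr -> current stdout (terminal), then stdout -> log

echo "Log contents:"
cat "$log"
```

The log ends up with both lines from the first call but only "to stdout" from the second; the second call's "to stderr" goes to the terminal instead.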

You wrote:
The only odd thing I found is that as soon as I ran the command it asked me:
Do you want to restart also POSTDIAMETAR (y/n)? n
Obviously I said no, because the POSTDIAMETAR service is on the last line to be read from the file.

DIAMETAR is the FIRST line in the lista_componentes.txt file according to your post...

Show what you EXPECT the output to be (we can surmise, but you 'know'!).

Put timestamps on any/all messages being logged.

Given the following test scenario:

cat lista_componentes.txt
DIAMETAR
DIAMETER_DATOS
DIAM_DATOS
DIAM_DATOS_PAS
2THiS-should-be-skipped

cat alexcol.ksh
#!/bin/ksh

T_DELAY=3
YYYYMMDD=$(date +'%Y%m%d')

while read -r LINE
do
    if [[ "$LINE" == [A-Za-z]* ]]
    then
        echo "$(date) :   ./restart_remote.sh $LINE $T_DELAY"
    else
        echo "$(date): skipping $LINE..." >&2
        sleep 5
    fi
done < lista_componentes.txt >> Restart_Services_"$YYYYMMDD".log 2>&1

./alexcol.ksh

cat Restart_Services_20240308.log
Fri  8 Mar 09:06:42 GMT 2024 :   ./restart_remote.sh DIAMETAR 3
Fri  8 Mar 09:06:42 GMT 2024 :   ./restart_remote.sh DIAMETER_DATOS 3
Fri  8 Mar 09:06:42 GMT 2024 :   ./restart_remote.sh DIAM_DATOS 3
Fri  8 Mar 09:06:42 GMT 2024 :   ./restart_remote.sh DIAM_DATOS_PAS 3
Fri  8 Mar 09:06:42 GMT 2024: skipping 2THiS-should-be-skipped...

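We don't know what ./restart_remote.sh does, but it seems to become interactive.
In that case, if the interaction reads stdin, the while read loop should not take its input from stdin: any command inside the loop that reads stdin (the "(y/n)?" prompt in restart_remote.sh, or possibly remote commands run by its library functions) consumes lines of lista_componentes.txt that were meant for the loop. That would explain why only the first component was processed.
The following modification uses file descriptor 3 rather than file descriptor 0 (stdin, the default for reading input):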

while read -r LINE <&3
do
...
done 3< lista_componentes.txt >> Restart_Services_"$YYYYMMDD".log 2>&1
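The effect can be reproduced without touching production. In this sketch, fake_restart is a hypothetical stand-in for ./restart_remote.sh that, like its "(y/n)?" prompt, reads one line from stdin. If the real library functions run ssh, ssh -n (or < /dev/null on those calls) is the usual way to stop them from reading stdin, but we can't see those functions, so that part is an assumption.

```shell
#!/bin/ksh
# fake_restart is a hypothetical stand-in for ./restart_remote.sh:
# it prints a message and then reads one line from stdin, like a prompt.
fake_restart() {
    echo "restarting $1"
    read -r answer
}

# Version 1: the list arrives on stdin, so fake_restart's read steals lines.
loop_stdin() {
    while read -r LINE; do
        fake_restart "$LINE"
    done
}

# Version 2: the list arrives on fd 3; stdin stays free for the prompt.
loop_fd3() {
    while read -r LINE <&3; do
        fake_restart "$LINE"
    done 3< "$1"
}

list=$(mktemp)
trap 'rm -f "$list"' EXIT
printf '%s\n' DIAMETAR DIAMETER_DATOS DIAM_DATOS DIAM_DATOS_PAS > "$list"

echo "stdin version:"; loop_stdin < "$list"
echo "fd 3 version:";  loop_fd3 "$list" < /dev/null
```

The stdin version only reaches DIAMETAR and DIAM_DATOS, because each prompt swallows the following line; the fd 3 version processes all four.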


Good evening, thank you very much, all of you, for your time and support.

Next week I will probably have the opportunity to run the script again in production, with the suggested modifications.

It's very important to get this done because, for every service listed, restart_remote.sh restarts processes on 10 different machines. When we are asked to restart 20 services, doing it manually takes all night long and has a tremendous impact on an OCS (Online Charging System), so the script ReiniProcesos_sdps.sh would make a huge difference in time and effectiveness.

I will let you know the outcome. Thank you all once again for your time.