Resume from last failed command

Dear Experts,

I am creating a menu-driven shell script (A) which in turn calls other shell scripts (X) depending on the selection. X has a list of commands which run batch jobs in auto mode.
Content of X:

./sqr.ksh axhri051_sf axhri051_sf.par $1 $2 XHRPPYOV;
./sqr.ksh axbni062 axbni062.par $1 $2 XHRPBYOW;
./sqr.ksh axbni061 axbni061.par $1 $2 XHRPBYOX;
./sqr.ksh axpyi035_sf axpyi035_sf.par $1 $2 XHRPPYA4;

I want to be able to resume directly from A if, say, job XHRPBYOW has failed.
After a failure, the program should prompt me whether I want to resume from the last failed job. Upon entering my choice, it should resume from that job (2 in the example) and continue.
Is it possible to do so? If yes, how?

Thank you for your time.
Your response is much appreciated.

What have you tried so far?

What do these processes do? Do they create files you can use to tell where to continue?

@Corona688
Thank you for the response.
Yes each command like the one below
./sqr.ksh axpyi035_sf axpyi035_sf.par $1 $2 XHRPPYA4;
creates a log file. So if a command aborts, it gets written to that
log file. How can I use that ?
the log file is generally called
XHRPPYA4_datetime.log

@sea
I tried
fg, but I assume it is not the right command to use here.

Do you think I should read the last log file and resume from there ?
Any suggestions how to do that ?

If ./sqr.ksh returns a meaningful exit status (nonzero on failure), then you can make X stop at the first error by putting this near the top of X:

set -e
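
A minimal sketch of what set -e does, assuming the jobs report failure through their exit status. Here `false` is only a stand-in for a failing ./sqr.ksh call, and the `demo` variable is a made-up name holding a miniature version of X:

```shell
#!/bin/sh
# Miniature stand-in for X: three "jobs", the second one fails.
demo='
set -e
echo "job 1 ok"
false
echo "job 3 ok"
'

# Run the demo in its own shell and capture what it printed and how it exited.
if output=$(sh -c "$demo"); then status=0; else status=$?; fi

echo "output: $output"
echo "exit status: $status"
```

The third job never runs, and the nonzero exit status is what the calling menu script can test to decide whether to offer a resume.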

What about output files? Is there a file it would create only if it succeeded? I was thinking of using a makefile.

Thank you for the response.
I was reading about makefile and it seems to me it is about compiling a program.
Probably I wasn't clear with my requirements.
I have a master file X.sh which calls A.sh, B.sh or C.sh depending on menu selection.
Now A.sh has commands separated by semicolon.
for example
do 1 ;
do 2;
do 3;
do 4;

Now in case "do 2" fails (say due to a data issue), the program should prompt me whether I wish to resume. Depending on my entry, the program should resume from "do 2", continue, and then go back to the main calling program menu X.sh.
Hope I am clear now.
Is it possible for me to know at which line A.sh got stuck or failed, so that I can resume from there?

Log files are created irrespective of the program's run status. However, each log file says "Its successful", or "Aborted" if the job has failed.
I can read that, but how do I tell the program to resume from the last failed command?

Hi,

This is actually quite a straightforward process; I have used it several times for long shell scripts. Normally I break the process into a number of component parts, then allocate each to a section or subsection of the shell script. On completion of each section or subsection I write a file in "/tmp" with a numeric value in it. The script accepts a number of arguments, one of which tells it to start where it finished last time; that point is identified by reading the contents of the file in "/tmp".

Regards

Dave

You can use makefiles for anything. It is a list of what files are used to create what files, and what programs are used to create them. It can start in the middle, by seeing what files have already been created.

I don't suppose, on the third try, you could actually answer my question?

Is there a file which is created on success but not created on failure? Yes, or no?

Here is an example, written on the assumption your programs create no files.

# Named Makefile, in the current directory
# Make does not see the shell's $1 and $2, so they are passed as make
# variables instead (ARG1 and ARG2 are placeholder names):
#       make ARG1=... ARG2=...
file4:file3
        ./sqr.ksh axpyi035_sf axpyi035_sf.par $(ARG1) $(ARG2) XHRPPYA4
        touch file4
file3:file2
        ./sqr.ksh axbni061 axbni061.par $(ARG1) $(ARG2) XHRPBYOX
        touch file3
file2:file1
        ./sqr.ksh axbni062 axbni062.par $(ARG1) $(ARG2) XHRPBYOW
        touch file2
file1:
        # Code to run program1 here
        # Leave the 'touch', Make needs it to track
        ./sqr.ksh axhri051_sf axhri051_sf.par $(ARG1) $(ARG2) XHRPPYOV
        touch file1

clean:
        rm -f file1 file2 file3 file4

When you run make, it will attempt to create file1 through file4 in that order, by running the commands you tell it to. If there is an error, it will detect the nonzero return code and stop in the middle. Next time you run it, it will know where it left off from which of the files file1...file4 have and haven't been created.

Run 'make clean' to remove the files and start over from the beginning.

Note that the eight leading spaces are actually tabs, and must be tabs for make to work.

Hi,

Here is the start screen from a script that I run to configure Solaris Zones. Is this the type of thing that you are looking for?

# -------------------------------------------------------------------------------------------------------
# This programme checks the installation, configuration and the observance of the standards 
# and Security Policy for Solaris 10 systems.
# (c) NOOBAB.COM : xxxxxxxxxxxxx.xxxxxxx.xx.xx :  Mon, Sep 29, 2014 16:39:32
# -------------------------------------------------------------------------------------------------------
#
 please wait 
# -------------------------------------------------------------------------------------------------------
# At first the program checks the last Step in the last run. 
# If the programm was crashed or you have stopped the script with <Crtl-C> 
# You have the chance to start the programm from the last step.
# -------------------------------------------------------------------------------------------------------

   - Would you like to start this script from ... 
       - the last Section where the script is broken ?      Type [ 'L' ]
       - the Beginning of this Script ?                     Type [ 'B' ]
       - from a specific Section No ?                       Type [ 'S' ]

# -------------------------------------------------------------------------------------------------------
   Your input please:                                               S



   SECTION:  1 - Check NFS Configuration
   SECTION:  2 - Check '/etc/hosts' if global informations included
   SECTION:  3 - Check '/etc/profile' and '/etc/nsswitch.conf'
   SECTION:  4 - Check zfs-filesystems if exists 
   SECTION:  5 - Check if 'UX-User' included in '/etc/passwd' and '/etc/group'
   SECTION:  6 - Check whether directorys are existing
   SECTION:  7 - Check and Run 'rsync' to copy default configuration on taget system
   SECTION:  8 - Check group permission of file '/var/adm/messages' must be: 'bmcadm'
   SECTION:  9 - Check 'Quest-SSH' configuration
   SECTION: 10 - Check 'VAS Tool' configuration
   SECTION: 11 - Check join local server '' to the active directory
   SECTION: 12 - Check if the services 'vasd', vasgpd' and 'quest-ssh' are started
   SECTION: 13 - Check Hardening netservices
   SECTION: 14 - Check NTP Configuration and SMF-Service
   SECTION: 15 - Check File 'dsm.sys' for TSM-Client
   SECTION: 16 - Check Configuration defaults
   SECTION: 17 - Check default security configuration
   SECTION: 18 - Check 'rsa' and 'dsa' keys for ssh
   SECTION: 19 - Check 'EMC PowerPath Registration' (only on global zones)
   SECTION: 20 - Check 'SYMCLI EMC Solutions Enabler' installation (only on global zones)'
   SECTION: 21 - Check installation of 'TDPO' software
   SECTION: 22 - Check 'Smart' relay host to 'DSmailgw2.zzz.com
   SECTION: 23 - Check 'Patrol' installation and configuration for server '' XML-File
   SECTION: 24 - Check 'UC4' configuration
   SECTION: 25 - Check if the Services for 'vasd', vasgpd' and 'quest-ssh' are started
   SECTION: 26 - Check 'syslog' configuration
   SECTION: 27 - Check script File '/etc/motd'
   SECTION: 28 - Check 'os-als-role2server' if Server '' included'

   Please enter a Section No. 1 - 28 : 

Some of the sections have more depth as well; the script is around 6000 lines in total. Is this the type of thing that you are looking for?

Regards

Dave

@Dave
Yes, this is exactly what I am looking for.

@Corona688, thank you again.
The log file is created no matter what, whether the command is successful or not.
How do I tell the program to resume from, say, "file 5"?

As an idea you could try:

tail logfile

This shows the last few entries of the given logfile. You can parse that output and act accordingly.
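
A rough sketch of that idea, assuming the logs are named JOBNAME_datetime.log and contain the word "Aborted" on failure, as described earlier in the thread. The function name `job_aborted`, the date stamps and the scratch directory are made up for illustration:

```shell
#!/bin/sh
# job_aborted JOB LOGDIR
# Succeeds if the newest log for JOB in LOGDIR mentions "Aborted" near its end.
job_aborted() {
    newest=$(ls -t "$2/$1"_*.log 2>/dev/null | head -n 1)
    [ -n "$newest" ] && tail -n 20 "$newest" | grep -q "Aborted"
}

# Demo with made-up log files in a scratch directory.
logdir=$(mktemp -d)
echo "Its successful" > "$logdir/XHRPPYOV_20140929.log"
echo "Aborted"        > "$logdir/XHRPBYOW_20140929.log"

for job in XHRPPYOV XHRPBYOW
do
    if job_aborted "$job" "$logdir"
    then
        echo "$job: aborted"
    else
        echo "$job: ok"
    fi
done

rm -rf "$logdir"
```

Checking the exit status of the job itself is usually more reliable than parsing logs, so this is best treated as a fallback for jobs that do not return a useful status.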

hth

That's what the 'touch' is for in the makefile, to create flag files on success to remember where it left off.

If you want to do it without a makefile:

#!/bin/sh

if [ "$#" -gt 0 ]
then
        X="$1"
elif [ -f lastfile ]
then
        read X < lastfile
fi

[ -z "$X" ] && X=1

R=1

while [ "$R" -eq 1 ]
do
        case "$X" in
        1)      command1 ;;
        2)      command2 ;;
        3)      command3 ;;
        4)      command4 ;;
        5)      command5 ;;
        *)      R=0 ;;
        esac

        if [ "$?" -ne 0 ]
        then
                echo "Error in loop $X" >&2
                exit 1
        fi

        # POSIX arithmetic; 'let' is not available in every /bin/sh
        X=$((X + 1))
        # Save last-succeeded-operation + 1
        echo "$X" > lastfile
done

echo "Completed all loops, removing flag file"
rm -f lastfile

Then you could do ./script and it would remember where it left off, or ./script 2 to force it.
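
The original question also asked for a prompt before resuming. One way to sketch that, assuming a flag file that holds the next step number as in the script above: the helper name `choose_start` is hypothetical, and the demo pipes in the answer "y" instead of reading it from the terminal.

```shell
#!/bin/sh
# choose_start FLAGFILE
# Prints the step number to start from. If FLAGFILE exists, asks (on stderr)
# whether to resume and reads a y/n answer from standard input.
choose_start() {
    if [ -f "$1" ]
    then
        read -r saved < "$1"
        printf 'Resume from step %s? [y/n] ' "$saved" >&2
        read -r answer
        case "$answer" in
        [Yy]*)  echo "$saved"; return ;;
        esac
    fi
    echo 1
}

# Demo: pretend an earlier run stopped at step 2 and the user answers "y".
flag=$(mktemp)
echo 2 > "$flag"
start=$(echo y | choose_start "$flag")
echo "starting at step $start"
rm -f "$flag"
```

In the menu script the answer would come from the terminal, and the result could be passed to the step script as its first argument, for example ./script "$start".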

Hi,

OK, I can't let you have the whole script, but below is a typical section from it. Please bear in mind that this is a single section of the script and that there are many such sections.

# ----------------------------------------------------------------------------------------------------
# check filesystems if not exists then create it"
# ----------------------------------------------------------------------------------------------------
#
#   STEPTOINFO[  4 ]=0 # check filesystems if not exists then create it
    STEPCOUNT=4
    if [ ${STEPTOINFO[$STEPCOUNT]} -ne 0 ]; then
#   {
        printf "$STEPCOUNT" >  "$SAVE_LAST_BREAK"    # save current position
        printf "$STEPCOUNT; " >> "$SAVE_LAST_ACTIVITY"
#
        if [ "$GLOBAL_ZONE" = "$TRUE" ]; then
#       {
# ----------------------------------------------------------------------------------------------------
#
            ANZINFORMATION=$((ANZINFORMATION + 1))
            INSTALLINFORMATION[$ANZINFORMATION]="SECTION -  %2d : ${SECTION_INFO[$STEPCOUNT]}"
#
            showheader "${INSTALLINFORMATION[$ANZINFORMATION]}" "$STEPCOUNT" "$BACKGROUND"
#
            ERRORFOUND=0
            ERRORINFORMATION[$ANZINFORMATION]="Perfect"
#
# ----------------------------------------------------------------------------------------------------
# needed Filesystems:
#
            ZFS_RPOOL_FILESYS="rpool/UC4                                                            \
                               rpool/export rpool/export/home                                       \
                               rpool/local  rpool/local/Tivoli rpool/local/bmc rpool/local/oracle   \
                               rpool/zzz    rpool/zzz/core                                          \
                               rpool/oracle                                                         \
                               rpool/TAD4D"
# save this info in an array
#
            i=-1
            for ZFS_RP_FS in $ZFS_RPOOL_FILESYS
            do
                i=$((i + 1)); ZFS_RPOOLFS[$i]=$ZFS_RP_FS
            done
#
# ----------------------------------------------------------------------------------------------------
# needed Mountpoint for zfs filesystems:
#
            i=-1
            ZFS_FILESYSTEMS="/UC4                                                                \
                             /export     /export/home                                            \
                             /usr/local  /usr/local/Tivoli  /usr/local/bmc /usr/local/oracle     \
                             /zzz        /zzz/core                                               \
                             /oracle                                                             \
                             /opt/TAD4D"
# save this info in an array
#
            i=-1
            for ZFS_FS in $ZFS_FILESYSTEMS
            do
                i=$((i + 1)); ZFS_FILESYS[$i]=$ZFS_FS
            done
#
# ----------------------------------------------------------------------------------------------------
# needed quotas for zfs filesystems:
#
            ZFS_FILESYS_QUOTAS="250M                                                              \
                                  3G          2G                                                  \
                                  3G          262M               520M            520M             \
                                  2G          1G                                                  \
                                 10G                                                              \
                                250M"
# save this info in an array
#
            i=-1
            for ZFS_FSQ in $ZFS_FILESYS_QUOTAS
            do
                i=$((i + 1)); ZFS_FSQUOTAS[$i]=$ZFS_FSQ
            done
#
# ----------------------------------------------------------------------------------------------------
#
            i=-1
            while [ $i -lt $(( ${#ZFS_FILESYS[*]} - 1)) ]
            do
#           {
                i=$((i + 1))
                typeset -L20 TMP_ZFS_RPOOLFS
                typeset -L20 TMP_ZFS_FILESYS
                TMP_ZFS_RPOOLFS="'${ZFS_RPOOLFS[$i]}'"
                TMP_ZFS_FILESYS="'${ZFS_FILESYS[$i]}'"
#
                /usr/sbin/zfs list | grep "${ZFS_RPOOLFS[$i]}" > /dev/null 2>&1
                RC=$?
                if [ $RC -eq 0 ]; then
#               {
                    if [ -d "${ZFS_FILESYS[$i]}" ]; then
#                   {
# >> zfs rpool/fs is available !!!
# >> directory    is available !!!
                        showmessages "...     NOTE: $TMP_ZFS_RPOOLFS and $TMP_ZFS_FILESYS exists" "Perfect" "$BACKGROUND"
#                   }
                    else
#                   {
                        showmessages "...     NOTE: $TMP_ZFS_FILESYS not exists" "Faulty" "$BACKGROUND"
                        ERRORFOUND=1
#                   }
                    fi
#               }
                else
#               {
# >> zfs rpool/fs is NOT available !!!
                    showmessages "...     NOTE: $TMP_ZFS_RPOOLFS not exists" "Faulty" "$BACKGROUND"
                    if [ ! -d "${ZFS_FILESYS[$i]}" ]; then
#                   {
# >> directory is NOT available !!!"
                        showmessages "...     NOTE: $TMP_ZFS_FILESYS not exists" "Faulty" "$BACKGROUND"
                        ERRORFOUND=1
#                   }
                    else
#                   {
                        showmessages "...     NOTE: $TMP_ZFS_FILESYS is a directory not a ZFS-Filesystem from $TMP_ZFS_RPOOLFS" \
                                     "Faulty"                                                                                   \
                                     "$BACKGROUND"
                        ERRORFOUND=1
#                   }
                    fi
#               }
                fi
#           }
            done
#
# --------------------------------------------------------------------------------------------------------
#
        if [ $ERRORFOUND -ne 0 ]; then
            GLOBAL_ERROR=-1
            ERRORINFORMATION[$ANZINFORMATION]="Faulty"
        fi
#
# ----------------------------------------------------------------------------------------------------
#
            if [ $RETURNNEEDED -eq 0 ]; then
                printf "\nenter <RETURN> to continue : "; read FORWARD
            fi
#       }
        fi
#   }
    fi

You should be able to see how the checking is done in each section from this.

Regards

Dave

@Corona688: there's a missing closing quote in if [ "$? -ne 0 ]
But nice one otherwise. I had tried something similar with just a loop, but since the loop went through all the commands, the error was overwritten and never reported. I had expected something similar for a case statement.
As in, I assumed that because the case was matched it would return true. Thanks for clearing up this wrong assumption :)

Thanks, fixed.

The trick is to save the next number after every success, and to save nothing after a failure. If you save at any other point, you will be off by one.

Also, don't put anything but the program line in the cases, one per case. If you put anything else, it will pollute the value of $?. If you want any logging/etc, put it above the case or after the save.
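
A small illustration of that pollution, with `false` standing in for a failing job (the variable names `polluted` and `clean` are only for the demo):

```shell
#!/bin/sh
# Two commands in the case: the echo succeeds, so the job's failure is lost.
case 1 in
1)      false; echo "job finished" ;;
esac
polluted=$?

# Only the job in the case: $? is the job's own exit status.
case 1 in
1)      false ;;
esac
clean=$?

echo "polluted=$polluted clean=$clean"
```

The first case reports 0 even though the job failed, so the loop would carry on and save the wrong step number.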

Hi,

The script saves the step count at the beginning of each step. When you run the script a second or subsequent time, it reads that saved count to decide which section to start at. Because the count is written at the beginning of each section, the script starts again at the section that failed.
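
A stripped-down sketch of that save-before-running pattern, not the real script: the step functions, `run_steps`, `fix_applied` and the temporary state file are all invented for the demo, and step 2 fails until the variable is set to yes.

```shell
#!/bin/sh
state=$(mktemp)     # hypothetical file holding the current step number
trace=""            # records which steps actually ran

step1() { trace="$trace 1"; }
step2() { trace="$trace 2"; [ "$fix_applied" = yes ]; }   # fails until "fixed"
step3() { trace="$trace 3"; }

# run_steps START: run steps START..3, saving each number before the step runs.
run_steps() {
    n=$1
    while [ "$n" -le 3 ]
    do
        echo "$n" > "$state"        # position is saved first
        "step$n" || return 1        # on failure the file still names this step
        n=$((n + 1))
    done
    rm -f "$state"
}

fix_applied=no
run_steps 1 || echo "stopped at step $(cat "$state")"

fix_applied=yes
run_steps "$(cat "$state")" && echo "steps run:$trace"
```

Saving before the step means the file names the failed step itself, whereas the save-after-success variant stores the last good step plus one. Both resume at the same place as long as each script sticks to one convention.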

Regards

Dave
