Have each subshell write stderr and stdout to its own logfile

Hello,

I do some hacked-together parallel processing by running multiple instances of a bash script, each in its own subshell. The code looks like this:

# launch one batch-train script in background for each value in fold group list
for FOLD_GROUP in "${FOLD_GROUP_LIST[@]}"
do
   # continue training on $FOLD_GROUP folds
   ./01_batch_script.sh $CORES \
                        $BATCH_PROJ \
                        $BATCH_STOP_ON_SUBSET \
                        $BATCH_STOP_ON_STAT \
                        $SET \
                        $FOLD_GROUP \
                        $RND_SEED \
                        $DATASET_STRING \
                        $ERRTOL \
                        $BATCHES \
                        $START_MODE \
                        $START_LR \
                        $MAX_EPOCH_BATCH \
                        $OA_PRINT_PRECISION \
                        $BATCH_PROC &
   # to prevent terminal overrun
   sleep 4
done

# wait for all subshells to return before resuming
wait

Despite using sleep to space things out, I write a lot of output while working on these scripts, and there is no way to keep it from getting muddled when it all goes to one terminal.

I can generally work on just one instance if I need to, but it would also be nice to have each script report to its own log file so I can see which instance each error came from.

Is there a reasonable way to do this that someone can suggest?

LMHmedchem

Before your loop add:

bkjobnum=1

In your loop, change:

                        $BATCH_PROC &
   # to prevent terminal overrun
   sleep 4

to:

                        $BATCH_PROC > 01_batch_script.out.$bkjobnum 2> 01_batch_script.err.$bkjobnum &
   # number the log files for the next background job
   bkjobnum=$((bkjobnum + 1))

Then you will have the standard output and standard error output from each background job in separate files that you can peruse at your leisure.
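The counter approach above can be tried out in isolation with a minimal sketch like the one below. It assumes a hypothetical stand-in function, fake_job, in place of ./01_batch_script.sh, and made-up fold group values and file names; only the redirection and counter logic mirror the suggestion.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for ./01_batch_script.sh: one line to each stream.
fake_job() {
    echo "stdout from fold group $1"
    echo "stderr from fold group $1" >&2
}

FOLD_GROUP_LIST=(A B C)   # illustrative values only
logdir=$(mktemp -d)       # scratch directory for the example logs

bkjobnum=1
for FOLD_GROUP in "${FOLD_GROUP_LIST[@]}"
do
    # stdout and stderr of each background job go to their own numbered files
    fake_job "$FOLD_GROUP" > "$logdir/job.out.$bkjobnum" 2> "$logdir/job.err.$bkjobnum" &
    bkjobnum=$((bkjobnum + 1))
done

# wait for all background jobs before inspecting the logs
wait
```

With three fold groups this leaves six files in the scratch directory, one .out and one .err per job.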

Thanks, that will be a big help.

Is,

01_batch_script.out.$bkjobnum

the name of the file where stdout will be logged?

Would I be able to do something comparable to,

2>&1 | tee -a logfile.txt

to allow output to both the terminal and a logfile?

LMHmedchem

01_batch_script.out.$bkjobnum is the name of the file where stdout will be logged and 01_batch_script.err.$bkjobnum is the name of the file where stderr will be logged for each background job. And $bkjobnum will be incremented for each background job.

If every job pipes into tee -a logfile.txt, the output in logfile.txt and on the screen will be muddled as before.
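One possible middle ground, sketched here as an assumption rather than something tested against the real scripts, is to give each job its own tee target and tag the terminal copy with the fold group. Lines from different jobs still interleave on screen, but each line shows which job it came from, and each log file holds only one job's output. The function fake_job, the fold group values, and the log file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for ./01_batch_script.sh.
fake_job() {
    echo "progress from fold group $1"
    echo "warning from fold group $1" >&2
}

FOLD_GROUP_LIST=(A B)     # illustrative values only
logdir=$(mktemp -d)       # scratch directory for the example logs

for FOLD_GROUP in "${FOLD_GROUP_LIST[@]}"
do
    # 2>&1 merges stderr into stdout, tee appends to this job's own log,
    # and sed prefixes the copy that continues on to the terminal
    fake_job "$FOLD_GROUP" 2>&1 \
        | tee -a "$logdir/fg$FOLD_GROUP.log" \
        | sed "s/^/[fg$FOLD_GROUP] /" &
done

# wait for all background pipelines to finish
wait
```

The log files receive the untagged lines; only the terminal copy carries the [fgA] or [fgB] prefix.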

It's sometimes handy to use the loop variable directly instead of a counter variable:

./01_batch_script.sh ... > "$FOLD_GROUP.out" 2>&1 &

This one redirects both stderr and stdout to the same file.
I recommend keeping at least a sleep 1 to avoid load peaks (e.g. running out of sockets).

I ended up using this version because it creates fewer log files, and I took the advice to build the log file name from data already in my script.

It would be nice to have some output to the terminal to follow progress, but it is easy enough to check the log files. The text editor that I use will reload the log file if there is a change.

This is what it looks like now,

# launch one batch-train script in background for each value in fold group list
for FOLD_GROUP in "${FOLD_GROUP_LIST[@]}"
do
   # continue training on $FOLD_GROUP folds
   ./01_batch_script.sh $CORES \
                        $BATCH_PROJ \
                        $BATCH_STOP_ON_SUBSET \
                        $BATCH_STOP_ON_STAT \
                        $SET \
                        $FOLD_GROUP \
                        $RND_SEED \
                        $DATASET_STRING \
                        $ERRTOL \
                        $BATCHES \
                        $START_MODE \
                        $START_LR \
                        $MAX_EPOCH_BATCH \
                        $OA_PRINT_PRECISION \
                        $BATCH_PROC > "logfile_${SET}_fg${FOLD_GROUP}_${START_MODE}-${START_LR}.txt" 2>&1 &
   # stagger launches to avoid load peaks
   sleep 2
done

# wait for all subshells to return before resuming
wait
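For following progress from a second terminal while the jobs run, a small snapshot helper is one option. The sketch below assumes log files matching logfile_*.txt in a single directory; the function name show_progress and the sample log contents are hypothetical, created here only so the example runs on its own.

```shell
#!/usr/bin/env bash
# Create two sample logs so the sketch is self-contained (illustrative data).
logdir=$(mktemp -d)
printf 'epoch 1\nepoch 2\n' > "$logdir/logfile_demo_fg1.txt"
printf 'epoch 1\n' > "$logdir/logfile_demo_fg2.txt"

# Print the most recent line of every log file in the given directory.
show_progress() {
    local f
    for f in "$1"/logfile_*.txt
    do
        [ -e "$f" ] || continue          # skip if the glob matched nothing
        printf '%s: %s\n' "${f##*/}" "$(tail -n 1 "$f")"
    done
}

snapshot=$(show_progress "$logdir")
printf '%s\n' "$snapshot"
```

For a live view rather than a snapshot, GNU tail accepts several files with -f (for example tail -f logfile_*.txt) and prints a header naming the file each time the source of new lines changes; other tail implementations may only follow one file.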

LMHmedchem