As stated in the title, I do some hacked-together parallel processing by running multiple instances of a bash script, each in its own subshell. The code looks like this:
# launch one batch-train script in background for each value in fold group list
for FOLD_GROUP in "${FOLD_GROUP_LIST[@]}"
do
    # continue training on $FOLD_GROUP folds
    ./01_batch_script.sh $CORES \
        $BATCH_PROJ \
        $BATCH_STOP_ON_SUBSET \
        $BATCH_STOP_ON_STAT \
        $SET \
        $FOLD_GROUP \
        $RND_SEED \
        $DATASET_STRING \
        $ERRTOL \
        $BATCHES \
        $START_MODE \
        $START_LR \
        $MAX_EPOCH_BATCH \
        $OA_PRINT_PRECISION \
        $BATCH_PROC &
    # to prevent terminal overrun
    sleep 4
done
# wait for all subshells to return before resuming
wait
Despite the use of sleep to try to space things out, the scripts write a lot of output while I am working on them, and there is no way to keep that output from getting muddled when it all goes to one terminal.
I can generally work on just one instance if I need to, but it would also be nice to have each script report to its own log file so I can see which instance each error is associated with.
Is there a reasonable way to do this that someone can suggest?
01_batch_script.out.$bkjobnum is the name of the file where stdout is logged and 01_batch_script.err.$bkjobnum is the name of the file where stderr is logged for each background job. $bkjobnum is incremented for each background job, so every job gets its own pair of files.
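A minimal sketch of that naming scheme follows. It is an illustration under stated assumptions, not the exact command from this answer: the worker function is a hypothetical stand-in for ./01_batch_script.sh, and the fold group values 1 2 3 are made up. Only the two file name patterns and the $bkjobnum counter come from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for ./01_batch_script.sh: one line to stdout, one to stderr.
worker() {
    echo "fold group $1: training output"
    echo "fold group $1: error output" >&2
}

bkjobnum=0

# Example values; the real script loops over "${FOLD_GROUP_LIST[@]}".
for FOLD_GROUP in 1 2 3
do
    bkjobnum=$((bkjobnum + 1))
    # stdout and stderr of this background job go to its own pair of files
    worker "$FOLD_GROUP" \
        > "01_batch_script.out.$bkjobnum" \
        2> "01_batch_script.err.$bkjobnum" &
done

# wait for all background jobs before reading the logs
wait
```

Replacing the second redirection with 2>&1 sends both streams to the single .out file, which halves the number of log files at the cost of mixing errors with normal output.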
If you instead pipe every job through tee -a logfile.txt, the output both in logfile.txt and on the screen will be muddled as before, because all jobs still share one file and one terminal.
I ended up using the version that sends stdout and stderr to a single log file per job, because it creates fewer log files, and I took the advice to build the log file name from some additional data in my script.
It would be nice to have some output on the terminal to follow progress, but it is easy enough to check the log files. The text editor that I use reloads a log file whenever it changes.
This is what it looks like now:
# launch one batch-train script in background for each value in fold group list
for FOLD_GROUP in "${FOLD_GROUP_LIST[@]}"
do
    # continue training on $FOLD_GROUP folds
    ./01_batch_script.sh $CORES \
        $BATCH_PROJ \
        $BATCH_STOP_ON_SUBSET \
        $BATCH_STOP_ON_STAT \
        $SET \
        $FOLD_GROUP \
        $RND_SEED \
        $DATASET_STRING \
        $ERRTOL \
        $BATCHES \
        $START_MODE \
        $START_LR \
        $MAX_EPOCH_BATCH \
        $OA_PRINT_PRECISION \
        $BATCH_PROC > "logfile_${SET}_fg${FOLD_GROUP}_${START_MODE}-${START_LR}.txt" 2>&1 &
    # to prevent terminal overrun
    sleep 2
done
# wait for all subshells to return before resuming
wait
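For the wish to still see some progress on the terminal, one possible variation (a sketch I have not run against the real training script) is to give each job its own tee file and tag every line that reaches the terminal, so that interleaved lines stay attributable to a job. The worker function, the fold group values and the progress_fg file names below are hypothetical and used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for ./01_batch_script.sh.
worker() {
    echo "fold group $1: epoch 1"
    echo "fold group $1: epoch 2" >&2
}

# Example values; the real script loops over "${FOLD_GROUP_LIST[@]}".
for FOLD_GROUP in 1 2 3
do
    # 2>&1 merges stderr into stdout, tee writes the per-job log file,
    # and sed tags each line that continues on to the terminal
    worker "$FOLD_GROUP" 2>&1 \
        | tee "progress_fg$FOLD_GROUP.txt" \
        | sed "s/^/[fg$FOLD_GROUP] /" &
done

# wait for all background pipelines to finish
wait
```

Lines from different jobs can still alternate on the screen, but each one carries its fold group tag, and the log files stay clean because each tee writes to a separate file. An alternative that leaves the loop untouched is to run tail -f on the log files from a second terminal.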