Dear all,
I have written a bash script that cds into a set of subfolders and performs an action (sbatch) in each one:
# Loop over all subfolders
for idx in 100 200 300 400 500 600 700 800 900 1000; do
    # Enter the relevant field folder
    if (( idx < 1000 )); then
        cd "$idx-mT" || exit 1
    else
        cd "1-T" || exit 1
    fi
    for Geometry in Rectangular-Sample Square-Sample; do
        cd "$Geometry" || exit 1
        for Temperature in 0-K 2-K; do
            cd "$Temperature" || exit 1
            for Dipolar in Dipolar-Hierarchical Dipolar-Tensorial; do
                cd "$Dipolar" || exit 1
                for DMI in Scaled-DMI Unscaled-DMI; do
                    cd "$DMI" || exit 1
                    for DMI_Value in D3-D1-1 D3-D1-1with2 D3-D1-1with4 D3-D1-1with6 D3-D1-1with8 D3-D1-2 D3-D1-2with2 D3-D1-2with4 D3-D1-2with6 D3-D1-2with8 D3-D1-3; do
                        cd "$DMI_Value" || exit 1
                        if [ -f CrSBr-Field-Cooling.slurm ]; then
                            # a plain [ ! -f slurm-* ] test breaks when the glob
                            # matches more than one file; compgen -G tests the
                            # glob safely (success iff at least one match)
                            if ! compgen -G "slurm-*" > /dev/null; then
                                sbatch CrSBr-Field-Cooling.slurm
                            fi
                        else
                            mv CrSBr* CrSBr-Field-Cooling.slurm
                            sbatch CrSBr-Field-Cooling.slurm
                        fi
                        cd ..
                    done
                    cd ..
                done
                cd ..
            done
            cd ..
        done
        cd ..
    done
    cd ..
done
In principle this should run sbatch only if no slurm-* file already exists in that subfolder (I am not yet certain the check behaves as intended).
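One detail worth checking: a test of the form `[ ! -f slurm-* ]` misbehaves as soon as the glob matches more than one file, because the glob expands to several words inside `[ ]`. A minimal sketch (using a hypothetical temporary directory, not the real folder tree) comparing it with `compgen -G`:

```shell
# Hypothetical temporary directory, only for demonstration
workdir=$(mktemp -d)
touch "$workdir/slurm-123.out" "$workdir/slurm-456.out"
cd "$workdir" || exit 1

# Bracket test: with two matching files, `[` receives too many
# arguments, fails with an error, and the branch is silently skipped
if [ ! -f slurm-* ] 2>/dev/null; then
    echo "bracket test: no slurm files found"
fi

# compgen -G succeeds iff the pattern matches at least one file
if compgen -G "slurm-*" > /dev/null; then
    echo "compgen: slurm files exist"
fi
```

Here only the `compgen` line prints, even though two slurm-* files are present, which is why the bracket test cannot be relied on to skip already-submitted subfolders.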
In any case, I am interested in the following. When I launch this script and an sbatch command goes through, a message like
Submitted batch job 8359814
appears, confirming that my job has been submitted to the cluster queue. However, the queue has a limited capacity, and when it is reached, the following messages appear in the command window:
sbatch: error: QOSMaxSubmitJobPerUserLimit
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
I would like to (i) stop the sbatch actions once this error appears for the first time, (ii) remember in which subfolder the submission failed, and (iii) resume the sbatch actions from that subfolder once the number of jobs in the queue has decreased.
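A sketch of one way points (i)-(iii) could be wired together, under two assumptions that are mine, not from the script above: sbatch returns a non-zero exit status when a submission is rejected (it does for QOS failures), and a plain text file, here hypothetically named DONE-FOLDERS.txt, is acceptable as a progress log:

```shell
#!/bin/bash
# Hypothetical wrapper around the cd/sbatch step of the innermost loop.
# SBATCH and DONE_FILE are overridable only to allow dry runs; the
# defaults are the real command and a progress log in the start folder.
SBATCH="${SBATCH:-sbatch}"
DONE_FILE="${DONE_FILE:-$PWD/DONE-FOLDERS.txt}"
touch "$DONE_FILE"

submit_in() {
    local folder="$1"
    # (iii) on a re-run, skip any subfolder already recorded as submitted
    if grep -qxF "$folder" "$DONE_FILE"; then
        return 0
    fi
    if ( cd "$folder" && "$SBATCH" CrSBr-Field-Cooling.slurm ); then
        # remember the success so the next run does not resubmit it
        echo "$folder" >> "$DONE_FILE"
    else
        # (i)+(ii) first failure (e.g. QOSMaxSubmitJobPerUserLimit):
        # report the failing subfolder and stop the whole script
        echo "sbatch failed in $folder; re-run later to resume" >&2
        exit 1
    fi
}
```

The innermost loop would then call `submit_in "$PWD/$DMI_Value"` instead of the cd/sbatch/cd sequence (the full path matters, since the same DMI_Value names repeat under different parent folders), and simply re-running the whole script once the queue has drained would implement the resume.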
Suggestions are welcome on how to achieve this!