Parallel bash scripts

I need some help replacing a bash script with GNU parallel to speed up a job on multiple files (400files.list contains the absolute paths to those 400 files). The bash script runs the same program on each file, one after another.
My bash_script.sh is:

# The body uses ${i}, so the loop variable must be i (paths must not contain spaces).
for i in $(cat 400files.list); do
  sample=$(basename "${i}")
  I_DIR=$(dirname "${i}")
  O_DIR=$(dirname "${i}")

  ${PROGRAM} \
    --readFilesIn "${I_DIR}/${sample}" \
    --outFileNamePrefix "${O_DIR}/${sample}.bam" \
    --runThreadN 4 \
    --genomeDir "${GENOME_DIR}" \
    > "${LOG_DIR}/${sample}.out" \
    2> "${LOG_DIR}/${sample}.err"

done

GNU parallel seems to fit this job. From the GNU parallel manual on gnu.org I read:

which is related to my previous post. I was thinking of something like:

parallel -j 24 my_bash_script_{}.sh ::: 400files.list

but I have two issues here:

  1. this would need ~400 .sh files, which is obviously not right;
  2. my script runs a single job as a multi-line command that uses other variables, such as I_DIR, O_DIR, GENOME_DIR and LOG_DIR.

I'm quite lost on how to replace the bash script with parallel to speed up the job.
Thanks in advance!
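One common way to address both issues above, offered as a sketch under assumptions rather than a tested recipe: put the multi-line command into a bash function, export the function and the variables it uses, and let GNU parallel call it once per line of the list ("::::" tells parallel to read its arguments from a file). The defaults for PROGRAM (echo as a stand-in that just prints its arguments), GENOME_DIR, LOG_DIR and the LIST variable are hypothetical placeholders so the sketch runs on its own. "export -f" relies on parallel running the jobs under bash.

```shell
#!/usr/bin/env bash
# Sketch: one GNU parallel job per line of a list file, using an exported
# bash function instead of ~400 separate .sh files.

# Hypothetical defaults so the sketch runs as-is; replace with real values.
: "${PROGRAM:=echo}"                 # stand-in that only prints its arguments
: "${GENOME_DIR:=/path/to/genome}"   # placeholder path
: "${LOG_DIR:=$(mktemp -d)}"         # temporary log directory by default
LIST=${LIST:-400files.list}

process_one() {
    local i=$1
    local sample I_DIR O_DIR
    sample=$(basename "$i")
    I_DIR=$(dirname "$i")
    O_DIR=$(dirname "$i")

    ${PROGRAM} \
        --readFilesIn "${I_DIR}/${sample}" \
        --outFileNamePrefix "${O_DIR}/${sample}.bam" \
        --runThreadN 4 \
        --genomeDir "${GENOME_DIR}" \
        > "${LOG_DIR}/${sample}.out" \
        2> "${LOG_DIR}/${sample}.err"
}

# Make the function and the variables it reads visible to the bash
# shells that GNU parallel starts for each job.
export -f process_one
export PROGRAM GENOME_DIR LOG_DIR

# One job per line of $LIST, at most 24 jobs at a time.
if [[ -r $LIST ]]; then
    parallel -j 24 process_one :::: "$LIST"
else
    echo "list file $LIST not found; nothing to run" >&2
fi
```

Note that -j 24 combined with --runThreadN 4 can keep up to 24 × 4 = 96 threads busy at once, so -j is worth sizing to the machine's core count. Adding --dry-run to the parallel line prints the commands it would run without running them.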

---------- Post updated at 06:32 PM ---------- Previous update was at 03:48 PM ----------

I just tried one approach myself, but I'm not sure whether it can be optimized further:

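#!/bin/bash
# PROGRAM, GENOME_DIR and LOG_DIR must be exported by the calling shell
# (or set in this script); otherwise they are empty here.
i=$1
sample=$(basename "${i}")
I_DIR=$(dirname "${i}")
O_DIR=$(dirname "${i}")

${PROGRAM} \
  --readFilesIn "${I_DIR}/${sample}" \
  --outFileNamePrefix "${O_DIR}/${sample}.bam" \
  --runThreadN 4 \
  --genomeDir "${GENOME_DIR}" \
  > "${LOG_DIR}/${sample}.out" \
  2> "${LOG_DIR}/${sample}.err"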

Then I made the script executable (chmod +x my_script.sh) and ran it as:

cat 400files.list | parallel -j 24 ./my_script.sh

Thanks for any suggestions on parallel and the bash script.

First, get rid of the cat:

parallel -j 24 ./my_script.sh < 400files.list

Then, if every line in 400files.list contains a "/" character, speed up:

sample=$(basename ${i});  
I_DIR=$(dirname ${i});  
O_DIR=$(dirname ${i});

considerably by replacing the command substitutions that invoke basename and dirname with parameter expansions, which avoid starting a new process for every call:

sample=${i##*/}
I_DIR=${i%/*}

and get rid of the duplicate computations:

O_DIR=$I_DIR
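A minimal sketch of why the "/" condition matters, using made-up paths: for a path that contains "/", the parameter expansions give the same results as basename and dirname, but for a name without "/" the expansion ${i%/*} leaves the name unchanged while dirname prints ".". The helper name split_path is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare basename/dirname (an external process per call) with
# bash parameter expansions (no extra process). Paths are hypothetical.

split_path() {
    # Sets the globals sample and I_DIR, like the lines in my_script.sh.
    local i=$1
    sample=${i##*/}   # remove the longest prefix matching "*/"
    I_DIR=${i%/*}     # remove the shortest suffix matching "/*"
}

for i in /data/run1/sample_A.fastq no_slash.fastq; do
    split_path "$i"
    echo "$i: sample=$sample I_DIR=$I_DIR | basename=$(basename "$i") dirname=$(dirname "$i")"
done
# /data/run1/sample_A.fastq: sample=sample_A.fastq I_DIR=/data/run1 | basename=sample_A.fastq dirname=/data/run1
# no_slash.fastq: sample=no_slash.fastq I_DIR=no_slash.fastq | basename=no_slash.fastq dirname=.
```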

I'm glad I didn't make too many mistakes here!
Thanks, Don, for:

  1. confirming the "<" redirection for parallel; I wasn't sure about it while reading the manual;
  2. the tip on replacing the command substitutions with parameter expansions, which helps a lot here;
  3. the O_DIR=$I_DIR suggestion, although in my real setup the two directories can differ, so I'm keeping them separate for now.

< works for anything that reads from stdin. The only "magic" exceptions are password prompts for terminal logins, sudo, ssh and the like, which deliberately reject input that isn't a terminal.
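A small sketch of this point, using a hypothetical reader function and a temporary file in place of 400files.list: the same stdin reader gets identical input from a pipe and from "<", and in both cases the shell test [ -t 0 ] reports that stdin is not a terminal. That kind of check is one way a program can refuse input that doesn't come from a terminal; some programs instead read passwords from the terminal device directly.

```shell
#!/usr/bin/env bash
# Sketch: one stdin reader fed by a pipe and by "<" redirection.

read_list() {
    # [ -t 0 ] is true only when file descriptor 0 (stdin) is a terminal.
    if [ -t 0 ]; then
        echo "stdin is a terminal"
    else
        echo "stdin is not a terminal"
    fi
    local n=0 line
    while IFS= read -r line; do
        n=$((n + 1))
    done
    echo "read $n lines"
}

list=$(mktemp)   # hypothetical stand-in for 400files.list
printf '%s\n' /data/a.fastq /data/b.fastq /data/c.fastq > "$list"

cat "$list" | read_list   # prints: stdin is not a terminal / read 3 lines
read_list < "$list"       # same output, without the extra cat process

rm -f "$list"
```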