Need some help replacing a bash script with parallel to speed up a job over multiple files (400files.list contains the absolute paths to those 400 files). The bash script runs the same program over the files one at a time.
My bash_script.sh is:
#!/bin/bash
# Note: the loop variable must match what's used inside the loop
# (the original looped over "sample" but referenced "${i}").
for i in $(cat 400files.list); do
    sample=$(basename "${i}")
    I_DIR=$(dirname "${i}")
    O_DIR=$(dirname "${i}")
    "${PROGRAM}" \
        --readFilesIn "${I_DIR}/${sample}" \
        --outFileNamePrefix "${O_DIR}/${sample}.bam" \
        --runThreadN 4 \
        --genomeDir "${GENOME_DIR}" \
        > "${LOG_DIR}/${sample}.out" \
        2> "${LOG_DIR}/${sample}.err"
done
parallel seems to fit this job. From the gnu.org parallel manual I read something related to my previous post, and I was thinking of something like
parallel -j 24 my_bash_script_{}.sh ::: 400files.list
but I have two issues here:
- there would be ~400 .sh files, which is obviously not right;
- my script is a single job spanning multiple lines, with other variables embedded such as I_DIR, O_DIR, GENOME_DIR and LOG_DIR.
I'm quite lost as to how to replace the bash script with parallel to speed up the job.
Thanks in advance!
---------- Post updated at 06:32 PM ---------- Previous update was at 03:48 PM ----------
Just tried one approach myself, but I'm not sure it's optimal:
#!/bin/bash
# my_script.sh -- processes one file per invocation.
# PROGRAM, GENOME_DIR and LOG_DIR must be exported in the calling shell,
# because parallel starts fresh shells that won't see unexported variables.
i="$1"
sample=$(basename "${i}")
I_DIR=$(dirname "${i}")
O_DIR=$(dirname "${i}")
"${PROGRAM}" \
    --readFilesIn "${I_DIR}/${sample}" \
    --outFileNamePrefix "${O_DIR}/${sample}.bam" \
    --runThreadN 4 \
    --genomeDir "${GENOME_DIR}" \
    > "${LOG_DIR}/${sample}.out" \
    2> "${LOG_DIR}/${sample}.err"
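Before pointing parallel at all 400 files, the wrapper can be smoke-tested on one file with a stub program. This is a sketch: echo stands in for the real aligner, and every path lives under a hypothetical /tmp/parallel_smoke directory so nothing real is touched.

```shell
# Smoke-test sketch: drive the wrapper once with PROGRAM=echo so the
# argument list lands in the .out log instead of running a real aligner.
WORK=/tmp/parallel_smoke
rm -rf "${WORK}" && mkdir -p "${WORK}/logs"
cat > "${WORK}/my_script.sh" <<'EOF'
#!/bin/bash
i="$1"
sample=$(basename "${i}")
I_DIR=$(dirname "${i}")
O_DIR=$(dirname "${i}")
"${PROGRAM}" \
    --readFilesIn "${I_DIR}/${sample}" \
    --outFileNamePrefix "${O_DIR}/${sample}.bam" \
    --runThreadN 4 \
    --genomeDir "${GENOME_DIR}" \
    > "${LOG_DIR}/${sample}.out" \
    2> "${LOG_DIR}/${sample}.err"
EOF
chmod +x "${WORK}/my_script.sh"
export PROGRAM=echo GENOME_DIR="${WORK}/genome" LOG_DIR="${WORK}/logs"
printf 'reads\n' > "${WORK}/sample1.fq"
"${WORK}/my_script.sh" "${WORK}/sample1.fq"
cat "${LOG_DIR}/sample1.fq.out"   # shows the argument list the stub received
```

If the .out file shows the expected flags and paths, the plumbing is right and the stub can be swapped for the real program.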
Then make the script executable (chmod +x my_script.sh) and run it as:
cat 400files.list | parallel -j 24 ./my_script.sh
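One thing to watch with this invocation: -j 24 together with --runThreadN 4 asks for 24 x 4 = 96 threads at once, which may oversubscribe the machine. A common rule of thumb is jobs = cores / threads-per-job; a sketch, assuming nproc (GNU coreutils) is available:

```shell
# Sketch: size -j so total threads roughly match the available cores.
THREADS_PER_JOB=4
JOBS=$(( $(nproc) / THREADS_PER_JOB ))
[ "${JOBS}" -lt 1 ] && JOBS=1
echo "cat 400files.list | parallel -j ${JOBS} ./my_script.sh"
```

On a 24-core box this would suggest -j 6 rather than -j 24.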
Thanks for any suggestion on parallel and bash script.