Help with a script to fully occupy all available CPUs

I have a long list of files that I want to process with a program:

data_1.txt
data_2.txt
data_3.txt
data_4.txt
data_5.txt
data_6.txt
data_7.txt
data_8.txt
.
.
data_1_2.txt
data_2_2.txt
data_3_2.txt
data_4_2.txt
data_5_2.txt
data_6_2.txt
.

This is the Bash script I run to automate the process:

for f in *.txt
do
    ./program_name "$f" > "$f.out"
done

My server has 16 CPUs. I want to write a script that uses all of them instead of only one. After the first 16 files are finished, it should automatically move on to the next 16 files, and so on.
I tried editing my Bash script by adding "&":

for f in *.txt
do
    ./program_name "$f" > "$f.out" &
done

Unfortunately, the script above starts one process for every *.txt file at the same time. That is not advisable, because oversubscribing the CPUs ends up hanging my server.
Thanks for any advice on how to solve this.

Perhaps GNU Parallel would be of some use to you. If you are interested in it (i.e., it seems promising), there is a video (part one of two) about it which you can watch on YouTube.


Try:

for f1 f2 f3 f4 in *.txt; do
  ./prog $f1 > $f1.out &
  ...
  ./prog $f4 > $f4.out &
  wait
done

But what you really need is GNU Parallel.


Hi yazu,

When I tried your Bash script, it showed the following error message:

line 1: syntax error near unexpected token `f2'

This is the list of my available input files:

data_1.txt
data_2.txt
data_3.txt
data_4.txt

This is the script that I edited based on your advice:

for f1 f2 f3 f4 in *.txt; do
  ./prog $f1 > $f1.out &
  ./prog $f2 > $f2.out &
  ./prog $f3 > $f3.out &
  ./prog $f4 > $f4.out &
  wait
done

Thanks for your advice.

for f in *.txt; do
    ./prog "$f" > "$f.out" &
    wait    # waits for the job just started, so the files still run one at a time
done

--ahamed


Sorry. I use zsh, and it worked when I checked it there. I have now checked it in bash and ksh, and it doesn't work in either.
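
For bash and ksh, the same batching idea can be sketched with a counter instead of several loop variables. This is only a sketch under assumptions: the function name run_in_batches and its arguments are made up for illustration, and ./prog stands in for the real program as in the posts above. It starts up to N jobs in the background, waits for all of them, then starts the next N.

```shell
# Sketch: run a program on many files, at most max_jobs at a time.
# run_in_batches is a hypothetical helper, not part of any tool.
# The body runs in a subshell, so its variables stay local and
# "wait" only sees the jobs started here.
run_in_batches() (
    max_jobs=$1
    prog=$2
    shift 2
    count=0
    for f in "$@"; do
        "$prog" "$f" > "$f.out" &       # start one job in the background
        count=$((count + 1))
        if [ "$count" -ge "$max_jobs" ]; then
            wait                        # block until the whole batch has finished
            count=0
        fi
    done
    wait                                # wait for the last, possibly partial, batch
)

# Example call (assumes ./prog exists and the server has 16 CPUs):
# run_in_batches 16 ./prog *.txt
```

Each batch lasts as long as its slowest file, so some CPUs may sit idle towards the end of a batch. GNU Parallel avoids that by starting a new job as soon as one finishes.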


Hi yazu,

That's OK.
Thanks for your help anyway :)
Feel free to update me whenever you have a better idea for a script that uses all the CPUs in the server to run a process.
Thanks in advance!

Hi!
The best idea (that I can think of) is to get GNU Parallel, especially for 16-core monsters. :) I played with it a little: it's easy to use, and you can tell it how many jobs to run at once. But none of my boxes has more than 2 cores :(

find . -name '*.txt' | parallel -j4 ./prog

Used this way, it works like xargs.
And don't confuse GNU Parallel with the parallel tool from "moreutils".
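
Note that the pipeline above passes each file name to ./prog but does not create the per-file .out files from the original loop. Since xargs was mentioned, here is a rough sketch of one way to get both with xargs -P, which is an extension in GNU and BSD xargs (not required by POSIX). The names process_all, njobs and dir are hypothetical and chosen only for this example.

```shell
# Sketch: run prog on every .txt file under dir, at most njobs at a time,
# writing each result to <file>.out.
# process_all is a hypothetical helper; -P and -0 are GNU/BSD xargs extensions.
# With no matching files, GNU xargs still runs the command once; its -r flag
# suppresses that.
process_all() (
    njobs=$1
    prog=$2
    dir=$3
    find "$dir" -name '*.txt' -print0 |
        xargs -0 -P "$njobs" -n 1 sh -c '"$0" "$1" > "$1.out"' "$prog"
)

# Example call (assumes ./prog exists and 16 CPUs are available):
# process_all 16 ./prog .
```

With GNU Parallel itself, the equivalent should be something like parallel -j16 './prog {} > {}.out' ::: *.txt, where {} is replaced by each file name.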


Hi yazu,

How can I check whether GNU Parallel is already installed on the server?
Thanks for the guidance.

No, you definitely don't have it; it's too new. You will have to compile it yourself, but that doesn't require root privileges, and you can install it in your $HOME/bin. The link: GNU Parallel - GNU Project - Free Software Foundation

Actually, I used 0install myself (for Debian-based systems, Ubuntu in my case):

sudo apt-get install zeroinstall-injector
0alias parallel http://git.savannah.gnu.org/cgit/parallel.git/plain/packager/0install/parallel.xml
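
As for checking whether it is already there, a small sketch, assuming that GNU Parallel's --version output contains the text "GNU parallel" (as in "GNU parallel 20110822") and that the moreutils tool of the same name does not. The function name is_gnu_parallel is made up for this example.

```shell
# Sketch: succeed only if the "parallel" found on PATH identifies itself
# as GNU Parallel. is_gnu_parallel is a hypothetical helper name.
is_gnu_parallel() {
    # Is any command called "parallel" on PATH at all?
    command -v parallel >/dev/null 2>&1 || return 1
    # Assumption: only GNU Parallel prints "GNU parallel" for --version.
    parallel --version 2>/dev/null | grep -q 'GNU parallel'
}

if is_gnu_parallel; then
    echo "GNU Parallel found at: $(command -v parallel)"
else
    echo "GNU Parallel not found on PATH"
fi
```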

Thanks, yazu.
Do you have any idea about the following error message:

/bin/sh: ./sem: Permission denied

I'm not root. I'm installing it as a regular user.

OK. I didn't realize at first that this question was about parallel. I'll look into it.

---------- Post updated at 06:50 PM ---------- Previous update was at 06:42 PM ----------

Well, I downloaded the latest tarball and ran the following (not as root, of course):

./configure --prefix=$HOME/opt
make 
make install
PATH=$HOME/opt/bin:$PATH
parallel --version
  GNU parallel 20110822
seq 1 5 | parallel -j5 echo
1
2
4
3
5

I don't see any problems.


Thanks for your help, yazu.
Much appreciated.