Hi
Say I am interested in processing a big data set in the shell, where each process individually takes a long time but many such processes can run concurrently. Is there a way to do this automatically or efficiently in shell?
For example, consider pinging a list of addresses up to 5 times each. This would be far more efficient if I could run many pings in parallel. Other than manually splitting the input across shells, is there an automatic way?
pseudocode
cnt=0
while read -r somedata
do
    cnt=$(( cnt + 1 ))
    somecommand "$somedata" >> mylogfile &
    if [[ $(( cnt % 10 )) -eq 0 ]]; then  # run 10 at the same time
        wait
    fi
done < file_with_somedata.dat
wait  # catch the processes that we did not wait for earlier
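Note that the batch approach above waits for the slowest job in each group of 10 before starting the next group. If you have GNU or BSD xargs, its -P option keeps the pool full instead, starting a replacement job as soon as any one finishes. A minimal sketch, with echo standing in for the real command (e.g. ping -c 5):

```shell
#!/bin/sh
# Run up to 4 workers concurrently; xargs launches a new job as soon
# as one exits, rather than waiting for a whole batch to drain.
# 'echo pinged' is a placeholder for the actual long-running command.
printf 'host1\nhost2\nhost3\nhost4\nhost5\n' |
    xargs -n 1 -P 4 echo pinged
```

With -P the output lines can arrive in any order, so sort or tag them if ordering matters.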
Hi Jim,
That seems to do the trick in most cases. With ping in place of somecommand it works perfectly, but when I put another script there it fails completely: all the outputs seem to come from the same input.
Any idea why this may happen?
Thanks a lot.
It would really depend what the script is, what it does, and how you're running it...
Hi
Managed to solve it. I was redirecting the output of all the jobs to the same tmp file and processing from that.
Thanks guys!
That's quite possibly a bad idea. Concurrent writes from multiple processes to the same file aren't guaranteed to land cleanly, so output from different jobs can interleave or clobber each other.
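One common way around that collision, sketched below on the assumption that each worker only needs its own scratch output, is to give every background job a unique file (mktemp guarantees unique names) and merge only after wait:

```shell
#!/bin/sh
# Each background job writes to its own temp file; results are merged
# only after every job has finished, so there are no concurrent writers.
# 'echo "processed: ..."' is a placeholder for the real per-item work.
tmpdir=$(mktemp -d)
i=0
while read -r somedata; do
    i=$((i + 1))
    echo "processed: $somedata" > "$tmpdir/out.$i" &
done <<EOF
alpha
beta
EOF
wait                               # all jobs done before we touch the files
cat "$tmpdir"/out.* > mylogfile    # safe: single writer at this point
rm -rf "$tmpdir"
```

Merging after wait also keeps mylogfile in a deterministic order (by job number), which a shared append file cannot promise.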