Multi threading - running multiple processes at the same time

So I've been using this a lot in my scripts:

( columnA & columnAPID=$! & columnB & columnBPID=$! & columnC & columnCPID=$! &) &
wait ${columnAPID}
wait ${columnBPID}
wait ${columnCPID}

It seems to work, as I've seen it dramatically reduce the run time of my scripts.

However, I'm wondering if there's another way to do the above? A better way, maybe? One that is, of course, portable and will be faster?

I'm using /bin/sh.

You're using an unneeded subshell and running several background jobs that do not need to be in the background. Your script could more simply (and slightly more efficiently) be written as:

columnA & columnAPID=$!
columnB & columnBPID=$!
columnC & columnCPID=$!
wait ${columnAPID}
wait ${columnBPID}
wait ${columnCPID}

or if you don't have any other background jobs and aren't going to check the exit status of each individual wait command to verify that all three jobs completed successfully:

columnA &
columnB &
columnC &
wait
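
If you do want to verify each job individually, one way (a minimal POSIX sh sketch, not from the original post; "failed" is just an illustrative flag) is to capture each wait's exit status:

columnA & columnAPID=$!
columnB & columnBPID=$!
columnC & columnCPID=$!

failed=0
wait "$columnAPID" || failed=1     # wait returns the exit status of the given job
wait "$columnBPID" || failed=1
wait "$columnCPID" || failed=1

if [ "$failed" -ne 0 ] ; then
     echo "at least one of columnA, columnB, columnC failed" >&2
fi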

As the title of this thread suggests, you are confusing multithreading and multitasking.

The difference between a thread and a task is that a task is a process in its own right: complete with its own process environment, its own stack, its own text and data segments, and so on. A thread has none of that. It is just a separate succession of machine instructions sharing the process environment with all the other threads that make up that process.

Multitasking is having several processes run at once. Multithreading means that one process has several threads (usually utilizing a CPU specifically designed to support that).

What you are doing here is multitasking. You start several processes (even the same executable started several times creates separate processes), and whether these processes are multithreaded depends on the executable.
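
As a small illustration (not from the original post), each "&" starts a separate process, which you can see from the distinct PIDs it prints; sh -c is used here only so each background job reports its own PID:

sh -c 'echo "job 1 runs as PID $$"' &
sh -c 'echo "job 2 runs as PID $$"' &
sh -c 'echo "job 3 runs as PID $$"' &
wait

All three run at the same time (multitasking), regardless of whether any of them uses threads internally.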

I hope this helps.

bakunin

Thank you for this. Any additional ideas on how to make what I have in my original post more efficient?

I read somewhere that I can "pipe" the functions, but I haven't seen an example of how to do this.

Let me see if I understand this. You have three scripts that do something or nothing and they do whatever they do faster if you run them in parallel than if you run them serially. And, you want us to tell you how to convert those three scripts into functions that can be piped together to do something or nothing faster. If these three scripts run faster in parallel instead of serially, why would connecting them with pipes make them run faster? Connecting three scripts with pipes implies that the output of one of the scripts is input to one of the other scripts and the output of that script is input to the third script; but if you are running them in parallel, that can't be true.

Don't you think we might have a better chance of helping you figure out if there is a way to optimize what your three scripts are doing if we knew something about what those three scripts are doing?
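
For reference, "piping" the three commands from the original post would look like the single line below (purely illustrative); as explained above, this only makes sense if columnB actually reads columnA's output and columnC reads columnB's:

columnA | columnB | columnC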

I do not know what you mean by "pipe functions".

What I can suggest, though, is to make the parallelisation scheme more flexible, but whether this is possible at all depends on the exact problem you are working on and how much parallelisation it allows for.

Suppose you have some "stream" of values you want to work on coming from somewhere, like in the following loop, where "VAL" is fed consecutive values:

while read VAL ; do
     some_command "$VAL"
done

To work on several values of VAL at the same time, we first need to define a fanout value: it only makes sense to have so many parallel instances of some_command running at the same time. How many exactly depends on your system, the nature of some_command, the nature of your input data (the values of VAL), and perhaps a few other things, but there will be *some* value which is optimal. Let us call this value FANOUT. Here is what we are going to do, first in pseudocode:

while less processes than FANOUT
     if input left
          start_new_process with next-input
     else
          exit loop
     fi
end while
wait <for all sub-processes ending>

Here is a sketch of a solution in shell (ksh, actually):

typeset -i iFanOut=15               # replace this with a sensible value
typeset chVal=""

read chVal
while : ; do
     if [ $(jobs -p | wc -l) -lt $iFanOut ] ; then
          # below the fanout limit: start the next job
          some_command "$chVal" &
          if ! read chVal ; then
               break                # input exhausted
          fi
     else
          sleep 5                   # at the limit: wait, then re-check (replace with a sensible value)
     fi
done
wait                                # wait for all remaining background jobs to finish

It should be feasible to pack this into a generic function and call it with the command to execute as a parameter. Something like this (not tested):

function multitask
{
typeset -i iFanOut=15               # replace this with a sensible value
typeset chCmd="$1"
typeset chVal=""

if ! read chVal ; then              # no input at all: nothing to do
     return
fi
while : ; do
     if [ $(jobs -p | wc -l) -lt $iFanOut ] ; then
          # below the fanout limit: start the next job
          $chCmd "$chVal" &
          if ! read chVal ; then
               break                # input exhausted
          fi
     else
          sleep 5                   # at the limit: wait, then re-check (replace with a sensible value)
     fi
done
wait                                # wait for the remaining background jobs to finish

return 0
}


# main ()
produce_list | multitask "some_command"
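
Since the original poster is using /bin/sh, here is a rough, untested batch-style variant of the same fanout idea in plain POSIX sh, avoiding the ksh-specific typeset and function syntax. It is simpler but coarser than the jobs-based sketch above, because it waits for a whole batch to finish before starting the next one (FANOUT, some_command and the input source are placeholders):

FANOUT=4                            # replace with a sensible value
count=0
while read VAL ; do
     some_command "$VAL" &
     count=$((count + 1))
     if [ "$count" -ge "$FANOUT" ] ; then
          wait                      # let the current batch finish before starting the next one
          count=0
     fi
done
wait                                # wait for the final, possibly partial, batch

Like the function above, it reads its values from standard input, so it can be fed from produce_list in the same way.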

Other than that, I share Don Cragun's astute observations: if you show us "something like" your problem, you are likely to get "something like" an answer, most probably producing only "something like" the desired result instead of the real thing.

I hope this helps.

bakunin
