Hello,
I have to admit, I've never really used the parallel command before, so I've been playing about with it this morning to see if I can replicate the issue. And I think I might know what's going on here.
So, to test, I used a command very similar to your own, specifically this:
/usr/bin/parallel --no-run-if-empty --will-cite --onall --keep-order --jobs 0 'ssh {} "date ; sleep 10"' ::: host1 host2 host3 host4
And initially, using that command, I do indeed see the commands starting one after the other, just as you observed:
$ time /usr/bin/parallel --no-run-if-empty --will-cite --onall --keep-order --jobs 0 'ssh {} "date ; sleep 10"' ::: host1 host2 host3 host4
Wed Aug 23 07:34:38 BST 2023
Wed Aug 23 07:34:48 BST 2023
Wed Aug 23 07:34:58 BST 2023
Wed Aug 23 07:35:08 BST 2023
real 0m41.452s
user 0m0.429s
sys 0m0.157s
$
Here, we can clearly see that each command started ten seconds after the previous one. In other words, parallel only executed the next command after the previous command had finished. Otherwise, we'd expect the output of date to be identical (give or take) on every line, since the date command would have run near-simultaneously on each host.
And just in case we were in any doubt, my use of time to show the total execution time clearly shows the entire command taking just over 40 seconds: exactly what we'd expect if the four 10-second jobs ran serially, and not the roughly 10 seconds we'd expect if they were truly being run in parallel.
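If you'd like to see the same serial-versus-parallel timing pattern without involving ssh or parallel at all, here's a minimal local sketch. The function names run_serial and run_backgrounded are ones I've made up for illustration, and I've used sleep 1 rather than sleep 10 to keep it quick. It only mimics the timing pattern, not what parallel does internally.

```bash
#!/bin/bash
# Illustrative only: four 1-second jobs, run one after another vs. all at once.

# Hypothetical helper: run N sleep jobs serially, print elapsed whole seconds.
run_serial() {
    local n=$1 start=$SECONDS i
    for ((i = 0; i < n; i++)); do
        sleep 1
    done
    echo $((SECONDS - start))
}

# Hypothetical helper: start N sleep jobs in the background, then wait for
# all of them, and print elapsed whole seconds.
run_backgrounded() {
    local n=$1 start=$SECONDS i
    for ((i = 0; i < n; i++)); do
        sleep 1 &
    done
    wait
    echo $((SECONDS - start))
}

echo "serial:       $(run_serial 4)s"
echo "backgrounded: $(run_backgrounded 4)s"
```

The serial figure should come out at around four seconds and the backgrounded one at around one second, which is the same 4:1 ratio as the 40-second versus 10-second difference we're looking for with the real hosts.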
So I played about with the options to parallel for a while, and I think I've found the culprit. Let's see what we get if we omit the --onall flag:
$ time /usr/bin/parallel --no-run-if-empty --will-cite --keep-order --jobs 0 'ssh {} "date ; sleep 10"' ::: host1 host2 host3 host4
Wed Aug 23 07:42:00 BST 2023
Wed Aug 23 07:42:00 BST 2023
Wed Aug 23 07:42:00 BST 2023
Wed Aug 23 07:42:00 BST 2023
real 0m10.434s
user 0m0.217s
sys 0m0.095s
$
And there we go: all four commands started within the same second, and the whole run finished in about 10 seconds. So if you could please try the same and let us know the outcome, hopefully that will do the trick for you.
As to why that flag causes serial rather than parallel execution, let's see what the man page for parallel has to say about it:
--onall (beta testing)
Run all the jobs on all computers given with --sshlogin. GNU parallel will log into --jobs number of computers in parallel and run one job at
a time on the computer. The order of the jobs will not be changed, but some computers may finish before others.
When using --group the output will be grouped by each server, so all the output from one server will be grouped together.
--joblog will contain an entry for each job on each server, so there will be several job sequence 1.
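So I think what's happening here is that --onall makes parallel run only one job at a time on each computer, as the man page excerpt above says. We never gave it any hosts via --sshlogin (our command does its own ssh, with the hosts passed in as ordinary arguments), so as far as parallel is concerned there is only one computer involved, the local one, and the four ssh jobs get run on it one after another. When we omit the flag, parallel treats the four ssh commands as ordinary local jobs, and --jobs 0 lets it start them all at once, so the connections happen in parallel like we'd expect.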
As an aside: is there a reason you don't want to use something simpler and solely shell-driven, like (for example) this:
#!/bin/bash
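for host in host1 host2 host3 host4
do
    # The trailing & backgrounds each ssh so the loop doesn't block on it
    /usr/bin/ssh "$host" "/bin/date ; /usr/bin/sleep 10" &
done
# Block until every backgrounded ssh has exited
wait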
This script executes the SSH sessions to the four hosts in parallel (since it backgrounds each ssh command when it runs it), but still waits for all child processes to exit before the script itself exits (that's what the wait command at the end is for).
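If you also need to know which hosts failed, a slightly longer sketch along the same lines could record each background PID and wait on them individually. The names run_on_all and REMOTE_RUNNER are my own inventions for illustration, not part of any existing tool, and I haven't run this against real hosts, so please treat it as a starting point rather than a finished script.

```bash
#!/bin/bash
# Sketch only: run_on_all and REMOTE_RUNNER are hypothetical names.
# REMOTE_RUNNER defaults to ssh but can be overridden (e.g. for testing).
REMOTE_RUNNER=${REMOTE_RUNNER:-/usr/bin/ssh}

# Usage: run_on_all "remote command" host1 [host2 ...]
# Starts the command on every host at once, then reports any failures.
# Returns 0 if every host succeeded, 1 otherwise.
run_on_all() {
    local remote_cmd=$1
    shift
    local -a hosts=("$@") pids=()
    local host i failed=0

    for host in "${hosts[@]}"; do
        "$REMOTE_RUNNER" "$host" "$remote_cmd" &
        pids+=("$!")    # remember the PID of this background job
    done

    for i in "${!pids[@]}"; do
        # "wait PID" returns the exit status of that particular job
        if ! wait "${pids[$i]}"; then
            echo "command failed on ${hosts[$i]}" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

You'd then call it with something like: run_on_all "/bin/date ; /usr/bin/sleep 10" host1 host2 host3 host4. The difference from the plain loop is only in the second half: a bare wait discards the individual exit codes, whereas waiting on each PID in turn lets the script say which host had a problem.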
Anyway, I hope at least some of this helps! If you could let us know how you get on, then we can take things from there.