Hi folks. I'm trying to get the following script working for rebooting a bunch of clients. Up to now I've been using PSSH, but when they all start up again at the same time I get a few mount problems. So, I'm trying to stagger the reboot command. I know reboot will depend on what's running at the time. According to everything I've found, the code below should work.
But this script exits after the first iteration. I'm guessing the rsh command loses its connection without getting a return, so it produces a "closed by remote host" error which isn't getting caught.
Could someone please help me out? This is starting to drive me nuts! I could do the same in python, but then I'm not learning anything.
Thanks.
#!/bin/bash
set +e
cat /nodes/nodes-128 | while read LINE; do
echo "Attempting to reset - $LINE"
rsh pi@$LINE sudo reboot now || true
sleep .5
done
That is a useless use of cat, don't do that.
I suspect rsh is trying to read from standard input and eating all the following lines. ssh does that too. You can work around that by using a different file descriptor.
while read -u5 LINE
do
echo "Attempting to reset - $LINE" >&2
rsh "pi@$LINE" sudo reboot now || true
done 5< /nodes/nodes-128
Ok, well, incorporating those couple of things (I'm still using cat. I like cat. I have one!) does make a difference: it now loops for two nodes, resets at least one of them, then closes my ssh session to the header machine.
Just before my session is closed the error tcserror: Input/output error is thrown.
I have since tried nohup and disown. These behave similarly: they run for a couple of loops, then just end.
I should add at this point I've checked and checked, there is no problem with my nodes file. So I continue to be confused.
The problem surely is that I want to explicitly ignore "Connection lost", not an error. Connection lost isn't an error, it's a loss of connection; it's not a return from the command.
So, now I'm very confused.
EDIT: pssh manages fine which is a python script, so it must be possible with bash surely?
Using cat requires a pipe, and this requires the loop to 1. run in a subshell and 2. read from stdin (descriptor 0, the default).
- A subshell is more overhead, and you cannot modify shell variables in the main shell.
- rsh (and ssh) read from stdin, which competes with a read from stdin. Workarounds are: rsh -n ... or rsh </dev/null ...
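Both points can be reproduced without any remote hosts. The following is only a sketch: greedy is a made-up stand-in for rsh/ssh (it simply drains its stdin, which is the behaviour that matters here), and the node names are placeholders.

```shell
#!/bin/bash
# greedy is a hypothetical stand-in for rsh/ssh: it reads whatever is on stdin.
greedy() { cat >/dev/null; }

nodes=$(mktemp)
printf '%s\n' node1 node2 node3 > "$nodes"   # placeholder host names

# Broken: greedy inherits the loop's stdin and swallows the remaining lines.
broken=$(while read -r host; do echo "$host"; greedy; done < "$nodes")

# Fixed: the command gets its own stdin, which is what rsh -n or </dev/null does.
fixed=$(while read -r host; do echo "$host"; greedy </dev/null; done < "$nodes")

# Subshell effect: with a pipe, the counter changes inside the loop are lost.
count=0
cat "$nodes" | while read -r host; do count=$((count + 1)); done
piped_count=$count

# With a redirect the loop runs in the main shell and the counter survives.
count=0
while read -r host; do count=$((count + 1)); done < "$nodes"
redirected_count=$count

rm -f "$nodes"
echo "broken loop saw: $(echo $broken)"
echo "fixed loop saw:  $(echo $fixed)"
echo "count after pipe: $piped_count, after redirect: $redirected_count"
```

The broken loop only ever sees node1, which matches the "exits after the first iteration" symptom from the first post.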
I think reboot does not take arguments like now; it is misleading at least.
Because the connection might be dropped before the command finishes, it is safer to run it in the background with a little delay.
rsh -n remotehost "(sleep 1; reboot) &"
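Putting the suggestions from this thread together, a staggered loop might look roughly like the sketch below. Several things here are my assumptions rather than anything confirmed above: the RSH and DELAY variables are hypothetical knobs added so the loop can be tried with a harmless command first, the pi@ user and sudo are carried over from the original script, and the output redirection on the remote side is an addition meant to stop the background job from holding the connection open.

```shell
#!/bin/bash
# RSH and DELAY are hypothetical knobs for this sketch, not from the thread.
RSH=${RSH:-rsh}
DELAY=${DELAY:-0.5}

stagger_reboot() {
    # $1 is the node list, one host per line (e.g. /nodes/nodes-128)
    while read -r host <&5; do        # same idea as read -u5, portable spelling
        [ -n "$host" ] || continue    # skip blank lines
        echo "Attempting to reset - $host" >&2
        # -n keeps the remote command away from our stdin; the remote shell
        # backgrounds the reboot so a dropped connection is not fatal here.
        "$RSH" -n "pi@$host" "(sleep 1; sudo reboot) >/dev/null 2>&1 &" || true
        sleep "$DELAY"                # stagger the reboots
    done 5< "$1"
}

# Usage (not run here): stagger_reboot /nodes/nodes-128
```

For a dry run, something like RSH=echo stagger_reboot /nodes/nodes-128 should just print the commands that would be sent.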