What do you mean by "hangs"? Do you mean the timeout that occurs when a server is unreachable or down?
Are you referring to a shell script loop in which ssh commands are used to perform remote tasks? If so, you could try something like:
ssh -o ConnectTimeout=2
From man ssh_config:
ConnectTimeout
Specifies the timeout (in seconds) used when connecting to the SSH server, instead of using the default system TCP timeout. This value is used only when the target is down or really unreachable, not when it refuses the connection.
Note that, as the man page says, this timeout does not apply to hosts that refuse the connection.
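As a rough sketch of how that option could be used per host: the helper name run_remote is made up for illustration, and BatchMode=yes and -n are additions of mine (both are standard ssh options) so that ssh neither stops at a password prompt nor swallows the loop's input.

```bash
#!/bin/bash
# run_remote is a hypothetical helper name, not part of ssh.
run_remote() {
    local host=$1; shift
    # -n                  : do not read stdin (keeps a surrounding "while read" loop intact)
    # ConnectTimeout=2    : give up after 2 seconds if the host is down or unreachable
    # BatchMode=yes       : fail instead of waiting at a password prompt
    ssh -n -o ConnectTimeout=2 -o BatchMode=yes "$host" "$@"
}

# Possible use in a loop (ip-list.txt is assumed to hold one host per line):
#   while read -r ip; do
#       run_remote "$ip" uptime || echo "skipping $ip" >&2
#   done < ip-list.txt
```

A host that refuses the connection fails immediately anyway, so the loop moves on in both cases.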
Without knowing any actual details of what your script does or contains, it's hard to give you a definitively correct answer here. But in terms of a general principle, you could process each IP in a loop, run your script for each IP in the background, then wait a number of seconds before proceeding to the next one in the loop. That way at least you would be able to continue with each IP in the list.
So for example, something like this:
for ip in `cat ip-list.txt`
do
./script.sh "$ip" &
sleep 300
done
Now I'm making a great many assumptions here, since you haven't given us any actual code of your own or any details about what precisely you're trying to do to each IP. But the above code fragment would iterate through every IP address in the file ip-list.txt and run the external script ./script.sh on it in the background. It would then pause for five minutes (300 seconds) and proceed with the next one in the list, regardless of the outcome.
There are many potential problems with this approach, but this is about as generic a solution as I can suggest without anything detailed to actually go on. Hope this helps.
I am actually looking at ways to restart a process if it hangs.
The line I highlighted in red sometimes works, and the script then continues to the next line.
If it doesn't work, I would like a way to restart that line first before proceeding to the next line.
Hope you can advise.
while read ip; do
echo -e "${BLUE}[+]${RESET} Scanning $ip for $proto ports..."
# unicornscan identifies all open TCP ports
if [[ $proto == "tcp" || $proto == "all" ]]; then
echo -e "${BLUE}[+]${RESET} Obtaining all open TCP ports using unicornscan..."
echo -e "${BLUE}[+]${RESET} unicornscan -i ${iface} -r20000 -mT ${ip}:a -l ${log_dir}/udir/${ip}-tcp.txt"
unicornscan -i ${iface} -mT ${ip}:a -r20000 -l ${log_dir}/udir/${ip}-tcp.txt
ports=$(cat "${log_dir}/udir/${ip}-tcp.txt" | grep open | cut -d"[" -f2 | cut -d"]" -f1 | sed 's/ //g' | tr '\n' ',')
if [[ ! -z $ports ]]; then
# nmap follows up
echo -e "${GREEN}[*]${RESET} TCP ports for nmap to scan: $ports"
echo -e "${BLUE}[+]${RESET} nmap -e ${iface} ${nmap_opt} -oA ${log_dir}/ndir/${ip}-tcp -p ${ports} ${ip}"
nmap -e ${iface} ${nmap_opt} -oA ${log_dir}/ndir/${ip}-tcp -p ${ports} ${ip}
else
echo -e "${RED}[!]${RESET} No TCP ports found"
fi
fi
Rather than waiting for something to finish before starting the next task, I find that pdsh performing remote tasks in parallel is most useful for our situation. There is a timeout option (along with many other options).
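A minimal sketch of such a call, assuming the -t (connect timeout) and -u (command timeout) options described in man pdsh; the wrapper name run_everywhere and the chosen values are only illustrative.

```bash
#!/bin/bash
# run_everywhere is a hypothetical wrapper name; the timeouts are example values.
run_everywhere() {
    local hosts=$1; shift
    # -w hosts : comma-separated list of target hosts
    # -t 5     : connect timeout in seconds
    # -u 60    : limit on how long the remote command may run, in seconds
    pdsh -w "$hosts" -t 5 -u 60 "$@"
}

# Possible use:
#   run_everywhere host1,host2 uptime
```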
Some details for pdsh (which calls pdsh.bin ):
pdsh issue commands to groups of hosts in parallel (man)
Path : /usr/bin/pdsh
Version : -2.31 (+debug)
Length : 15 lines
Type : Bourne-Again shell script, ASCII text executable
Shebang : #! /bin/bash
Repo : Debian 8.9 (jessie)
Home : https://computing.llnl.gov/linux/pdsh.html (pm)
See man pdsh , and note that there are several alternate codes that may be considered as a result of searching for alternatives to pdsh -- for example see:
The problem is: we do not really know what "doesn't work" means. If it is that the quoted line just hangs and doesn't finish: start it in the background and have a wait command at the end collecting all the hanging processes. There are a lot of threads here dealing with exactly this problem.
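The background-and-wait idea could look roughly like this. The function name scan_all and its calling convention are assumptions for illustration; it starts one job per host and only collects the results once all of them have been started.

```bash
#!/bin/bash
# scan_all is a hypothetical helper: scan_all COMMAND HOST...
# It runs "COMMAND HOST" in the background for every host, then waits for all of them.
scan_all() {
    local cmd=$1; shift
    local host i failed=0
    local -a pids=() hosts=()
    for host in "$@"; do
        "$cmd" "$host" &          # start the job without blocking the loop
        pids+=("$!")
        hosts+=("$host")
    done
    for i in "${!pids[@]}"; do
        # wait returns the exit status of that particular job
        if ! wait "${pids[i]}"; then
            echo "failed: ${hosts[i]}" >&2
            failed=$(( failed + 1 ))
        fi
    done
    return "$(( failed > 0 ))"    # 1 if any job failed, 0 otherwise
}
```

Note that wait still blocks on a job that never ends, so this only keeps the loop itself from stalling.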
If you mean by "doesn't work" that the process just comes back unsuccessfully: usually a process has a return code. You can query this return code and re-run the process if it is not zero (0 usually means it was successful and everything else some sort of failure).
Replace the quoted line with something like this:
MAXRETRIES=<some number> # define this at the beginning globally
...
(( iCnt = MAXRETRIES ))
while ! unicornscan -i ${iface} -mT ${ip}:a -r20000 -l ${log_dir}/udir/${ip}-tcp.txt && [ $iCnt -gt 0 ] ; do
(( iCnt -= 1 ))
done
This runs the command once and then retries it up to MAXRETRIES more times, until it either succeeds or the retries run out.
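The loop above only helps when the command returns at all. A possible variant that also covers a command that never returns is sketched here; it assumes the timeout utility from GNU coreutils is installed, and the names run_with_retry, MAX_TRIES and TIME_LIMIT are made up for this example.

```bash
#!/bin/bash
# Hypothetical names; adjust the defaults to taste.
MAX_TRIES=${MAX_TRIES:-3}        # total attempts before giving up
TIME_LIMIT=${TIME_LIMIT:-300}    # seconds allowed per attempt

run_with_retry() {
    local attempt rc
    for (( attempt = 1; attempt <= MAX_TRIES; attempt++ )); do
        # timeout sends SIGTERM after TIME_LIMIT seconds and, because of -k 5,
        # SIGKILL five seconds later if the command is still alive.
        timeout -k 5 "$TIME_LIMIT" "$@"
        rc=$?
        if (( rc == 0 )); then
            return 0
        elif (( rc == 124 )); then
            echo "attempt $attempt timed out after ${TIME_LIMIT}s" >&2
        else
            echo "attempt $attempt failed with exit code $rc" >&2
        fi
    done
    return 1    # all attempts used up; the caller can move on
}

# Possible use inside the scan loop:
#   run_with_retry unicornscan -i "$iface" -mT "${ip}:a" -r20000 \
#       -l "${log_dir}/udir/${ip}-tcp.txt" || echo "giving up on $ip" >&2
```

Each attempt is bounded by the time limit, a failed or timed-out attempt is retried, and after the last attempt the script continues with the next IP.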
I was actually thinking of 3 conditions:
1. Move on to the next line if it doesn't work or hangs.
2. Restart until it works, then move to the next line.
3. Wait for a number of seconds; if it doesn't move to the next line, restart the current line (in case there is no exit code for whatever reason).
Unfortunately, in my case it still doesn't work after applying the code above. It still hangs:
Send exiting main didnt connect, exiting: system error Interrupted system call
Recv exiting main didnt connect, exiting: system error Interrupted system call
That doesn't sound like "hanging" (no more activity or reaction) but more like exiting with an error. It would be very surprising if no error message or exit code were given to indicate what error occurred.
Definitely more info is necessary here.
Same for your three conditions. What in bakunin's proposal doesn't solve your problem? Please be way more informative!
This is one way to trap a command that waits too long:
Set up a child process that signals the parent after a preset time.
Run the command.
Clean up the child.
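#!/bin/bash
# Sleep for a while, then signal the main script.
# SIGUSR1 has no predefined meaning to the system, so you can use it locally.
# Send it by name: its number differs between systems (10 on most Linux systems, 30 on BSD and macOS).
naptime() {
    sleep 10        # take a nap
    kill -USR1 $$   # wake the main script; $$ is still the script's PID inside a background subshell
}
run_ssh()
{
    timed_out=0
    trap 'timed_out=1' USR1
    naptime &
    naptime_pid=$!
    # bash runs a trap only after the current foreground command has ended, so ssh
    # is started in the background; wait returns as soon as the signal arrives.
    ssh myuser@somewhere.com 'ls myfile.txt' & # you better be sure this command will complete on success in less than 10 seconds
    ssh_pid=$!
    wait $ssh_pid
    ssh_status=$?
    if [ $timed_out -eq 1 ]; then
        echo "ssh took too long"
        kill $ssh_pid 2>/dev/null
        return 1 # return an error
    fi
    kill $naptime_pid 2>/dev/null # clean up the child
    return $ssh_status # 0 means no error
}
# -------- main
run_ssh # will run for 10 seconds max
ssh_rc=$?
[ $ssh_rc -eq 0 ] && echo "things went fine" || echo "oops, ssh failed or timed out"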
Also, I would suggest starting with ping and calling ssh only if the remote box turns out to be reachable. ping has a default timeout setting.
Example:
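A minimal sketch of that ping-then-ssh check follows. The function names host_is_up and ssh_if_up are hypothetical, and the -W 2 option assumes the Linux (iputils) ping, where it means "wait at most 2 seconds for a reply"; other ping implementations interpret it differently.

```bash
#!/bin/bash
# host_is_up and ssh_if_up are hypothetical helper names.
host_is_up() {
    # -c 1 : send a single echo request
    # -W 2 : wait at most 2 seconds for the reply (Linux iputils semantics)
    ping -c 1 -W 2 "$1" > /dev/null 2>&1
}

ssh_if_up() {
    local host=$1; shift
    if host_is_up "$host"; then
        ssh -o ConnectTimeout=2 "$host" "$@"
    else
        echo "$host does not answer ping, skipping" >&2
        return 1
    fi
}

# Possible use:
#   ssh_if_up somewhere.com 'ls myfile.txt'
```

Keep in mind that a host which blocks ICMP looks "down" to this check even if ssh would work.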
Actually, I am OK with bakunin's proposal.
It's just that it still doesn't work. The line just freezes there and does not proceed to the next iteration. I have been waiting for the code to finish executing.