Adding a timeout when using sftp in a script?

Hello guys.

I need some help. First of all, sorry about my English; it is not my native language.

I have a bash script on Solaris with the following lines:

And sftp.sh has this:

The problem is that the sftp sometimes takes a very long time: 2 or 3 hours when it should take no more than 1 minute. I think there is a problem in the network, but I do not care about that at the moment.

So, I need to "cancel" that sftp and continue to the next line. It does not matter if the sftp has not finished.

I mean, can I add some kind of timeout to the sftp? E.g. 5 minutes: if it takes more than 5 minutes, continue to the next line.

Is it possible?

Thanks in advance.
Regards!

Hi,

I'm not aware of a way of timing out a single file transfer and moving to the next; however, I think I would first try to resolve the overall time.

So you could make the logging more verbose and see if there is an issue - perhaps the "cd" is failing?

Try sftp -v in the shell, or even sftp -v -v -v, which is the maximum level of reporting, and check the output for any errors.
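
For example, something along these lines would capture the debug output (which sftp writes to stderr) for later inspection - the log path and the cd/get targets are only placeholders, since the original sftp.sh wasn't shown:

sftp -v -v -v $USER@$HOST 2> /tmp/sftp_debug.log << EOF
cd /remote/dir
get somefile
bye
EOF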

In the event that the network has a problem, as you suggest, involve the network team if you have one.

Regards

Gull04

Hello. That script runs every day in the morning, and the problem happens only occasionally - maybe one time in the entire week.

I know we have to check where the delay comes from, but for the moment I need to set that aside and find a way to continue to the next line when the sftp is taking too long, and I can't find the solution.

Thanks anyway, you are very kind!

Hi,

A little bit of an update to add to this,

It would seem you can just use one of the ssh options; the change required should be straightforward, as follows:

sftp $USER@$HOST -o ConnectTimeout=nn -o BatchMode=yes << EOF

The ConnectTimeout value is in seconds; you should be able to investigate any error codes from there.
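
To investigate those error codes, you can capture sftp's exit status right after the closing EOF - a small sketch (the message text is just an illustration):

rc=$?                       # capture sftp's exit status immediately after the heredoc
if [ $rc -ne 0 ]; then
    echo "sftp exited with status $rc" >&2   # non-zero indicates a failure or timeout
fi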

Regards

Gull04

Thanks, but unfortunately it does not work.

And when I delete the "-o ConnectTimeout=20 -o BatchMode=yes" it works. It seems that option is only for the ssh command.

Thanks again.

sftp is ssh. But only a few implementations of ssh bother supporting that option.

Hi, sorry about that.

You can try using:

sftp -o ConnectTimeout=10 -o BatchMode=yes -o StrictHostKeyChecking=yes $USER@$HOST << EOF

This is working on one of my Solaris installations (SunOS cathsunvs04 5.10 Generic_150400-46 sun4v sparc sun4v); I haven't tested anywhere else.

Regards

Gull04

Yes, you are right. Thanks.

Thanks a lot, it works splendidly. But I do not know if this is the full solution.

I think that the "ConnectTimeout=10" property applies only to connecting. I mean, if I want to connect to another server with sftp and I can't connect after 10 seconds, it drops, and that is fine.

But in my case, I can always connect to the remote server. The problem is while getting the files: sometimes in the middle of the copy it stalls for a long time (more than 1 hour). That is when I want the script to cancel the sftp and continue to the next line. Do you understand what I mean?

Thanks again!

Here's something I played around with many, maaaany moons ago.
This might inspire some additional ideas.
The general idea is to "spawn" your potentially long-running process(es) (child1 and child2) in the background and check after so many seconds: if still active, kill it and continue (killing the "watcher" as well).

#!/bin/ksh
#set -x

typeset -i childRun='50'        # how long the simulated child runs (seconds)
typeset -i childMax='10'        # how long we are prepared to wait (seconds)

# stand-ins for the potentially long-running work (e.g. the sftp)
function child1 {
   sleep "${childRun}"
}
function child2 {
   sleep "${childRun}"
}

function main {
#  set -x
  print -u2 "starting child-> [$(date)]"
  child1&                       # run the work in the background
  typeset childPid=$!

  # watcher: sleep childMax seconds, then kill the child if it is still
  # running; the watcher's sleep PID is echoed through the pipe so the
  # parent can cancel the timer later
  (
      (
          sleep "${childMax}"&
          echo $!
          wait $! && {
              print -u2 "Killing hung child (pid=$childPid) ->[$(date)]"
              kill -9 $childPid
          }
      ) &
  ) | read timer_pid            # ksh runs read in the current shell

  wait                          # returns when child1 exits or is killed
  kill $timer_pid 2> /dev/null  # child finished: kill the watcher's sleep so it never fires
}

main
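
Untested, but adapting the same pattern directly to the sftp might look roughly like this - the 300-second limit, the remote paths and the use of $USER/$HOST are assumptions, not taken from the original script:

#!/bin/ksh
typeset -i maxSecs=300          # assumed ceiling, matching the OP's 5-minute example

# run the transfer in the background (cd/get targets are placeholders)
sftp $USER@$HOST << EOF &
cd /remote/dir
get somefile
bye
EOF
sftpPid=$!

# watchdog: kill the transfer if it is still running after maxSecs
( sleep "$maxSecs" && kill -9 "$sftpPid" 2> /dev/null ) &
watchPid=$!

wait "$sftpPid"                 # returns when sftp finishes or is killed
kill "$watchPid" 2> /dev/null   # transfer finished in time: stop the watchdog
                                # (an orphaned sleep may linger briefly; it is harmless)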

Hope it helps.

Indeed. This is a very good general-purpose kill-if-hung script, but in this case we "know" which file is being transferred, no? So instead of guessing whether the transfer is hung or just taking a long time, we could check whether the file size is still changing (= transfer still in progress) and base the timeout on that, no?
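
An untested sketch of that idea - the file name, poll interval and stall limit are made up for illustration, and ls/awk is used because stat may not be available on older Solaris:

#!/bin/ksh
typeset localFile="somefile"    # the file sftp is writing locally - placeholder name
typeset -i stallSecs=300        # give up after 5 minutes without growth
typeset -i pollSecs=30          # how often to check the size
typeset -i lastSize=-1 idleSecs=0

sftp $USER@$HOST << EOF &
get $localFile
bye
EOF
sftpPid=$!

while kill -0 "$sftpPid" 2> /dev/null; do    # loop while sftp is still alive
    size=$(ls -l "$localFile" 2> /dev/null | awk '{print $5}')
    if [ "${size:-0}" -gt "$lastSize" ]; then
        lastSize=${size:-0}; idleSecs=0      # still growing - reset the clock
    else
        idleSecs=$(( idleSecs + pollSecs ))
        [ "$idleSecs" -ge "$stallSecs" ] && kill -9 "$sftpPid"   # hung, not just slow
    fi
    sleep "$pollSecs"
done
wait "$sftpPid" 2> /dev/null    # reap the child, then the script continues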

I hope this helps.

bakunin

Correct, that would be preferable.
I just offered the "test harness" of an idea based on time, as the OP stated.

Hi Vgersh,

I think I understand, and what I said earlier is to do with the server; what you will need to do is reconfigure the configuration files for either ssh or sshd, which are normally in /etc/ssh.

I think what you are looking for is the "client alive" settings; from memory there are three that you may have to look at:

  • The TCPKeepAlive setting.
  • The ClientAliveCountMax setting.
  • The ClientAliveInterval setting.

From memory, you will have to set these and then restart the daemon for the changes to take effect. Although I'm still of the opinion that you should have your network people do some investigation.
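
For reference, these go in /etc/ssh/sshd_config on the server; a minimal example (the values are illustrative only):

TCPKeepAlive yes
ClientAliveInterval 60      # probe an unresponsive client every 60 seconds
ClientAliveCountMax 3       # disconnect after 3 unanswered probes

On Solaris 10 you can then restart the daemon with svcadm restart ssh.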

Regards

Gull04

Thanks a lot.
I will try that.