Solaris 9 - SSH 40 Second Delay

I'm having an issue with SSH on a server that hasn't had any configuration changes in a long time. When I SSH to the server, it hangs at "debug1: SSH2_MSG_KEXINIT sent" for exactly 40 seconds and then connects fine after that pause. Everything I have found points to DNS, but I use hosts files for name resolution rather than DNS, and like I said, it has worked for years and there haven't been any changes, so I'm kind of at a loss... If anyone has any ideas or something I could try, it would be very helpful... Thanks!

Sun_SSH_1.1, SSH protocols 1.5/2.0, OpenSSL 0x0090700f
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Rhosts Authentication disabled, originating port will not be trusted.
debug1: ssh_connect: needpriv 0
debug1: Connecting to xx.xx.xx.xx [xx.xx.xx.xx] port 22.
debug1: Connection established.
debug1: identity file /export/home/user/.ssh/identity type -1
debug1: identity file /export/home/user/.ssh/id_rsa type -1
debug1: identity file /export/home/user/.ssh/id_dsa type -1
debug1: Remote protocol version 2.0, remote software version Sun_SSH_1.1
debug1: no match: Sun_SSH_1.1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-Sun_SSH_1.1
debug1: Failed to acquire GSS-API credentials for any mechanisms (No credentials were supplied, or the credentials were unavailable or inaccessible
mech_dh: Invalid or unknown error
)
debug1: SSH2_MSG_KEXINIT sent ### Pauses here for exactly 40 seconds... ###
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-cbc hmac-md5 none
debug1: kex: client->server aes128-cbc hmac-md5 none
debug1: Peer sent proposed langtags, ctos: en-US,es,hi-IN,th-TH,en-CA,es-MX,fr,fr-CA,th,i-default
debug1: Peer sent proposed langtags, stoc: en-US,es,hi-IN,th-TH,en-CA,es-MX,fr,fr-CA,th,i-default
debug1: We proposed langtags, ctos: i-default
debug1: We proposed langtags, stoc: i-default
debug1: Negotiated lang: i-default
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
debug1: Remote: Negotiated main locale: C
debug1: Remote: Negotiated messages locale: C
debug1: dh_gen_key: priv key bits set: 127/256
debug1: bits set: 1624/3191
debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
debug1: Host 'xx.xx.xx.xx' is known and matches the RSA host key.
debug1: Found key in /export/home/user/.ssh/known_hosts:7
debug1: bits set: 1614/3191
debug1: ssh_rsa_verify: signature correct
debug1: newkeys: mode 1
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: newkeys: mode 0
debug1: SSH2_MSG_NEWKEYS received
debug1: done: ssh_kex2.
debug1: send SSH2_MSG_SERVICE_REQUEST
debug1: got SSH2_MSG_SERVICE_ACCEPT
debug1: Authentications that can continue: gssapi-keyex,gssapi-with-mic,publickey,password,keyboard-interactive
debug1: Next authentication method: gssapi-keyex
debug1: Next authentication method: gssapi-with-mic
debug1: Failed to acquire GSS-API credentials for any mechanisms (No credentials were supplied, or the credentials were unavailable or inaccessible
mech_dh: Invalid or unknown error
)
debug1: Next authentication method: publickey
debug1: Trying private key: /export/home/user/.ssh/identity
debug1: Trying private key: /export/home/user/.ssh/id_rsa
debug1: Trying private key: /export/home/user/.ssh/id_dsa
debug1: Next authentication method: keyboard-interactive
Password: 
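
To pin down exactly where the pause falls, one option is to timestamp each debug line as it arrives. This is only a sketch: `stamp_lines` is a made-up helper name, and it assumes a POSIX shell such as ksh or /usr/xpg4/bin/sh (the legacy /bin/sh on Solaris 9 does not support `$(...)`).

```shell
# Prefix every line read from stdin with the current wall-clock time.
# (stamp_lines is a hypothetical helper, not part of any SSH package.)
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%H:%M:%S')" "$line"
    done
}

# Usage: ssh writes its debug output to stderr, hence 2>&1.
# Running the remote command "true" makes the session exit right after login.
#   ssh -v xx.xx.xx.xx true 2>&1 | stamp_lines
```

A jump of roughly 40 seconds between two consecutive timestamps would confirm which step is waiting.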

I am guessing, but please show the contents of

/etc/nsswitch.conf

Yep, sounds all very familiar.

Yes, it's something that has changed on your network I reckon, either DNS or some routing.

Do you have a network support team at the site? Have you asked them what they've changed? You need to know everything.

Resist greatly any temptation to go in and mess with your Solaris9, you'll probably regret it. If temptation does overcome you, make sure that you have complete backups of everything as it is now so that you can recover easily to the current state.

Post a description of your network. Size? Devices? Routers? DNS servers?


I had a similar thought, but felt that nsswitch.conf could give us a clue about where to start looking externally.

@jim mcnamara....

Yes mate, we're both thinking the same thing. Put it down to experience. I've been caught with all this before. Main problem usually is that people wade in and change stuff, reinstall stuff, and end up with 99 problems when it all started with 1 which wasn't even on the system. I just thought I'd issue the health warning to the OP.

Are you running /usr/bin/ssh on Solaris 9?
Is it a physical box or a zone on a solaris 10 box?
What OS runs the other system (the ssh target)?

@MadeInGermany.....

The OP (code section, first line) says he's running Sun_SSH_1.1, built against OpenSSL.

Hey guys thanks for the dialog... here is my nsswitch.conf configuration that I have used for years.

Aug 11 2006 /etc/nsswitch.conf

"/etc/nsswitch.conf" [Read only] 42 lines, 1293 characters 
#
# /etc/nsswitch.dns:
#
# An example file that could be copied over to /etc/nsswitch.conf; it uses
# DNS for hosts lookups, otherwise it does not use any other naming service.
#
# "hosts:" and "services:" in this file are used only if the
# /etc/netconfig file has a "-" for nametoaddr_libs of "inet" transports.

passwd:     files
group:      files

# You must also set up the /etc/resolv.conf file for DNS name
# server lookup.  See resolv.conf(4).
hosts:      files
ipnodes:    files
# Uncomment the following line and comment out the above to resolve
# both IPv4 and IPv6 addresses from the ipnodes databases. Note that
# IPv4 addresses are searched in all of the ipnodes databases before
# searching the hosts databases. Before turning this option on, consult
# the Network Administration Guide for more details on using IPv6.
#ipnodes:   files dns

networks:   files
protocols:  files
rpc:        files
ethers:     files
netmasks:   files
bootparams: files
publickey:  files
# At present there isn't a 'files' backend for netgroup;  the system will
#   figure it out pretty quickly, and won't use netgroups at all.
netgroup:   files
automount:  files
aliases:    files
services:   files
sendmailvars:   files
printers:       user files

auth_attr:  files
prof_attr:  files
project:    files
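
For anyone wanting to check a file like this quickly, here is a rough sketch that prints the lookup sources for one database and reports whether `dns` is among them. The function names `nss_sources` and `uses_dns` are made up for illustration, and the sketch assumes a POSIX shell and a POSIX awk (on Solaris that would be nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk).

```shell
# Print the sources configured for one database in an nsswitch.conf-style file.
# $1 = database name (e.g. hosts), $2 = file (defaults to /etc/nsswitch.conf)
# (nss_sources and uses_dns are hypothetical helper names.)
nss_sources() {
    awk -v db="$1:" '$1 == db { $1 = ""; sub(/^ +/, ""); print }' \
        "${2:-/etc/nsswitch.conf}"
}

# Succeed if "dns" is listed as a source for the given database.
uses_dns() {
    case " $(nss_sources "$1" "$2") " in
        *" dns "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   nss_sources hosts
#   uses_dns hosts && echo "hosts lookups can go to DNS"
```

Commented-out lines such as `#ipnodes:   files dns` are ignored, because their first field does not match the database name exactly.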

Please describe your network.

Is this a corporate setup?

Where is the DNS server? Local or remote?

@hicksd8 DNS is not used according to nsswitch.conf.
@kingdbad is your ssh client entry present in your server's /etc/hosts file ?

What command are you using to initiate the connection?
(Does it include a nodename or ip address?)

@jlliagre.....with no DNS configured and no /etc/hosts entry surely it wouldn't be just slow to connect, it wouldn't connect at all.

It would certainly as it does.
Have a look at the logs. Even though it is redacted, it appears the client uses an IP address to connect to the server.

Yes, I have the IPs that are connecting to the server in my /etc/hosts file, but before this weirdness started I did not have the client IPs in my /etc/hosts and it was working. I just added them recently to see if it helped at all, and it did not. I am also always connecting to the server by its IP address, never by domain name or anything.

I have also checked my NIC settings

Server is set to
Interface Speed Duplex
--------- ----- ------
bge 100 Mbit/s full

Router is set to
speed 100
duplex full

Grrrr!

I hope you have a backup. It looks like "them" are the culprits. Not us. Meaning your problem is external. Unless you think those changes are vital try to revert.

Without knowing a lot of stuff I need to know to really help:
Try traceroute to see how you route to a known node that used to work.
If you notice a large delay or *** in your output, those may be nodes that do not support traceroute. Normally nodes in a given domain trust each other.

This suggestion is fraught with assumptions, BTW.

That might be someplace to start a dialog with the windows/network people 'what happened on node xyz?'

Then you might try adding the server on a client's /etc/hosts and use a hostname instead of an IP to see if that makes a difference.

Is connecting from the server to the server exhibiting the same issue ?

ssh 127.0.0.1

Yes, it looks like a delay caused by network function so could be a number of things.

Your Solaris box doesn't use DNS but other network security devices or other nodes might. Some network security devices may be trying to do a reverse DNS lookup to validate the initiator of the connection so you should get your network support to check that entries for the client and server are in there.

Also, take a look in /etc/defaultrouter and see that IP address. Is it still valid? When an inbound connection request comes in, Solaris needs to instantly know the route back. I've known network support boys to retire a router and not tell everybody, so Solaris then tries to use an incorrect default router. Check with the network team that your configured default router is still alive and kicking. Try pinging it and see if it instantly responds.
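
The default router check could be scripted roughly like this. It is a sketch under assumptions: `default_routers` is a made-up helper name, it needs a POSIX awk (nawk on Solaris), and the ping line in the usage comment relies on the Solaris `ping host [timeout]` form, which differs on other systems.

```shell
# List the router addresses (or names) found in a defaultrouter-style file,
# skipping comments and blank lines.
# $1 = file (defaults to /etc/defaultrouter)
# (default_routers is a hypothetical helper name.)
default_routers() {
    awk '{ sub(/#.*/, "") } NF { print $1 }' "${1:-/etc/defaultrouter}"
}

# Usage (Solaris ping syntax, 2-second timeout per router):
#   for r in $(default_routers); do ping "$r" 2; done
```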

Have you tried doing the same connection from a different client? Do you get the same result? What about a different client in a different location?

Can you ping your client from the server? (Of course, not all nodes may be configured to reply to pings but if it does work then it tells you something.)

If I get any more thoughts then I'll post them. Good hunting.

PS. How many NICs in the box? Can we all assume just the one?

Yes, server to same server is doing the same thing as a remote connection. Also, when I try to connect to 127.0.0.1 it still shows the same symptoms. When I try to connect to a web page or anything else on the server it works instantly; it only seems to be SSH that I'm having issues with...

@hicks I will also try some of the stuff you are talking about to see if I notice anything.

It's definitely not an IP routing problem then!
Does

grep '^[^#]' /etc/net/*/hosts

yield 3 times the local hostname that matches in /etc/hosts,
and does its IP address match a working interface in

ifconfig -a

?
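
The "3 times" part of that check can be counted rather than eyeballed. A minimal sketch, assuming a POSIX shell and awk (nawk on Solaris); `files_mentioning` is a made-up name:

```shell
# Count how many of the given hosts-style files mention a name as a whole
# field on a non-comment line.
# $1 = name to look for, remaining arguments = files to search
# (files_mentioning is a hypothetical helper name.)
files_mentioning() {
    name=$1
    shift
    count=0
    for f in "$@"; do
        if awk -v n="$name" '
            /^[ \t]*#/ { next }
            { for (i = 1; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$f"
        then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Usage: the check above expects this to print 3.
#   files_mentioning "$(uname -n)" /etc/net/*/hosts
```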

Connection to 127.0.0.1 should be instant. It can be slow if reverse lookup doesn't work. Please confirm

 
127.0.0.1   localhost
 

appears in /etc/hosts.
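
Since `hosts: files` is in effect, the reverse lookup for an address comes down to what /etc/hosts says. As a rough sketch, this prints the names a hosts file assigns to an address; `names_for_ip` is a made-up helper and assumes a POSIX awk (nawk on Solaris):

```shell
# Print every name a hosts-style file assigns to an IP address, one per line.
# $1 = IP address, $2 = file (defaults to /etc/hosts)
# (names_for_ip is a hypothetical helper name.)
names_for_ip() {
    awk -v ip="$1" '
        { sub(/#.*/, "") }                      # drop trailing comments
        $1 == ip { for (i = 2; i <= NF; i++) print $i }
    ' "${2:-/etc/hosts}"
}

# Usage:
#   names_for_ip 127.0.0.1
```

No output for 127.0.0.1 would mean the loopback entry is missing or malformed.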

Also, from the server try connecting using the server's real ip address (rather than 127.0.0.1). Is that slow too?

@kindbag Anything interesting in the system log file ?

dmesg | grep sshd