Interpretation of Ping behaviour

hi,

I'm working on Solaris 10 and need help interpreting some ping behaviour I encountered.

I ping from source to destination:

-bash-3.2# ping -s -t 128  10.10.10.200
PING 10.10.10.200: 56 data bytes   <===== stops here for 2 minutes before getting a reply back
64 bytes from 10.10.10.200: icmp_seq=0. time=1.05 ms
64 bytes from 10.10.10.200: icmp_seq=1. time=7.91e+04 ms
.....
64 bytes from 10.10.10.200: icmp_seq=80. time=33.7 ms
64 bytes from 10.10.10.200: icmp_seq=81. time=0.470 ms
64 bytes from 10.10.10.200: icmp_seq=82. time=0.526 ms
64 bytes from 10.10.10.200: icmp_seq=219. time=3.71 ms
64 bytes from 10.10.10.200: icmp_seq=220. time=3.10 ms
64 bytes from 10.10.10.200: icmp_seq=221. time=11.9 ms
64 bytes from 10.10.10.200: icmp_seq=222. time=6.37 ms
64 bytes from 10.10.10.200: icmp_seq=223. time=4.57 ms
64 bytes from 10.10.10.200: icmp_seq=224. time=2.61 ms
64 bytes from 10.10.10.200: icmp_seq=225. time=4.70 ms
64 bytes from 10.10.10.200: icmp_seq=226. time=5.50 ms
64 bytes from 10.10.10.200: icmp_seq=227. time=6.08 ms
64 bytes from 10.10.10.200: icmp_seq=228. time=2.67 ms
....

What could be the cause? I also notice that replies sometimes take more than 1 ms, with some reaching 30+ ms.
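To make the pattern in the output above easier to see, here is a rough Python sketch that pulls the sequence numbers and times out of `ping -s` reply lines and reports breaks in the sequence and slow replies. The function names and the 30 ms threshold are made up for illustration. Note that the 7.91e+04 ms reported for icmp_seq=1 is about 79 seconds, and that the sequence jumps from 82 to 219.

```python
import re

# Matches Solaris "ping -s" reply lines such as:
#   64 bytes from 10.10.10.200: icmp_seq=82. time=0.526 ms
REPLY = re.compile(r"icmp_seq=(\d+)\.\s+time=([0-9.eE+-]+)\s*ms")

def parse_replies(text):
    """Return a list of (sequence number, round-trip time in ms) tuples."""
    return [(int(m.group(1)), float(m.group(2))) for m in REPLY.finditer(text)]

def find_gaps(replies):
    """Return (last_seen, next_seen, missing_count) for each break in the sequence."""
    gaps = []
    for (prev, _), (cur, _) in zip(replies, replies[1:]):
        if cur - prev > 1:
            gaps.append((prev, cur, cur - prev - 1))
    return gaps

def slow_replies(replies, threshold_ms=30.0):
    """Return the replies whose round-trip time exceeds threshold_ms (hypothetical cutoff)."""
    return [(seq, ms) for seq, ms in replies if ms > threshold_ms]

# A few lines copied from the output above.
sample = """\
64 bytes from 10.10.10.200: icmp_seq=80. time=33.7 ms
64 bytes from 10.10.10.200: icmp_seq=81. time=0.470 ms
64 bytes from 10.10.10.200: icmp_seq=82. time=0.526 ms
64 bytes from 10.10.10.200: icmp_seq=219. time=3.71 ms
64 bytes from 10.10.10.200: icmp_seq=220. time=3.10 ms
"""

replies = parse_replies(sample)
print(find_gaps(replies))     # breaks in the icmp_seq numbering
print(slow_replies(replies))  # replies slower than the threshold
```

A break in icmp_seq only says that no reply line was printed for those sequence numbers. It does not say by itself whether the request, the reply, or something on the sending host was held up.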

If I use truss to see further:

16825:  xstat(2, "/etc/resolv.conf", 0x080472F8)        = 0
16825:  sysconfig(_CONFIG_OPEN_FILES)                   = 256
16825:  so_socket(PF_INET, SOCK_DGRAM, IPPROTO_IP, "", SOV_DEFAULT) = 5
16825:  connect(5, 0x0809BF20, 16, SOV_DEFAULT)         = 0
16825:  send(5, " d v01\0\001\0\0\0\0\0\0".., 44, 0)    = 44
16825:      Received signal #14, SIGALRM, in pollsys() [caught]
16825:  pollsys(0x08046CE8, 1, 0x08046CA0, 0x00000000)  Err#4 EINTR
16825:  lwp_sigmask(SIG_SETMASK, 0x00002000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
16825:  sendto(3, "\b\0 gE6 AB9\002 L90 D T".., 64, 32768, 0x0806AA90, 16) = 64
16825:  alarm(1)                                        = 0
16825:  setcontext(0x08046470)
16825:      Received signal #14, SIGALRM, in pollsys() [caught]
16825:  pollsys(0x08046CE8, 1, 0x08046CA0, 0x00000000)  Err#4 EINTR
16825:  lwp_sigmask(SIG_SETMASK, 0x00002000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
16825:  sendto(3, "\b\0D5E4 AB9\003 M90 D T".., 64, 32768, 0x0806AA90, 16) = 64
16825:  alarm(1)                                        = 0
16825:  setcontext(0x08046470)
16825:      Received signal #14, SIGALRM, in pollsys() [caught]
16825:  pollsys(0x08046CE8, 1, 0x08046CA0, 0x00000000)  Err#4 EINTR
16825:  lwp_sigmask(SIG_SETMASK, 0x00002000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
16825:  sendto(3, "\b\0\nE6 AB9\004 N90 D T".., 64, 32768, 0x0806AA90, 16) = 64
16825:  alarm(1)                                        = 0
16825:  setcontext(0x08046470)
16825:      Received signal #14, SIGALRM, in pollsys() [caught]
16825:  pollsys(0x08046CE8, 1, 0x08046CA0, 0x00000000)  Err#4 EINTR
16825:  lwp_sigmask(SIG_SETMASK, 0x00002000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
16825:  sendto(3, "\b\0 ME6 AB9\005 O90 D T".., 64, 32768, 0x0806AA90, 16) = 64
16825:  alarm(1)                                        = 0
16825:  setcontext(0x08046470)
16825:      Received signal #14, SIGALRM, in pollsys() [caught]
16825:  pollsys(0x08046CE8, 1, 0x08046CA0, 0x00000000)  Err#4 EINTR
16825:  lwp_sigmask(SIG_SETMASK, 0x00002000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
16825:  sendto(3, "\b\094E6 AB9\006 P90 D T".., 64, 32768, 0x0806AA90, 16) = 64
16825:  alarm(1)                                        = 0

I found that there are many Err#4 EINTR errors.

Can anyone shed some light? thanks
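One way to read the trace: every Err#4 EINTR comes straight after a "Received signal #14, SIGALRM" line and is followed by a sendto() and an alarm(1). That looks consistent with ping pacing its requests with a one-second alarm, so the EINTR results are probably expected rather than a sign of failure. The sketch below only counts that pairing in a pasted trace; the function name is made up for illustration.

```python
def eintr_after_sigalrm(trace):
    """Return (total EINTR lines, EINTR lines directly preceded by a SIGALRM line)."""
    total = paired = 0
    prev = ""
    for line in trace.splitlines():
        if "Err#4 EINTR" in line:
            total += 1
            if "SIGALRM" in prev:
                paired += 1
        prev = line
    return total, paired

# A subset of lines copied from the truss output above.
trace = """\
16825:      Received signal #14, SIGALRM, in pollsys() [caught]
16825:  pollsys(0x08046CE8, 1, 0x08046CA0, 0x00000000)  Err#4 EINTR
16825:  alarm(1)                                        = 0
16825:      Received signal #14, SIGALRM, in pollsys() [caught]
16825:  pollsys(0x08046CE8, 1, 0x08046CA0, 0x00000000)  Err#4 EINTR
16825:  alarm(1)                                        = 0
"""

total, paired = eintr_after_sigalrm(trace)
print(total, paired)
```

If both numbers are equal, every interrupted pollsys() was interrupted by the alarm the process set for itself.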

Your network or network connectivity is bad.
Are other destinations affected, too? Then check the cables first.

hi,
I would also like to ask about something in my truss output. It checks the /etc/resolv.conf file:
xstat(2, "/etc/resolv.conf", 0x080472F8) = 0
In this file, I notice there is:

search mydomain.com
nameserver 172.x.x.x

I suspect it is querying a DNS server and getting a slow reply back. As far as I know, my machine is not configured on the nameserver 172.x.x.x.

How does /etc/resolv.conf work? Does Solaris use this file by default when querying DNS, or does it use /etc/hosts first?

thanks

The hosts: entry in /etc/nsswitch.conf sets the host resolution order.
files corresponds to /etc/hosts, and dns corresponds to /etc/resolv.conf.
The commands nslookup and host bypass /etc/nsswitch.conf and use DNS (/etc/resolv.conf) directly.
The command getent hosts ... uses the default lookup, i.e. via /etc/nsswitch.conf.
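As a small illustration of the resolution order, this Python sketch reads the text of an nsswitch.conf file and returns the sources on its hosts: line in the order they would be tried. The function name is made up, and the sample contents are only an assumption about what a typical file looks like.

```python
import re

def hosts_lookup_order(nsswitch_text):
    """Return the sources on the hosts: line, in order, ignoring [..] criteria."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()          # drop trailing comments
        if line.startswith("hosts:"):
            rest = line[len("hosts:"):]
            rest = re.sub(r"\[[^\]]*\]", " ", rest)  # drop e.g. [NOTFOUND=return]
            return rest.split()
    return []

# Hypothetical file contents, not taken from the machine in this thread.
sample = """\
passwd:     files
hosts:      files dns
"""

print(hosts_lookup_order(sample))
```

With files listed before dns, a name or address that is present in /etc/hosts is answered locally and the DNS servers in /etc/resolv.conf are only asked when the local file has no match.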

Hi, thanks.
So what if I want to force the OS not to query DNS? Can I remove the nameserver line from resolv.conf, or remove /etc/resolv.conf entirely?

You can remove dns from the hosts: entry in /etc/nsswitch.conf.
Deleting /etc/resolv.conf has no effect.
BTW, host resolution has nothing to do with your ping problem. You pinged an IP address, so there is nothing to resolve.

/etc/resolv.conf just lists the DNS servers to check, in order. You may need to modify that. We use several DNS servers in our network: two Infoblox appliances and one Windows domain controller.

As Made_in_Germany said, /etc/nsswitch.conf controls where to look in general.

Is your name service cache daemon (nscd) running? Turn on DNS caching.

/fmd> svcs /system/name-service-cache
STATE          STIME    FMRI
online         Oct_17   svc:/system/name-service-cache:default

It should say 'online'.

Next, check the performance of the caching with:

nscd -g

You want to see:

CACHE: hosts

         CONFIG:
         enabled: yes
         per user cache: no
         avoid name service: no
         check file: yes
         check file interval: 0
         positive ttl: 3600
         negative ttl: 5
         keep hot count: 20
         hint size: 2048
         max entries: 0 (unlimited)

         STATISTICS:
         positive hits: 39
         negative hits: 2
         positive misses: 2
         negative misses: 3
         total entries: 2
         queries queued: 0
         queries dropped: 0
         cache invalidations: 0
         cache hit rate:       89.1


CACHE: ipnodes

         CONFIG:
         enabled: yes
         per user cache: no
         avoid name service: no
         check file: yes
         check file interval: 0
         positive ttl: 3600
         negative ttl: 5
         keep hot count: 20
         hint size: 2048
         max entries: 0 (unlimited)

         STATISTICS:
         positive hits: 1104
         negative hits: 2
         positive misses: 25
         negative misses: 3
         total entries: 4
         queries queued: 0
         queries dropped: 0
         cache invalidations: 18
         cache hit rate:       97.5
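The "cache hit rate" figures above can be reproduced from the four counters in each STATISTICS block. The formula below, all hits divided by all hits plus all misses, is inferred from the numbers shown here and not taken from the nscd documentation, so treat it as an assumption. The function name is made up for illustration.

```python
def cache_hit_rate(pos_hits, neg_hits, pos_misses, neg_misses):
    """Return hits as a percentage of all lookups (assumed formula, see above)."""
    hits = pos_hits + neg_hits
    total = hits + pos_misses + neg_misses
    return 100.0 * hits / total if total else 0.0

# Counters copied from the nscd -g output above.
hosts_rate = round(cache_hit_rate(39, 2, 2, 3), 1)        # 41 of 46 lookups
ipnodes_rate = round(cache_hit_rate(1104, 2, 25, 3), 1)   # 1106 of 1134 lookups

print(hosts_rate)    # 89.1
print(ipnodes_rate)  # 97.5
```

Both results match the rates printed by nscd -g above, which suggests the counters can be read this way.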

You may need to increase your local DNS cache size. Primarily, what you need is a sysadmin/network admin who knows this stuff and is not following a rote playbook for how to maintain a network.

The optimal solution for DNS problems like this is most often to set up caching DNS servers and turn off nscd.

As a side note, there is a slight chance your cache is becoming stale, possibly because a DNS server has problems. If the caching itself is working, you may want to bounce the nscd process, which clears the caches. If the problem persists on an immediate rerun, then you have other issues, which IMO tend to be nasty.

N.B.:
This kind of advice is hard to give without actually being there; there are too many moving parts to do a decent job vicariously like this.

@Made_in_Germany -

A quick proof/disproof of your 'DNS is not the problem' is to hard-code IP addresses, bypassing DNS. If the problem persists, it falls into the realm of TCP/IP routing problems.

IMO you have to eliminate flaky DNS first.

This kind of problem is painful in person, almost intractable by remote control.

What are you after? Post #1 does ping an IP address!

Thanks, all. The problem is solved. It turns out ping was doing a reverse DNS lookup using the nameserver in resolv.conf. I had to add an entry to the local hosts file to stop it from using the nameserver.
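For anyone who wants to check for the same cause, a reverse lookup can be timed outside of ping. This is a minimal Python sketch, assuming the system resolver is reached through socket.gethostbyaddr; the function name and the injectable resolver parameter are made up for illustration, the latter so the logic can be tried without touching the network.

```python
import socket
import time

def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname or None, elapsed seconds) for a reverse lookup of ip."""
    start = time.monotonic()
    try:
        name = resolver(ip)[0]   # first element of the result is the primary host name
    except OSError:              # socket.herror and socket.gaierror are OSError subclasses
        name = None
    return name, time.monotonic() - start
```

Calling time_reverse_lookup("10.10.10.200") before and after adding the hosts file entry should show whether the lookup is where the time goes. Solaris ping also documents a -n option that shows network addresses as numbers, which may be another way to skip the lookup.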