xntpd won't stay up...

dafydd2277 · May 12, 2010, 5:50pm

AIX 5.3-5300.09.06.1013 (AIX 5.3 TL9 SP6)

# startsrc -s xntpd -a "-x"

(with -x at the end of the xntpd line in /etc/rc.tcpip, too.)

will run for 5-15 minutes, and then die.

# errpt -a

with a search on xntpd gives me this:

------------------------------------------------
LABEL:          SRC_SVKO
IDENTIFIER:     BC3BE5A3

Date/Time:       Wed May 12 14:25:13 PDT 2010
Sequence Number: 169
Machine Id:      [redacted]
Node Id:         [redacted]
Class:           S
Type:            PERM
Resource Name:   SRC

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        MANUALLY RESTART SUBSYSTEM IF NEEDED

Detail Data
SYMPTOM CODE
         256
SOFTWARE ERROR CODE
       -9017
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'355'
FAILING MODULE
xntpd
-------------------------------------------------

My google-fu is completely failing to help me track down these symptom or error codes.

I'd love suggestions on where to look.

Thanks!
dafydd

shockneck · May 13, 2010, 8:28am

First I would check that I use the same TZ on both the NTP server and the client. Second I would use the logfile and/or tracefile options of xntpd to get information from the subsystem directly as the errpt entry is not very helpful here. Then I would check:

Does the same error happen when using xntpd with step instead of slew? (No -x option)
Does it make a difference if you change the subsystem with chssys and then just issue a startsrc -s xntpd?
Are there any network problems between NTP server and client?

juredd1 · May 13, 2010, 10:48am

Could be that the system time is too far off from the real time, or from the time ntp is pulling from the peer server. Below is a clip from the man page for xntpd. I have had this problem more than once.

Note: When operating in a client mode running AIX 4.2.1 or later, the xntpd daemon will exit with an error 
if no configured servers are within 1000 seconds of local system time. Use the date or ntpdate command
to set the time of a bad skewed system before starting xntpd.
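
As a rough illustration of that limit, the sketch below checks an offset in seconds (such as the value on ntpdate's "offset" line) against the 1000-second figure quoted from the man page. The function name within_xntpd_limit is made up for this example, and the check is only an approximation of whatever xntpd does internally.

```shell
# Hypothetical helper: succeeds if |offset| is within the limit.
# The default of 1000 s is the figure quoted from the xntpd man page.
within_xntpd_limit() {
    awk -v off="$1" -v lim="${2:-1000}" 'BEGIN {
        if (off < 0) off = -off      # compare the magnitude only
        if (off <= lim) exit 0
        exit 1
    }'
}

# Example: a small offset passes the check.
if within_xntpd_limit 12.5; then
    echo "12.5 s: within limit"
fi
```

An offset that fails this check is a hint to set the clock with date or ntpdate before starting xntpd.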

denn · May 13, 2010, 12:36pm

ntpdate can be run before starting the NTP server daemon, if you suspect your system clock might be too far off. This can also be included in a startup script.

dafydd2277 · May 13, 2010, 2:55pm

Thanks for all the hints!

I'm a good Linux guy and former IRIX nerd who got thrown into an AIX/Oracle install. (I don't mind getting thrown in the deep end. I just wish the pool had water in it...)

The ntp server and both clients are all in the same time zone and on the same subnet. (I'm doing the test environment install as backup to the guy doing the production install.)
The AIX boxes have TZ in /etc/environment as

TZ=PST8PDT-7,M3.2.0/2:00:00,M11.1.0/2:00:00

Where can I find the log files to examine?
A default install of AIX 5.3 TL9 SP6 doesn't seem to install ntpdate.
Shockneck: What chssys changes do you suggest?

Thanks!
dafydd

---------- Post updated at 11:31 AM ---------- Previous update was at 11:06 AM ----------

Found bootlog and conslog. Bootlog has nothing related to xntpd. Conslog only notes that xntpd has received start requests.

Trying this

/usr/sbin/xntpd -l /tmp/xntpd.log -D 10

---------- Post updated at 11:40 AM ---------- Previous update was at 11:31 AM ----------

Scratch the ntpdate thing. It is installed, but for some reason, even with /usr/sbin in the path, AIX will say ntpdate isn't found. Running it by its full path got me this:

# /usr/sbin/ntpdate -d <IPADDR>
13 May 11:33:27 ntpdate[462852]: 3.4y
transmit(<IPADDR>)
receive(<IPADDR>)
transmit(<IPADDR>)
receive(<IPADDR>)
transmit(<IPADDR>)
receive(<IPADDR>)
transmit(<IPADDR>)
receive(<IPADDR>)
transmit(<IPADDR>)
server <IPADDR>, port 123
stratum 11, precision -20, leap 00, trust 000
refid [127.127.1.0], delay 0.02574, dispersion 0.00000
transmitted 4, in filter 4
reference time:      cf96c466.644973b0  Fri, May 14 2010  1:33:10.391
originate timestamp: cf96c49f.ac1255cd  Fri, May 14 2010  1:34:07.672
transmit timestamp:  cf95ff97.1a4c3000  Thu, May 13 2010 11:33:27.102
filter delay:  0.02582  0.02574  0.02579  0.02576
               0.00000  0.00000  0.00000  0.00000
filter offset: 50440.56 50440.56 50440.56 50440.56
               0.000000 0.000000 0.000000 0.000000
delay 0.02574, dispersion 0.00000
offset 50440.569358

13 May 11:33:27 ntpdate[462852]: step time server <IPADDR>offset 50440.569358
# date
Thu May 13 11:33:53 PDT 2010
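
The reported offset can be cross-checked by hand from the hex timestamps in that output. The part before the dot is whole seconds, so subtracting the transmit timestamp's seconds from the originate timestamp's seconds should land near the offset. A minimal sketch, with a made-up helper name (ntp_secs_diff), that ignores the fractional part and the network delay:

```shell
# Hypothetical helper: difference in whole seconds between two
# NTP hex timestamps of the form "seconds.fraction".
ntp_secs_diff() {
    a=${1%%.*}                 # keep only the hex seconds
    b=${2%%.*}
    echo $(( 0x$a - 0x$b ))
}

# Originate and transmit timestamps copied from the output above.
ntp_secs_diff cf96c49f.ac1255cd cf95ff97.1a4c3000   # prints 50440
```

50440 seconds is roughly 14 hours, far beyond the 1000-second limit mentioned in the man page note.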

What occurs to me is the way smitty chtz changed the variable in /etc/environment:

TZ=PST8PDT-7,M3.2.0/2:00:00,M11.1.0/2:00:00

I don't see that "-7," or similar for other timezones, in any documentation, anywhere. So, I've changed the var to

TZ=PST8PDT,M3.2.0/2:00:00,M11.1.0/2:00:00

and rebooted (just to be thorough). I'll post what happens...
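
For what it's worth, the transmit timestamp from the ntpdate output can be decoded to a UTC time of day with plain shell arithmetic. NTP counts seconds from 1900-01-01 00:00 UTC, so seconds modulo 86400 gives the UTC time of day. A minimal sketch (the helper name ntp_utc_hms is made up here):

```shell
# Hypothetical helper: UTC time of day encoded in an NTP hex timestamp
# of the form "seconds.fraction". The fraction is ignored.
ntp_utc_hms() {
    hex=${1%%.*}
    secs=$(( 0x$hex % 86400 ))          # seconds since UTC midnight
    printf '%02d:%02d:%02d\n' \
        $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
}

ntp_utc_hms cf95ff97.1a4c3000   # client transmit timestamp, prints 04:33:27
```

The client's clock held 04:33:27 UTC while ntpdate displayed it as 11:33:27 local, i.e. seven hours ahead of UTC. That would fit the POSIX TZ convention, where offsets are positive west of Greenwich, so "PDT-7" places daylight time east of UTC instead of seven hours west. Whether AIX 5.3 reads the string exactly that way is an assumption here, not something confirmed in this thread.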

---------- Post updated at 11:49 AM ---------- Previous update was at 11:40 AM ----------

And, after reboot, the time jumped forward 10 hours. WT*?

Fix that, rerun ntpdate...

This starts to look closer. Still digging...

---------- Post updated at 11:55 AM ---------- Previous update was at 11:49 AM ----------

RTFM. Find out that the ntpdate -d switch doesn't actually change anything. Rerun as "ntpdate -b"

Okay, I've synchronized everyone. Now to see if xntpd will stay up for more than 15 minutes at a stretch...

shockneck · May 13, 2010, 3:10pm

Yes, that would cause xntpd to die. Since you saw no time difference until you rebooted, my No. 1 suspect is still the TZ: it may have been changed in /etc/environment, but it needs a reboot to become active.

dafydd2277 · May 13, 2010, 3:17pm

And, all appears good. So,

Reference IBM - Managing the Time Zone Variable for how to create a timezone string.
Set the TZ variable in /etc/environment by hand.
Reboot, or "export TZ=<timezone_string>"
/usr/sbin/ntpdate -b <ntp server>
"startsrc -s xntpd" or
"startsrc -s xntpd -a "-x""

and give yourself a half-hour coffee break. Check and verify xntpd is still running, and you're good to go.
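
The steps above could be wrapped up roughly like this. It is only a sketch: the server address 192.0.2.10 is a documentation placeholder, the function names and the DRY_RUN switch are invented for illustration, and the TZ string is simply the one from this thread.

```shell
# Placeholder values: replace with your own NTP server and TZ string.
NTP_SERVER=${NTP_SERVER:-192.0.2.10}
TZ_STRING='PST8PDT,M3.2.0/2:00:00,M11.1.0/2:00:00'

# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Set TZ, step the clock once, then start xntpd in slew mode.
sync_and_start_xntpd() {
    TZ=$TZ_STRING; export TZ
    run /usr/sbin/ntpdate -b "$1" || return 1
    run startsrc -s xntpd -a "-x"
}

# Read `lssrc -s xntpd` output on stdin and print the status column.
xntpd_status() {
    awk '$1 == "xntpd" { print $NF }'
}

# Show what would be run without touching the system.
DRY_RUN=1 sync_and_start_xntpd "$NTP_SERVER"
```

After the coffee break, "lssrc -s xntpd | xntpd_status" should print "active" if the daemon stayed up.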

kah00na · May 18, 2010, 2:11pm

To see what error it is actually throwing, add this to your /etc/syslog.conf (all one line):

daemon.debug    /var/log/syslog.daemon.debug    rotate size 5m files 1  # maintain 1 archived file, 5M

Touch the log file, because syslogd won't create it:

touch /var/log/syslog.daemon.debug

then restart syslogd (the daemon that reads /etc/syslog.conf) so it picks up the change:

stopsrc -s syslogd; startsrc -s syslogd

Restart your xntpd:

stopsrc -s xntpd; startsrc -s xntpd

Once it dies, look in the /var/log/syslog.daemon.debug file for any errors. Maybe that will show what is going on.
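
To pick the xntpd entries out of that log, something along these lines may help. The helper name xntpd_log_tail is made up, and it simply matches the string "xntpd" without assuming any particular syslog line format.

```shell
# Hypothetical helper: show the last N lines (default 20) that mention
# xntpd in the given log file.
xntpd_log_tail() {
    grep 'xntpd' "$1" | tail -n "${2:-20}"
}

# Usage, once the daemon has died:
#   xntpd_log_tail /var/log/syslog.daemon.debug
```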