Script to detect time drift

Green_Star · May 10, 2018, 12:11pm

Hello there,

I am not an expert in networking, but I have a requirement to create a UNIX script that queries our company's internal time source via NTP to detect time drift and report it when it exceeds +/- 50 ms.

I have been googling a lot, but thought I would post here to get suggestions on the best way to do this.

Can we read the ntp.drift file that is created on the server where the script will run (the OS is HP-UX) and use the value in it to calculate the time drift, or is there a better approach?

Thanks in advance for your suggestions.

hicksd8 · May 10, 2018, 3:35pm

You have an internal company NTP server? You are trying to test its accuracy, is that correct? Do you have access to that internal NTP server to run stuff on it? Or can you only check it from a client machine?

Green_Star · May 11, 2018, 8:52am

Thanks for your response hicksd8.

My responses are below

You have an internal company NTP server? Yes, we have an internal NTP server.

You are trying to test its accuracy, is that correct? Yes, correct.

Do you have access to that internal NTP server to run stuff on it? Or can you only check it from a client machine? We have access to the server and can run things on it.

hicksd8 · May 11, 2018, 10:20am

You probably won't be surprised to learn that I haven't actually done or tried this myself, but from my research I think that you can query an external NTP server from your internal NTP server WITHOUT actually updating its clock, and then compare the two for a difference.

On the internal NTP server whose drift you want to monitor, set up the NTP config file with a single server entry pointing to an external NTP/PTP server, and include the server option 'noselect'.

When the NTP query runs it will not update the clock, but the offset can be viewed using

ntpq -pn

or

ntpq -c rl
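A minimal sketch of reading the offsets that ntpq -pn prints, one row per peer, so the external (noselect) peer can be compared with the others. It assumes the usual ten-column layout (remote, refid, st, t, when, poll, reach, delay, offset, jitter or disp) in which the offset, in milliseconds, is the 9th field; the function name peer_offsets is hypothetical.

```shell
#!/bin/sh
# peer_offsets: read "ntpq -pn" output on stdin and print "remote offset"
# (offset in ms) for every peer row. Assumes the ten-column layout:
# remote refid st t when poll reach delay offset jitter (or disp).
peer_offsets() {
    awk 'NR > 2 && NF >= 10 {
        remote = $1
        # drop the one-character tally code (e.g. "*" or "+") that ntpq
        # puts in front of the numeric address printed with -n
        sub(/^[x.+#*o-]/, "", remote)
        print remote, $9
    }'
}
```

Usage: ntpq -pn | peer_offsets. The first two lines (column header and the ===== separator) are skipped, and the NF >= 10 guard skips any row that does not have the expected number of columns.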

You don't say which Unix/Linux you are running (EDIT: sorry, you posted to the HP-UX forum), so I cannot say which NTP commands are implemented on it. You will have to test the function manually first and then get one of the scripting experts on this forum to help you if you need it. You will, of course, also need to include a function to update the clock for real when it gets too far adrift.

rbatte1 · May 14, 2018, 7:32am

You would have to do this on the client side. If you have a problem with your NTP time server, then treat that as the client and get it to align to another trusted clock, be that a radio-clock or internet address.

You can also use ntpdate -d ref-server to get the time difference from the reference server (which must be offering the NTP service, of course).

If the clock has drifted too far, you would need to step the clock on the local server to match, something like this:

Check the offset with - ntpdate -d ref-server
Stop the local NTP service in the normal way
Step the clock into sync - ntpdate ref-server
Start the local service in the normal way
Check the offset with - ntpdate -d ref-server

You should see debug information when you use the -d flag, and the last line gives you the agreed offset from the reference server or servers (just a space-separated list).

If you can't have the NTP client running all the time because your application doesn't like the clock going backwards, even by tiny fractions of a second, you would probably need to schedule an idle minute each day to step the clock. You can run ntpdate -d ref-server at any time and just use the last line to see the current offset from the trusted clock.

I hope that this helps,
Robin

Green_Star · May 14, 2018, 1:21pm

Thanks rbatte1 for your response.

The user wants us to create a script that runs on the UNIX HP-UX server every 10 minutes and checks for any time drift beyond +/- 50 ms. If the time drift exceeds +/- 50 ms, the script should trigger an email to the user.
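The scheduling and e-mail part of this requirement can be sketched separately from the measurement. This assumes a check script that prints a message only when the limit is exceeded (as the ksh scripts later in this thread do) and uses mailx for sending; mail_if_output, the MAILER override, and the paths in the crontab comment are hypothetical.

```shell
#!/bin/sh
# mail_if_output RECIPIENT SUBJECT
# Reads a check script's output on stdin and mails it only when it is
# non-empty. MAILER defaults to mailx; it can be overridden for testing.
mail_if_output() {
    body=$(cat)
    if [ -z "$body" ]; then
        return 1                        # nothing printed: offset within limits
    fi
    printf '%s\n' "$body" | ${MAILER:-mailx} -s "$2" "$1"
}

# Hypothetical crontab entry running every 10 minutes; the minutes are
# listed explicitly because some older cron implementations reject "*/10".
# ntp_alert would be a small wrapper that calls mail_if_output.
# 0,10,20,30,40,50 * * * * /usr/local/bin/ntp_offset_check 2>&1 | /usr/local/bin/ntp_alert
```

Keeping the mail step separate means the measurement script stays silent when everything is fine, which is also what cron expects.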

I am not quite sure what the user's next steps are once they are aware of the time drift.

After talking to a couple of DBAs and googling, I understood that if the NTP service runs every 10 minutes on the server, it creates the ntp.drift file every time it runs, and that ntp.drift contains the offset information, which I can use to calculate the time drift in milliseconds. Is this assumption correct?

Please advise. I just want to make sure I am doing the right thing.

Thanks very much in advance.

MadeInGermany · May 14, 2018, 2:30pm

ntpdate requires root privileges.
ntpq is better, as hicksd8 wanted to propose (but misspelled).
The following ksh script does it:

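#!/bin/ksh
# Report when the NTP offset from "ntpq -c rl" exceeds +/- 50 ms.
PATH=/bin:/usr/bin:/usr/sbin:/sbin
# Some (older) ntpd versions report the value as phase=, others as offset=;
# "t" skips the offset= substitution when phase= has already matched.
offset=$(
ntpq -c rl | sed -n '
s/.*phase=\([-0-9.]*\).*/\1/p
t
s/.*offset=\([-0-9.]*\).*/\1/p
'
)
# absolute value of the offset
[[ $offset -lt 0 ]] && offset=$((-offset))
if [[ $offset -gt 50 ]]
then
  echo "$offset is greater than 50 msec"
fi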

NB ksh88 rounds floating point numbers to integers, while ksh93 has full floating point precision.
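If ksh93 is not available, one portable way around integer-only shell arithmetic is to let awk do the floating point comparison and have the shell test only its exit status. A minimal sketch; the name exceeds_limit is hypothetical:

```shell
#!/bin/sh
# exceeds_limit OFFSET LIMIT
# Exit status 0 when |OFFSET| > LIMIT, 1 otherwise. awk does the floating
# point work, so fractional offsets such as 50.2 are not rounded away.
# NOTE: an empty OFFSET is treated as 0, so check for it separately.
exceeds_limit() {
    awk -v o="$1" -v l="$2" 'BEGIN {
        o += 0; l += 0                  # force numeric values
        if (o < 0) o = -o               # absolute value
        if (o > l) exit 0               # limit exceeded
        exit 1
    }'
}
```

For example: if exceeds_limit "$offset" 50; then echo "$offset is greater than 50 msec"; fi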

Green_Star · May 14, 2018, 2:35pm

Thanks a lot for your response MadeInGermany.

I will try this.

Thanks.

MadeInGermany · May 14, 2018, 2:39pm

The following is even simpler, because sed simply strips a leading - and a trailing .xxx .
This would even run in bash or dash.

#!/bin/ksh
PATH=/bin:/usr/bin:/usr/sbin:/sbin
# sed strips an optional leading "-" and the fraction, leaving integer ms
offset=$(
ntpq -c rl | sed -n '
s/.*phase=-\{0,1\}\([0-9]*\).*/\1/p
t
s/.*offset=-\{0,1\}\([0-9]*\).*/\1/p
'
)
# POSIX test syntax, so the script also runs under bash or dash
if [ -z "$offset" ]
then
  echo "no offset, check NTP service with ntpq -pn"
elif [ "$offset" -gt 50 ]
then
  echo "$offset is greater than 50 msec"
fi

Green_Star · May 16, 2018, 9:53am

Hi MadeInGermany,

A quick question, for my own knowledge:
If NTP is scheduled to run every 10 minutes on the server, do you think the clocks get synchronised?
Will there be an explicit need to check for the time drifts? Can the time drift situation occur at all?

Thank you.

MadeInGermany · May 16, 2018, 2:31pm

This ntpq check is a measurement.
If you trust your NTP setup then you do not need a measurement.

BTW, I have just changed my last post and added another check for the case where the NTP service does not work (somebody stopped it, all time peers are unreachable, ...).

With HP-UX 11.23 I remember an oddity: ntpq reported sudden time jumps and the monitoring raised occasional alerts.
No such issues on AIX, Solaris, Linux.

The driftfile is not relevant. It holds the drift between the hardware clock and the NTP servers, which predicts the real drift in case the time peers become unreachable. Effectively it only makes the time sync faster after a startup of the NTP service. One can run NTP without a driftfile.
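The driftfile holds a frequency correction in parts per million (PPM) rather than a time offset, so it cannot be compared directly with a 50 ms limit. As a worked sketch, the arithmetic below converts a frequency error into the time error a clock would accumulate if nothing corrected it. The input 20.824 is taken from the frequency= value in the ntpq -c rl output posted later in this thread, assuming that is representative of this server; the function name ppm_to_ms is hypothetical.

```shell
#!/bin/sh
# ppm_to_ms PPM SECONDS
# Time error (ms) a clock accumulates over SECONDS when its frequency is off
# by PPM parts per million: ms = PPM * 1e-6 * SECONDS * 1000 = PPM * SECONDS / 1000
ppm_to_ms() {
    awk -v p="$1" -v s="$2" 'BEGIN { printf "%.3f\n", p * s / 1000 }'
}
```

ppm_to_ms 20.824 600 gives 12.494, i.e. roughly 12.5 ms per 10 minutes if the correction stopped. With ntpd running, that frequency error is continuously compensated, which is why the value says little about the current offset.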

Green_Star · May 17, 2018, 9:26am

Perfect!!

Thanks a lot for sharing your knowledge - MadeInGermany.

Green_Star · May 23, 2018, 10:00am

Hello,

One more question.
Once we detect an offset beyond +/- 50 ms, can we correct the time within the script, i.e. get the time back in sync?

Thanks!

MadeInGermany · May 23, 2018, 12:39pm

No, this is monitoring. Once monitoring detects an offset, you must do a root cause analysis of why your sync mechanism has failed.
Your sync mechanism:
ensure that either ntpd is running (it needs a correctly configured ntp.conf; check with ntpq -pn) or there is a cron job with ntpdate.

Green_Star · June 19, 2018, 10:07am

Hi there, sorry to bother you again.

When I ran the command "ntpq -c rl" on the server, this is the output I got:

associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.6 Revision 6.0 Tue April 14 14:21:22 UTC 2015",
processor=, system="HP-UX/", leap=00, stratum=3, precision=-20,
rootdelay=14.482, rootdisp=44.199, refid=10.101.162.196,
reftime=ded28e5d.90b912dc  Mon, Jun 18 2018 16:02:37.565,
clock=ded2918b.a1b70695  Mon, Jun 18 2018 16:16:11.631, peer=23187,
tc=10, mintc=3, offset=0.182, frequency=20.824, sys_jitter=0.211,
clk_jitter=0.122, clk_wander=0.000

Do we still need to check for "Phase" in the script?

s/.*phase=-\{0,1\}\([0-9]*\).*/\1/p
t
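For the ntpd 4.2.6 output above, the offset= substitution on its own already finds the value; the phase= branch in the earlier scripts presumably only matters for other ntpd versions. A minimal sketch that applies just that substitution (rl_offset is a hypothetical name):

```shell
#!/bin/sh
# rl_offset: read "ntpq -c rl" output on stdin and print the value of
# offset= (milliseconds), keeping the sign and the fraction.
rl_offset() {
    sed -n 's/.*offset=\([-0-9.]*\).*/\1/p'
}
```

Fed with the output above, it prints 0.182, i.e. the server was 0.182 ms off at that moment, well inside the 50 ms limit.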

bakunin · June 19, 2018, 3:37pm

It seems to me that you might profit from understanding a bit of the theory behind NTP and how it works. Here it goes:

Within a network it is vital to keep the timekeeping of the connected systems in sync. For instance, the Kerberos protocol will reject any authentication attempt coming from a system whose time is off by more than a (very narrow) margin from the ticket server. Alas, timekeeping in computers is basically done by some oscillating circuitry which is quite unreliable over periods longer than a few hours at most. This is where NTP comes in to synchronize the system time(s) throughout the network.

NTP is a client-server protocol and works in a hierarchical manner, quite like the DNS protocol. At its root are the so-called stratum-1 servers, which generate the correct time using specially designed hardware (nowadays usually atomic clocks). Atomic clocks are off by only about a second every billion years or so, so for every practical purpose they are as exact as it gets.

You will never get into contact with a stratum-1 server anyway. These are not publicly available systems; they are usually secured and talk only to a select few clients. These clients form the next layer of the hierarchy and are called stratum-2 servers (now, who'd have guessed that?). On the one hand they are clients of the stratum-1 servers, so their time is almost as exact (save for a margin you won't notice); on the other hand they act as servers to the interested public.

That still doesn't mean they are publicly available. In fact, a company usually has a contract with a company running a stratum-2 server and accesses this service with one (or two) client systems. These one or two systems themselves act as so-called stratum-3 servers, giving out their time information to every server on the company network.

Now, how is timekeeping with NTP done? Basically, a client checks with its assigned server from time to time, and if there is a difference between the received time information and the current system time, the system time is adjusted. It could be adjusted by simply setting it, but this would create some problems: suppose the system time is 12:00:00 and a file is written. It would get the time stamp 12:00:00. Now the NTP process sets the time back 5 seconds because it got new information from its NTP server, so the system time is now 11:59:55. If a process now tried to find the latest file one second later, it would notice that, from its POV, the file is from the future. No good!

This is why time is usually adjusted "driftingly": the "seconds" on the affected system are slightly shortened or lengthened so that the "subjective" time on the system still passes continuously but eventually synchronises with the NTP server's information. This drifting is logged in the "drift file". Notice that by default the drift file resides in /etc and can get pretty big if the timing circuitry of the underlying hardware is crap. I had a few (AIX) systems in my career having fits because the drift file in /etc filled up the root FS. I'm not sure about HP-UX, but IIRC it doesn't like a full root FS any better.
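To put numbers on the 5-second example: the ntpd documentation gives a maximum slew rate of 500 PPM, so removing an offset purely by shortening or lengthening seconds takes at least offset / 0.0005 seconds. By default ntpd steps rather than slews offsets larger than 128 ms unless it is started with -x; the sketch only covers the slewing case, and the function name slew_seconds is hypothetical.

```shell
#!/bin/sh
# slew_seconds OFFSET_S [MAX_PPM]
# Minimum time (s) needed to slew away an offset of OFFSET_S seconds when the
# clock rate may be changed by at most MAX_PPM parts per million (default 500).
slew_seconds() {
    awk -v o="$1" -v p="${2:-500}" 'BEGIN {
        o += 0
        if (o < 0) o = -o               # direction does not matter
        printf "%.0f\n", o / (p / 1000000)
    }'
}
```

slew_seconds -5 gives 10000, i.e. close to 2.8 hours to absorb a 5-second jump, while a 50 ms offset (slew_seconds 0.05) needs about 100 seconds.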

You probably can understand your output now a little better.

I hope this helps.

bakunin

MadeInGermany · June 20, 2018, 3:13pm

Looks like the output of "ntpq -c rl" changes with every update of the NTP package.
Use the "ntpq -pn" output instead, processed like this in the shell script:

offset=$(
ntpq -pn | awk '/^[*]/ {print $9}'
)
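A sketch that wraps this awk line in a function and adds the empty-offset check and a fractional 50 ms comparison, assuming the same 9th-column layout; check_offset and its exit codes are hypothetical. If no line begins with "*", the awk program prints nothing and offset is empty, which the function reports separately instead of treating it as 0.

```shell
#!/bin/sh
# check_offset LIMIT_MS
# Reads "ntpq -pn" output on stdin. Exit status: 0 = within limit,
# 1 = limit exceeded (message printed), 2 = no "*" (sys.peer) line found.
check_offset() {
    # offset is the 9th column of the line whose tally code is "*"
    offset=$(awk '/^[*]/ { print $9; exit }')
    if [ -z "$offset" ]; then
        echo "no offset, check NTP service with ntpq -pn"
        return 2
    fi
    # compare |offset| with the limit in awk so fractions are handled
    if awk -v o="$offset" -v l="$1" \
        'BEGIN { o += 0; if (o < 0) o = -o; if (o > l + 0) exit 0; exit 1 }'
    then
        echo "$offset is greater than $1 msec"
        return 1
    fi
    return 0
}
```

Usage: ntpq -pn | check_offset 50. Its output can be passed straight to a mail step, since it prints nothing when the offset is within the limit.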

Green_Star · June 21, 2018, 11:52am

Thanks a lot Bakunin. That really helps me understand the background!!!

---------- Post updated at 11:52 AM ---------- Previous update was at 09:57 AM ----------

madeingermany:

Looks like the "ntpq -c rl" changes with every update of the NTP package.
Take the "ntpq -pn" output instead! To be processed like this in the shell script:
offset=$(
ntpq -pn | awk '/^[*]/ {print $9}'
)

Just wondering: what would the value of "offset" be using "ntpq -pn" if the NTP service is down? Will it be 0 or BLANK?
The awk command prints the 9th field of the "ntpq -pn" output. So I'm wondering, if the NTP service is down and the offset is BLANK, will awk put the "disp" value into the variable instead of BLANK? Thanks for your time.