cron jobs not running during certain window

tunachamp · October 5, 2011, 1:00pm

I'm currently experiencing a problem on one of our servers at work: no jobs scheduled through cron run from about 1:00 AM until somewhere between 2:20 and 2:40 AM.

In the cron logs I can see that all jobs are launched, and I'm not seeing any errors anywhere. However, none of the jobs do what they're supposed to do, and no output is generated in the job-specific logs.

Our Linux admin set up a test job that simply appends the date to a file every 5 minutes. As expected, the cron log shows that the job runs, but there are no dates in the output file between 1:01 and 2:40 AM.

1,6,11,16,21,26,31,36,41,46,51,56 * * * * date >>/tmp/data.log

From the cron log:

...
Oct  5 01:01:01 host crond[27190]: (user) CMD (date >>/tmp/data.log)
Oct  5 01:06:14 host crond[30807]: (user) CMD (date >>/tmp/data.log)
Oct  5 01:11:02 host crond[31559]: (user) CMD (date >>/tmp/data.log)
Oct  5 01:16:01 host crond[31956]: (user) CMD (date >>/tmp/data.log)
Oct  5 01:21:01 host crond[32390]: (user) CMD (date >>/tmp/data.log)
Oct  5 01:26:01 host crond[324]: (user) CMD (date >>/tmp/data.log)
Oct  5 01:31:01 host crond[718]: (user) CMD (date >>/tmp/data.log)
Oct  5 01:36:01 host crond[1152]: (user) CMD (date >>/tmp/data.log)
Oct  5 01:41:13 host crond[1528]: (user) CMD (date >>/tmp/data.log)
Oct  5 01:46:02 host crond[1902]: (user) CMD (date >>/tmp/data.log)
Oct  5 01:51:15 host crond[2384]: (user) CMD (date >>/tmp/data.log)
Oct  5 01:56:08 host crond[2779]: (user) CMD (date >>/tmp/data.log)
Oct  5 02:01:01 host crond[3161]: (user) CMD (date >>/tmp/data.log)
Oct  5 02:06:03 host crond[3690]: (user) CMD (date >>/tmp/data.log)
Oct  5 02:11:22 host crond[4222]: (user) CMD (date >>/tmp/data.log)
Oct  5 02:16:36 host crond[4695]: (user) CMD (date >>/tmp/data.log)
Oct  5 02:21:02 host crond[5227]: (user) CMD (date >>/tmp/data.log)
Oct  5 02:26:01 host crond[5667]: (user) CMD (date >>/tmp/data.log)
Oct  5 02:31:01 host crond[6069]: (user) CMD (date >>/tmp/data.log)
Oct  5 02:36:01 host crond[6708]: (user) CMD (date >>/tmp/data.log)
Oct  5 02:41:01 host crond[10208]: (user) CMD (date >>/tmp/data.log)
Oct  5 02:46:01 host crond[21524]: (user) CMD (date >>/tmp/data.log)
Oct  5 02:51:01 host crond[23941]: (user) CMD (date >>/tmp/data.log)
...
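The timestamps above also show crond itself logging some launches several seconds after the scheduled minute (for example 01:06:14 and 02:16:36). A rough way to quantify that over a whole night is a small awk filter over the cron log (typically /var/log/cron on Red Hat systems). This is only a sketch that assumes the syslog line format shown above; the name `launch_lateness` is made up for illustration, and it can only see delays shorter than a minute.

```shell
# launch_lateness LOGFILE [MIN_SECONDS]
# Hypothetical helper: for every crond "CMD" entry in LOGFILE, print the date,
# the logged time and how many seconds past the minute crond logged the
# launch. Entries below MIN_SECONDS (default 0) are skipped.
# Assumes the syslog format shown above ("Oct  5 01:06:14 host crond[...]...").
# Limitation: only the seconds field is inspected, so a launch logged a full
# minute or more late wraps around and is under-reported.
launch_lateness() {
    awk -v min="${2:-0}" '
    /crond\[[0-9]+\]: .* CMD / {
        split($3, t, ":")            # $3 is HH:MM:SS
        late = t[3] + 0              # seconds past the scheduled minute
        if (late >= min)
            printf "%s %s %s  +%ds\n", $1, $2, $3, late
    }' "$1"
}
```

For example, `launch_lateness /var/log/cron 10` would list only the launches that crond logged ten or more seconds late.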

From /tmp/data.log:

...
Wed Oct  5 00:46:01 EDT 2011
Wed Oct  5 00:51:01 EDT 2011
Wed Oct  5 00:56:01 EDT 2011
Wed Oct  5 01:01:01 EDT 2011
Wed Oct  5 02:40:28 EDT 2011
Wed Oct  5 02:40:40 EDT 2011
Wed Oct  5 02:40:46 EDT 2011
Wed Oct  5 02:40:48 EDT 2011
...
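Rather than eyeballing the output file, the gap can be measured directly. The hypothetical `find_gaps` helper below is a sketch that assumes every line is in the default `date` format shown above and that all lines fall on the same day. With its default threshold of 330 seconds (the 5-minute interval plus 30 seconds of slack), the excerpt above would yield a single gap of 5967 seconds between 01:01:01 and 02:40:28.

```shell
# find_gaps FILE [MAX_GAP_SECONDS]
# Hypothetical helper: reads lines produced by `date` (e.g.
# "Wed Oct  5 01:01:01 EDT 2011") and reports every interval between
# consecutive entries longer than MAX_GAP_SECONDS (default 330, i.e. the
# 5-minute schedule plus 30 seconds of slack).
# Assumes all entries are from the same day; a jump across midnight gives a
# negative difference and is not reported.
find_gaps() {
    awk -v max="${2:-330}" '
    $4 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
        split($4, t, ":")                       # $4 is HH:MM:SS
        now = t[1] * 3600 + t[2] * 60 + t[3]    # seconds since midnight
        if (seen && now - prev > max)
            printf "gap of %d s: %s -> %s\n", now - prev, prev_ts, $4
        prev = now; prev_ts = $4; seen = 1
    }' "$1"
}
```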

This problem appears to affect all cron jobs.

The server is running RHEL 4 (Nahant Update 8):

[user@host/]$ lsb_release -a
LSB Version:    :core-3.0-ia32:core-3.0-noarch:graphics-3.0-ia32:graphics-3.0-noarch
Distributor ID: RedHatEnterpriseAS
Description:    Red Hat Enterprise Linux AS release 4 (Nahant Update 8)
Release:        4
Codename:       NahantUpdate8

It may be worth mentioning that the server hasn't been rebooted in over 1,100 days:

[user@host/]$ uptime
 12:34:03 up 1100 days, 22:15,  6 users,  load average: 15.23, 12.93, 11.73
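A load average of 15 only means something relative to the number of CPUs in the box, and on Linux it also counts processes stuck in uninterruptible sleep (typically waiting on I/O), so a high value does not necessarily mean the CPUs are busy. A quick sketch to normalize it, assuming a Linux /proc filesystem; `load_per_cpu` is a hypothetical name:

```shell
# load_per_cpu
# Hypothetical helper: prints the 1-, 5- and 15-minute load averages from
# /proc/loadavg divided by the number of "processor" entries in /proc/cpuinfo.
load_per_cpu() {
    local cpus
    cpus=$(grep -c '^processor' /proc/cpuinfo)
    [ "$cpus" -gt 0 ] || cpus=1      # avoid dividing by zero on odd systems
    awk -v n="$cpus" '{
        printf "%d CPUs, load per CPU: %.2f %.2f %.2f\n", n, $1 / n, $2 / n, $3 / n
    }' /proc/loadavg
}
```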

Apparently this has been happening for a long time (>1 year?) but was never reported.

I'll be signing in to this box around 1 AM tonight to see if anything peculiar is happening.

Hope that's enough info to get started.

Really appreciate any suggestions.

Tommyk · October 6, 2011, 5:55am

While you're logged in at that time, try running the job manually as well to see whether it works.

You might also want to redirect stderr (2>) to a file to see whether any error output is being produced.
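Along the lines of the 2> suggestion, one option is to wrap each job so that its stderr and its start and end times land in a log. The wrapper below is a hypothetical sketch, not an existing tool: a job that is launched but stalls would show a `start` line with no matching `end` line until it finally completes.

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical wrapper: appends a start line, any stderr output from COMMAND,
# and an end line with the exit status to LOGFILE. COMMAND's stdout is left
# alone, so existing redirections keep working.
run_logged() {
    local log="$1" rc=0
    shift
    echo "start $(date '+%F %T') pid=$$: $*" >>"$log"
    "$@" 2>>"$log" || rc=$?          # capture stderr and the exit status
    echo "end   $(date '+%F %T') pid=$$ rc=$rc" >>"$log"
    return "$rc"
}
```

Saved in a script that ends with `run_logged "$@"` (the path /usr/local/bin/run_logged.sh is only an example), the admin's test job could become:

1,6,11,16,21,26,31,36,41,46,51,56 * * * * /usr/local/bin/run_logged.sh /tmp/data.err date >>/tmp/data.log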

tunachamp · October 14, 2011, 10:17am

It appears that the system nearly grinds to a halt during this window, which is why the jobs don't run.

Now to find out why...
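To narrow down why the system stalls, it can help to record what it is doing throughout the window instead of watching it live. A sketch, assuming Linux procps `ps` (which supports `--sort`); `snapshot_loop`, its arguments and the output path are hypothetical:

```shell
# snapshot_loop OUTFILE [SAMPLES] [INTERVAL_SECONDS]
# Hypothetical helper: takes SAMPLES snapshots (default 120), INTERVAL seconds
# apart (default 60). Each snapshot appends a timestamp, the load average,
# the number of processes in uninterruptible sleep (state D, usually waiting
# on I/O) and the top CPU consumers to OUTFILE.
snapshot_loop() {
    local out="$1" samples="${2:-120}" interval="${3:-60}" i=0
    while [ "$i" -lt "$samples" ]; do
        {
            echo "=== $(date '+%F %T') loadavg: $(cat /proc/loadavg)"
            echo "procs in D state: $(ps -eo stat= | grep -c '^D')"
            ps -eo pid,stat,pcpu,pmem,etime,args --sort=-pcpu | head -n 15
        } >>"$out"
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"        # no sleep after the last sample
        fi
    done
}
```

Started shortly before 1:00 AM from a script run under nohup, `snapshot_loop /tmp/snapshots.log 120 60` would take one sample per minute for two hours without relying on cron during the window.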