Multiple (Thousands of) Cron Instances

Hey all,

I have a box running SUSE SLES 8, and over the past few months it has randomly spawned thousands of instances of /USR/SBIN/CRON, to the point where the box locks up entirely. Upwards of 14,000 instances! I imagine it's exhausting the number of files that can be open at one time and hanging the box. It seems to happen every few months.

It's running Vixie cron, but the odd thing is I have 11 other boxes running the same OS, the same exact build and the same exact crontab and none of them are doing this.

Does anyone have any experience with this happening?

Thanks.

-Sys

Just a little bump to see if anyone has any ideas on this. I've yet to find any information on it. I assume many people were on holiday this past week when I posted this. Thanks guys. :)

I encountered this on Red Hat as well.
I noticed that the permissions on some of the system files had been messed up.
In the end I reinstalled the OS.
Fortunately, it was a lab server.

Fortunately for me, too, it is only a test system, but the fact that I've had no luck getting to the root cause is a bit frightening. :)

Have you tried comparing the checksums of the cron binaries across the similar systems?
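
In case it helps, here is a rough sketch of one way to do that comparison in Python 3. It is only an illustration: the helper name `file_sha256` is made up for this example, and the path in the usage note is an assumption based on the process name mentioned earlier in the thread, so adjust it to wherever the cron binary actually lives on your boxes.

```python
import hashlib
import sys


def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`.

    `file_sha256` is a hypothetical helper written for this example.
    The file is read in chunks so large binaries are not loaded whole.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Print one "digest  path" line per file named on the command line.
    for path in sys.argv[1:]:
        print(file_sha256(path), path)
```

Run it on each box with the binary as the argument (for example `/usr/sbin/cron`, if that is where it is installed) and compare the printed digests. Any box whose digest differs from the rest has a different binary.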

No, I have not; this is an excellent idea. I'll make sure I do this first thing tomorrow. Thank you. :)

Just one thing, when a cronjob is executed, cron forks itself and execs the job under that process. So if you have thousands of cronjobs scheduled very close together, you may see thousands of cron processes.

The next time you see such behaviour, check the PPID of all the cron processes. If they all share the same PPID, and it is the PID of the cron daemon, they were spawned from the cron daemon.
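
A minimal sketch of that check in Python 3, assuming a `ps` that accepts `-eo pid,ppid,comm` (the procps version on Linux does). The function names `count_by_parent` and `live_snapshot` are invented for this example, and matching on the substring "cron" is an assumption about how the processes are named on your system.

```python
from collections import Counter
import subprocess


def count_by_parent(ps_output, name="cron"):
    """Count processes whose command contains `name`, grouped by parent PID.

    `ps_output` is text in the three-column form printed by
    `ps -eo pid,ppid,comm`. The match is case-insensitive.
    """
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        # Skip the header line and anything that is not "pid ppid command".
        if len(fields) < 3 or not fields[0].isdigit() or not fields[1].isdigit():
            continue
        ppid, command = int(fields[1]), fields[2]
        if name.lower() in command.lower():
            counts[ppid] += 1
    return counts


def live_snapshot():
    """Return the current process list as text (assumes `ps` is on PATH)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `count_by_parent(live_snapshot()).most_common(5)` lists the parent PIDs with the most cron children. If one PPID accounts for nearly all of them and it is the daemon's own PID, the daemon is the one forking them.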

This checks out BTW. They're a match.

This is a good point, but I really do not have thousands of cron jobs running on this machine. There is a maximum of 6 or so and they all run with a reasonable amount of time in between each other.

If you still haven't rebuilt this box, what is the difference between this box and the other boxes? Specifically, is it function (e.g., database server, development server), software (e.g., network monitoring, development tools), or both that makes this server different from the base build? How many users are logging into this box, and what do they do? If it is a development server, does the error start around the time a major project is underway, such as a big compile? And what do the system logs say?

I haven't rebuilt yet, but it's slated to be done in the near future.

There is actually no difference at all between this system and the others. They all perform the same function, are from the same build, same crontab files, etc.

Two users log into the box, me and the primary account for the function of the software on the box. Same as the others. As far as I know, this box hasn't been used in a few months and there has been no activity recently.

I haven't been able to find anything in the logs that points to something going haywire. It's quite an oddity.