Load average more than 100

Hi, we are running Java applications on RHEL 6. For the past few months the load has been very high and the servers stop responding. We checked the Java processes but could not find anything interesting. After some analysis we found a pattern: every day around midnight the load increases and causes problems. The load average from the server is attached.

Can anyone suggest how to debug this issue further?

Many thanks in advance.

Hi,

/var/log/cron should list which long-running scheduled tasks the cron daemon starts at 12:35 and 01:15.

Maybe several jobs are slowing each other down, in which case moving them to different time slots (e.g. 1am vs. 4am) can solve the performance issue.
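
To narrow the cron log down to the window around the load spike, something along these lines may help. It is only a sketch: the function name cron_jobs_in_window is made up for illustration, and it assumes the usual syslog-style line layout in /var/log/cron ("Mon DD HH:MM:SS host CROND[pid]: (user) CMD (command)"), so check a few lines of your own log before relying on it.

```shell
# cron_jobs_in_window LOGFILE START END
# Print the cron job starts whose timestamp lies between START and END
# (both HH:MM:SS, inclusive). Assumes the time of day is the third
# whitespace-separated field and that job starts are logged with " CMD ".
cron_jobs_in_window() {
    awk -v s="$2" -v e="$3" '
        {
            t = $3 ""                      # force a string comparison
            if (t >= s && t <= e && / CMD /) print
        }
    ' "$1"
}
```

For example, `cron_jobs_in_window /var/log/cron 00:00:00 01:59:59` would list everything cron started between midnight and 2am. Reading /var/log/cron normally needs root.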

What you need to do is figure out what is running during the time of interest.
I suggest something like VOSwatcher (OK, that was self-serving), or FirstLook if you have Veritas software installed (oops, there I go again). Basically, you want to run something that can tell you what was running at the time of the problem.

Write a simple shell script that does the following:

  1. set a counter to 1
  2. write the output of the ps command to a file with the counter as the suffix
  3. sleep for 60 seconds
  4. increment the counter; if it has gone past 60, reset it to 1
  5. loop back to step 2

This will give you a way of finding out what is killing your system.
You should run this at a real-time priority so that it cannot be preempted by any normal program.
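
A minimal sketch of such a script is below. The function name snapshot_loop, the ps.N file naming and the chosen ps columns are my own picks for illustration, not anything standard. Besides ps it also records the date and /proc/loadavg in each file so a snapshot can be matched to the load graph. The STAT column is worth watching: on Linux, processes in state D (uninterruptible sleep, usually waiting on I/O) count toward the load average as well.

```shell
#!/bin/sh
# snapshot_loop DIR [INTERVAL] [SLOTS] [ITERATIONS]
#   DIR         directory for the snapshot files (created if missing)
#   INTERVAL    seconds to sleep between snapshots (default 60)
#   SLOTS       number of files to keep before the counter wraps (default 60)
#   ITERATIONS  stop after this many snapshots; 0 means run until killed
snapshot_loop() {
    dir=$1
    interval=${2:-60}
    slots=${3:-60}
    iterations=${4:-0}
    mkdir -p "$dir" || return 1

    counter=1
    taken=0
    while [ "$iterations" -eq 0 ] || [ "$taken" -lt "$iterations" ]; do
        # Each pass overwrites the file for the current slot, so the
        # directory always holds the most recent SLOTS snapshots.
        {
            date
            cat /proc/loadavg
            ps -eo pid,ppid,user,stat,pcpu,pmem,etime,args
        } > "$dir/ps.$counter" 2>&1

        taken=$((taken + 1))
        counter=$((counter + 1))
        if [ "$counter" -gt "$slots" ]; then
            counter=1                      # wrap around, as in step 4
        fi
        sleep "$interval"
    done
}
```

Saved in a script that ends with a call such as `snapshot_loop /var/tmp/ps-snapshots 60 60` (the directory is just an example), it keeps a rolling hour of one-minute snapshots. One way to give it a real-time priority is to start it as root with `chrt -r 1 ./yourscript.sh`; it spends nearly all of its time sleeping, so it should not starve anything else.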