Why does SIGKILL occur?

choppas · November 15, 2006, 6:14am

Hi Gurus,

I am running my DataStage jobs on a UNIX operating system. While running the jobs, I get the following error:

main_program: Unexpected termination by Unix signal 9(SIGKILL)

Can anyone please tell me in which situations this SIGKILL can arise?

Thanks in advance
Srinivas

sysgate · November 16, 2006, 8:56am

Hi, as you know, SIGKILL means something like "force-kill the process, whatever it is doing": unlike most signals, it cannot be caught, blocked, or ignored, so the program you are trying to execute is terminated immediately. The reason may be found in the error log.
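A minimal Python sketch (Python is an assumption; nobody in the thread posted code) of what a parent process such as a job controller observes. The kernel rejects any attempt to install a handler for SIGKILL, and a child killed with signal 9 is reported back to its parent as "terminated by signal 9", which Python's `subprocess` shows as return code -9.

```python
import signal
import subprocess
import sys

# A process cannot install a handler for SIGKILL; the kernel rejects it (EINVAL).
try:
    signal.signal(signal.SIGKILL, signal.SIG_IGN)
    can_trap = True
except OSError:
    can_trap = False

# Start a child that would otherwise sleep for a minute, then force-kill it.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
child.kill()  # on POSIX, Popen.kill() sends SIGKILL
status = child.wait()

# subprocess reports "terminated by signal N" as a negative return code -N.
print(can_trap, status, signal.Signals(-status).name)  # False -9 SIGKILL
```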

choppas · November 17, 2006, 2:02am

Thank you for your reply.

My DataStage job does not give any error message at all; it gives just one warning, something like "PID 98764 is aborted". Apart from this, I am not getting any useful information about the error from the log.

Regards
Choppas

jim_mcnamara · November 17, 2006, 6:47am

"aborted" is not a warning. It is a fatal error. The abort() function in UNIX is how programs commit suicide when they cannot go any further.

I'm guessing, but it sounds like another process is aborting; why, I do not know.
Because that process aborted, your program cannot continue.
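For comparison, a hedged Python sketch: `abort()` terminates the calling process with SIGABRT (signal 6), so a process that ends itself this way is normally reported with signal 6, not 9. Signal 9 usually comes from outside the process (another process, an administrator, or the kernel). The child lowers its core-file size limit to 0 so the demo does not leave a core file behind where that limit is honoured.

```python
import resource
import signal
import subprocess
import sys

def no_core_dump():
    # Runs in the child before exec: set the core-file size limit to 0.
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))

# The child calls abort(), i.e. it terminates itself.
result = subprocess.run(
    [sys.executable, "-c", "import os; os.abort()"],
    preexec_fn=no_core_dump,
)
sig = signal.Signals(-result.returncode)
print(sig.name, int(sig), sig == signal.SIGKILL)  # SIGABRT 6 False
```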

LivinFree · November 18, 2006, 1:39am

There are also a few other possibilities. One, of course, is an admin killing off the process the hard way. Another is a system shutdown: if you trap all other signals and shutdown time arrives, the system will try to kill you "nicely" and then resort to kill -9.
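The "nicely first, then kill -9" escalation can be sketched in Python. This is an illustration under assumptions, not how any particular shutdown script is written: the stubborn worker and the `stop` helper are hypothetical, and the one-second grace period is arbitrary. The worker ignores SIGTERM, so only the final SIGKILL ends it.

```python
import signal
import subprocess
import sys

# Hypothetical stubborn worker: it ignores SIGTERM, like a job that traps signals.
STUBBORN = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)

def stop(proc, grace=1.0):
    """Ask politely with SIGTERM; fall back to SIGKILL after `grace` seconds."""
    proc.terminate()  # SIGTERM: may be caught or ignored
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL: cannot be caught or ignored
        return proc.wait()

worker = subprocess.Popen([sys.executable, "-c", STUBBORN],
                          stdout=subprocess.PIPE, text=True)
worker.stdout.readline()  # wait until SIGTERM is being ignored
code = stop(worker)
worker.stdout.close()
print(code)  # -9
```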

And I'm not sure what signal is used, but on Linux hosts, if you completely drive the system out of memory, an old buddy named oom_kill kicks in and starts trying to free up memory by shooting processes in the head. I've seen some weird things die at the hand of oom_kill (like my login shell!)...
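On Linux the OOM killer does terminate its victim with SIGKILL, and it picks the victim by a per-process "badness" score. A Linux-only Python sketch (the helper name `oom_view` is hypothetical) that reads the score the kernel exposes in `/proc/<pid>/oom_score`, together with the tunable adjustment in `/proc/<pid>/oom_score_adj` (range -1000 to 1000). The kernel also records each OOM kill in its log, visible with `dmesg`, which is a good first place to look after an unexplained signal 9.

```python
import os

def oom_view(pid="self"):
    """Return (oom_score, oom_score_adj) for a process. Linux-only."""
    with open(f"/proc/{pid}/oom_score") as f:
        score = int(f.read())
    with open(f"/proc/{pid}/oom_score_adj") as f:
        adj = int(f.read())
    return score, adj

score, adj = oom_view()
# A higher oom_score makes the process a likelier victim when memory runs out.
print(f"pid {os.getpid()}: oom_score={score}, oom_score_adj={adj}")
```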

bdsffl · November 18, 2006, 2:11am

One suggestion, if the problem occurs sooner rather than later: start prstat and redirect its output to a log file. I say "sooner rather than later" because the log keeps growing, and you want to avoid filling up disk space.

When the SIGKILL message calls out the PID, you can look up that PID in the log file and, also from the log file, trace it back to its PPID (parent process ID).
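prstat is a Solaris tool. The same "snapshot, then trace PID to PPID" idea can be sketched in Python on Linux by reading `/proc/<pid>/stat` directly. This is an assumption-laden illustration: `snapshot` and `ancestry` are hypothetical helper names, and in practice you would write snapshots to the log periodically instead of taking just one.

```python
import os

def snapshot():
    """One snapshot of {pid: (ppid, command)} from /proc (Linux-only)."""
    table = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                data = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # Format: "pid (comm) state ppid ..."; comm may contain spaces or ')'.
        open_paren, close_paren = data.find("("), data.rfind(")")
        comm = data[open_paren + 1:close_paren]
        ppid = int(data[close_paren + 2:].split()[1])
        table[int(entry)] = (ppid, comm)
    return table

def ancestry(pid, table):
    """Follow PID -> PPID links through one snapshot, starting at `pid`."""
    chain, seen = [], set()
    while pid in table and pid not in seen:
        seen.add(pid)
        ppid, comm = table[pid]
        chain.append((pid, comm))
        pid = ppid
    return chain

for pid, comm in ancestry(os.getpid(), snapshot()):
    print(pid, comm)
```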

Hopefully this will get you started in the right direction.

You can also set up a user.<severity> entry in the syslog.conf file and have it directed to /var/adm/messages.
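A job wrapper could write its own messages through the user facility so they land wherever a user.<severity> selector routes them. A hedged Python sketch using the standard `syslog` module: the ident `dsjob-wrapper`, the job name and the helper names are hypothetical, and the message is only delivered if a syslog daemon is running.

```python
import signal
import syslog

def describe(returncode):
    """Turn a subprocess-style return code into readable text."""
    if returncode < 0:
        sig = signal.Signals(-returncode)
        return f"terminated by signal {int(sig)} ({sig.name})"
    return f"exited with status {returncode}"

def log_job_result(job, returncode):
    msg = f"{job}: {describe(returncode)}"
    priority = syslog.LOG_INFO if returncode == 0 else syslog.LOG_ERR
    # LOG_USER matches a user.<severity> selector in syslog.conf.
    syslog.openlog(ident="dsjob-wrapper", logoption=syslog.LOG_PID,
                   facility=syslog.LOG_USER)
    syslog.syslog(priority, msg)
    syslog.closelog()
    return msg

print(log_job_result("nightly_load", -9))
```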

This may also give you a clue to what's happening with your process. Also look at the fuser command. I haven't used it a lot myself, but it tells you which processes are using a particular file: the *.db file or whatever file you're interested in.
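To show what fuser reports, here is a Linux-only Python approximation. It is a sketch, not a replacement: `pids_using` is a hypothetical helper, it only checks open file descriptors under `/proc/<pid>/fd` (real fuser also reports working directories, memory mappings and more), and without root it cannot see other users' processes.

```python
import os
import tempfile

def pids_using(path):
    """PIDs that have `path` open as a file descriptor (Linux-only)."""
    target = os.path.realpath(path)
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        fd_dir = f"/proc/{entry}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # another user's process, or it already exited
        for fd in fds:
            try:
                if os.readlink(f"{fd_dir}/{fd}") == target:
                    found.append(int(entry))
                    break
            except OSError:
                continue
    return found

with tempfile.NamedTemporaryFile() as tmp:
    holders = pids_using(tmp.name)
print(os.getpid() in holders)  # True
```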

But, as mentioned earlier, since databases tend to be on the large side, physical memory and swap space may be running out.
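A rough way to check physical-memory headroom before starting a large job, sketched in Python with `os.sysconf` (the helper name is hypothetical). `SC_AVPHYS_PAGES` exists on Linux and Solaris but not on every Unix, and it counts only currently free pages, not swap or reclaimable cache, so treat the result as an indicator only.

```python
import os

def memory_mib():
    """Return (total, currently free) physical memory in MiB via sysconf."""
    page = os.sysconf("SC_PAGE_SIZE")
    total = os.sysconf("SC_PHYS_PAGES") * page
    free = os.sysconf("SC_AVPHYS_PAGES") * page
    return total // 2**20, free // 2**20

total, free = memory_mib()
print(f"physical memory: {total} MiB total, {free} MiB free")
```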

prstat > problem.db.log

I would run this in the foreground so you don't forget it's running. It will take up some disk space, but the log can be rm'd as required.

choppas · November 22, 2006, 4:07am

Thanks a lot, Gurus,

It was a problem with the memory allotted to the user under which we run the DataStage jobs. As soon as usage reached the threshold, all the jobs failed, in order to free up the space. If we run the jobs sequentially, they succeed. Anyway, the memory allotment was increased and my problem is resolved. Once again, thanks to all of you.
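Since the jobs succeed when run one at a time, one option is to cap how many run at once. A hedged Python sketch: the job commands are hypothetical placeholders, and a thread pool's `max_workers` sets the cap, with `max_parallel=1` being the fully sequential case.

```python
import signal
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real job commands.
JOBS = {f"job{i}": [sys.executable, "-c", f"print('job{i} done')"]
        for i in range(4)}

def run(name, cmd):
    """Run one job and describe how it ended."""
    code = subprocess.run(cmd, stdout=subprocess.DEVNULL).returncode
    if code < 0:
        return name, f"killed by {signal.Signals(-code).name}"
    return name, f"exit status {code}"

def run_all(jobs, max_parallel):
    # At most `max_parallel` jobs run at the same time; 1 means strictly sequential.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [pool.submit(run, name, cmd) for name, cmd in jobs.items()]
        return dict(f.result() for f in futures)

print(run_all(JOBS, max_parallel=2))
```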

Regards
Srinivas

bdsffl · November 22, 2006, 6:52am

For future reference, can you tell me how you increased the memory for the individual user?

I am aware of disk quotas, but I'm not aware of any physical or virtual memory allocation for individual users.

Thanks

LivinFree · November 27, 2006, 12:56am

bdsffl,

I'm not sure what flavor-specific options exist, but 'ulimit' is what I've used in the past. See 'man ulimit' for more.
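ulimit is the shell's front end to per-process resource limits, which a program can also set with setrlimit(). A hedged Python sketch, assuming Linux behaviour: it starts a child under a lower address-space limit (RLIMIT_AS, roughly what bash's `ulimit -v` controls; the 512 MiB cap is arbitrary) and has it try to allocate 1 GiB. Hitting such a limit normally makes the allocation fail (here a MemoryError and exit status 1) rather than producing SIGKILL, which helps tell a ulimit apart from the OOM killer. RLIMIT_AS applies to each process separately, not as a total across a user's processes, and it is not enforced on every platform (macOS, for example).

```python
import resource
import subprocess
import sys

LIMIT = 512 * 2**20  # arbitrary 512 MiB address-space cap for the demo

def cap_address_space():
    # Runs in the child before exec; lowers only the soft limit.
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    soft = LIMIT if hard == resource.RLIM_INFINITY else min(LIMIT, hard)
    resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

child = subprocess.run(
    [sys.executable, "-c", "bytearray(2**30)"],  # try to allocate 1 GiB
    preexec_fn=cap_address_space,
    stderr=subprocess.PIPE,
    text=True,
)
print(child.returncode)               # 1: the allocation failed, no signal
print("MemoryError" in child.stderr)  # True
```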

Given the OP's wording, I'm also interested in what he adjusted, as some software (such as Oracle) needs kernel tuning to get right.

bdsffl · November 27, 2006, 1:40am

Thanks,

Macosta

Hopefully he'll get back with us.