kill signal

jhaavinash · December 20, 2004, 4:43am

Hello everybody,
We have WebSphere MQ running on AIX 5.1.
Every weekend MQ receives a kill -30 signal from some process or user and dumps a large error file. MQ itself offers no way to track down the sending process.
Is there something I can do at the UNIX level to trap the process?

Best regards,
Avinash Jha
WebSphere Grp

Perderabo · December 20, 2004, 7:44am

What is -30 on AIX?

jhaavinash · December 20, 2004, 8:24am

Hi,
Here is the complete listing. 30 is USR1.

1)  HUP       14) ALRM      27) MSG       40) bad trap  53) bad trap
2)  INT       15) TERM      28) WINCH     41) bad trap  54) bad trap
3)  QUIT      16) URG       29) PWR       42) bad trap  55) bad trap
4)  ILL       17) STOP      30) USR1      43) bad trap  56) bad trap
5)  TRAP      18) TSTP      31) USR2      44) bad trap  57) bad trap
6)  ABRT      19) CONT      32) PROF      45) bad trap  58) bad trap
7)  EMT       20) CHLD      33) DANGER    46) bad trap  59) CPUFAIL
8)  FPE       21) TTIN      34) VTALRM    47) bad trap  60) GRANT
9)  KILL      22) TTOU      35) MIGRATE   48) bad trap  61) RETRACT
10) BUS       23) IO        36) PRE       49) bad trap  62) SOUND
11) SEGV      24) XCPU      37) bad trap  50) bad trap  63) SAK
12) SYS       25) XFSZ      38) bad trap  51) bad trap
13) PIPE      26) bad trap  39) bad trap  52) bad trap
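
Signal numbers are platform-specific, so 30 means USR1 in this AIX listing but may map to a different signal elsewhere. As a rough sketch, assuming a Python 3 interpreter is available on the machine (the thread does not say so), the number-to-name mapping can be checked locally. The helper name signal_name is made up for illustration.

```python
import signal

def signal_name(number):
    """Return the local platform's name for a signal number, or None if unused."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # This platform does not define a signal with that number.
        return None

if __name__ == "__main__":
    # The answer depends on the operating system the script runs on.
    print(30, "->", signal_name(30))
    print("SIGUSR1 is number", int(signal.SIGUSR1), "here")
```

Referring to the signal by name (for example kill -USR1) avoids the ambiguity of the raw number.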

Perderabo · December 20, 2004, 10:10am

Since it's USR1, it can't be generated directly by the kernel as the result of a malfunction in the signaled process. Somewhere there must be a signaling process. The default action for USR1 is to terminate the process, so if an error file is being generated, your MQ process must be catching the signal. That rules out writing a quick wrapper to ignore the signal, because MQ would install its own handler over the inherited ignore setting.

If the MQ process is running as the user "joe", only joe and root can signal it. That narrows down the suspects. If you can predict when it happens, run a ps -ef just before. Also look for cron and at jobs belonging to joe and to root.
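
The "run a ps -ef just before" suggestion could be automated so that nobody has to be logged in over the weekend. The following is only a sketch under assumptions: Python 3.7+ is available, the log path is a placeholder, and the function names snapshot and watch are invented for illustration. It appends timestamped copies of the ps -ef output to a file at a fixed interval.

```python
import subprocess
import time

def snapshot(log_path, command=("ps", "-ef")):
    """Append one timestamped copy of the command's output to log_path."""
    result = subprocess.run(list(command), capture_output=True,
                            text=True, check=False)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        log.write(f"=== {stamp} ===\n{result.stdout}\n")
    return result.stdout

def watch(log_path, interval_seconds=10, count=360, command=("ps", "-ef")):
    """Take `count` snapshots, sleeping `interval_seconds` between them."""
    for i in range(count):
        snapshot(log_path, command)
        if i < count - 1:
            time.sleep(interval_seconds)

if __name__ == "__main__":
    # Hypothetical path; start this (e.g. from cron) shortly before the
    # time the signal usually arrives.
    watch("/tmp/ps_snapshots.log", interval_seconds=10, count=3)
```

A short-lived sender can still slip between two snapshots, so a tight interval around the expected time gives the best chance of catching it. Comparing the last snapshot before the error file's timestamp with earlier ones should show which processes owned by joe or root were new.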

It is also possible that the signal comes from the process itself. A process can use raise() or the equivalent to signal itself. This may mean that MQ is breaking somehow at that time because it is regularly being asked to do something that it can't.
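
To illustrate the self-signaling case, here is a minimal sketch. It assumes a Linux-like system with Python 3.8+, where signal.raise_signal and signal.sigwaitinfo exist; their availability on AIX is not verified here, and the function name raise_and_inspect is made up. The process blocks SIGUSR1, raises it against itself, and then reads the sender's PID from the signal information record.

```python
import os
import signal

def raise_and_inspect(signum=signal.SIGUSR1):
    """Send signum to ourselves and return (signal number, sender PID)."""
    # Block the signal first so it stays pending instead of triggering the
    # default action, which for SIGUSR1 would terminate this process.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signum})
    try:
        signal.raise_signal(signum)          # Python wrapper around C raise()
        info = signal.sigwaitinfo({signum})  # collect the pending signal
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return info.si_signo, info.si_pid

if __name__ == "__main__":
    signo, sender = raise_and_inspect()
    print(signal.Signals(signo).name, "sent by PID", sender,
          "(own PID is", os.getpid(), ")")
```

When a signal is self-raised, the sender PID reported to the receiver is the receiver's own PID. This only shows what a receiving process is able to see; it does not by itself add any such reporting to MQ.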