Cannot kill many processes whose ppid is 1

Good evening,

On a production SunOS system we found many instances of the same process with PPID = 1, so something must be going wrong with the application. There are more than 3,000 of these processes and the number keeps increasing. For instance:

ps -fu xpinvoice| grep launch_web|wc -l
    3086
bash-3.2$ ps -fea | grep launch_web
xpinvoice 56525     1   0 15:00:29 ?           0:00 sh //produccion/explotacion/xpinvoice/facturacion/ksh/launch_web.ksh -u/ -l3 -eFG
xpinvoice 68873     1   0 12:08:37 ?           0:00 sh //produccion/explotacion/xpinvoice/facturacion/ksh/launch_web.ksh -u/ -l3 -eFG
xpinvoice 15141     1   0 14:36:50 ?           0:00 sh //produccion/explotacion/xpinvoice/facturacion/ksh/launch_web.ksh -u/ -l3 -eFG
xpinvoice  6962     1   0 12:10:16 ?           0:00 sh //produccion/explotacion/xpinvoice/facturacion/ksh/launch_web.ksh -u/ -l3 -eFG
xpinvoice 69284     1   0 16:31:45 ?           0:00 sh //produccion/explotacion/xpinvoice/facturacion/ksh/launch_web.ksh -u/ -l3 -eFG
xpinvoice 37469     1   0 15:25:49 ?           0:00 sh //produccion/explotacion/xpinvoice/facturacion/ksh/launch_web.ksh -u/ -l3 -eFG
xpinvoice 48415     1   0 12:33:02 ?           0:00 sh //produccion/explotacion/xpinvoice/facturacion/ksh/launch_web.ksh -u/ -l3 -eFG
xpinvoice 47245     1   0 12:18:31 ?           0:00 sh //produccion/explotacion/xpinvoice/facturacion/ksh/launch_web.ksh -u/ -l3 -eFG
xpinvoice 62592     1   0 12:07:21 ?           0:00 sh //produccion/explotacion/xpinvoice/facturacion/ksh/launch_web.ksh -u/ -l3 -eFG
xpinvoice 66140     1   0 16:46:16 ?           0:00 sh //produccion/explotacion/xpinvoice/facturacion/ksh/launch_web.ksh -u/ -l3 -eFG
xpinvoice 15841     1   0 15:50:22 ?           0:00 sh //produccion/explotacion/xpinvoice/facturacion/ksh/launch_web.ksh -u/ -l3 -eFG
xpinvoice 11251     1   0 14:21:22 ?           0:00 sh //produccion/explotacion/xpinvoice/facturacion/ksh/launch_web.ksh -u/ -l3 -eFG
xpinvoice 26343     1   0 15:08:44 ?           0:00 sh //produccion/explotacion/xpinvoice/facturacion/ksh/launch_web.ksh -u/ -l3 -eFG
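
To see when the orphans were created, a listing like the one above can be grouped by start hour. This is only a minimal sketch and not something posted in the thread: the function name orphans_per_hour is made up for illustration, and it assumes `ps -ef` style columns (UID PID PPID C STIME ...) with STIME in HH:MM:SS form. Older processes, for which ps prints a date instead of a time, are skipped.

```shell
#!/bin/sh
# Hypothetical helper: count orphaned launch_web.ksh processes per start hour.
# Input (stdin): "ps -ef" style lines: UID PID PPID C STIME TTY TIME CMD...
# Output: one "HOUR COUNT" line per hour, sorted by hour.
orphans_per_hour() {
    awk '
        # Keep only processes adopted by init (PPID 1) that run the script
        # and whose STIME column is a plain HH:MM:SS time.
        $3 == 1 && /launch_web\.ksh/ && $5 ~ /^[0-9]+:[0-9]+:[0-9]+$/ {
            split($5, t, ":")
            n[t[1]]++
        }
        END { for (h in n) print h, n[h] }
    ' | sort
}

# Typical call:
#   ps -ef | orphans_per_hour
```

If the counts cluster around particular hours, that may point to a scheduled job or a periodic caller as the source of the orphans.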

I could not kill those processes, so I escalated the issue to our system administrator, who says there is no way to kill them because they are tied to the kernel, and that the server must be restarted. This raises the following questions:

1. It seems odd to find so many instances of the same process with PPID = 1. Does this point to an application failure?

2. Why did our system administrator say these processes are tied to the kernel and that the server must be restarted?

I appreciate your help in advance.

Have you checked the logs from those processes (the ksh shell scripts)? One possible cause is exhaustion of the free memory list. That can be the result of a device wait, such as a disk wait that never gets resolved because the disk was physically removed or went offline, a CD-ROM for example. I've seen that on older SunOS machines (Solaris 9).

Check the process state (S column) with

ps -flu xpinvoice

Are they in D state?
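
With thousands of processes it may be easier to count the PPID 1 processes per state than to read the full listing. The following is a minimal sketch under stated assumptions: the function name orphans_by_state is hypothetical, and the suggested ps call assumes that ps accepts `-o s= -o ppid= -o pid=` (the Solaris and Linux versions I know of do). As far as I know, Solaris ps shows states such as O, S, R, T and Z in the S column, so a D may never appear there; D is the Linux notation for uninterruptible sleep.

```shell
#!/bin/sh
# Hypothetical helper: count processes adopted by init (PPID 1) per state.
# Input (stdin): lines of "STATE PPID PID".
# Output: one "STATE COUNT" line per state, sorted by state letter.
orphans_by_state() {
    awk '$2 == 1 { n[$1]++ } END { for (s in n) print s, n[s] }' | sort
}

# Typical call for the user in this thread:
#   ps -u xpinvoice -o s= -o ppid= -o pid= | orphans_by_state
```

A large count in Z would suggest zombies waiting to be reaped, while a large count in S would suggest processes sleeping on some event that never arrives.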

Good afternoon:

Thank you both for your support.

Actually, I have no evidence, since the server was restarted yesterday, and there is nothing in the ksh log because the script only updates an Oracle table. When I run this command now, there is no problem:

ps -flu xpfactur
 F S      UID   PID  PPID   C PRI NI     ADDR     SZ    WCHAN    STIME TTY         TIME CMD
 0 S xpfactur 49595 47172   0  39 20        ?   6601        ? 13:01:02 ?           0:00 sqlplus -s / @/produccion/explotaci
 0 O xpfactur 54450 59198   0  40 20        ?    231          13:40:12 pts/49      0:00 ps -flu xpfactur
 0 S xpinvoice 69870 69818   0  40 20        ?   6590        ? 00:22:57 pts/41      0:00 sqlplus /
 0 S xpinvoice 67630 66324   0  40 20        ?    231        ? 01:33:38 pts/42      0:07 -ksh
 0 S xpinvoice 56816 56815   0  40 20        ?    202        ? 12:43:04 pts/25      0:00 bc
 0 S xpinvoice 26472 15164   0  52 24        ?  33211        ? 21:36:09 pts/23      3:12 FaSched -a FAC -l 3 -u /
 0 S xpinvoice 15164 15138   0  40 20        ?    228        ? 21:34:40 pts/23      0:00 -ksh
 0 S xpinvoice 53326     1   0  52 24        ?    209        ? 13:40:06 pts/23      0:00 sh //produccion/explotacion/xpfactu
 0 S xpinvoice 14293 14245   0  40 20        ?    228        ? 21:34:27 pts/8       0:00 -ksh
 0 S xpinvoice 65613 65562   0  40 20        ?    228        ? 05:56:13 pts/46      0:00 -ksh
 0 S xpinvoice 25119 24946   0  40 20        ?    229        ? 12:00:03 ?           0:00 /usr/bin/ksh /produccion/explotacio
 0 S xpinvoice 24946  1966   0  40 20        ?   1810        ? 12:00:02 ?           0:00 auto_rem /opt/CA/UnicenterAutoSysJM
 0 S xpinvoice 28370 28359   0  40 20        ?    228        ? 21:45:27 pts/32      0:00 -ksh

What would the next step be if the state were D?

Thank you in advance for your help.

D is a device wait. I don't remember old SunOS having one, but OK, no problem, the result is the same - device not available or not online, etc. Both "real" and virtual I/O devices can cause this. If it happens again, the sysadmin is going to have to find the problem device first, in order to be able to consider remediation. A wild guess: network issues since this is a web app.

Are there any NFS mounts? That can be a source of interesting effects as well when abused.

Good morning, and thank you again.

Actually, there are two NFS mounts, both exported from another machine that this server connects to:

12.24.1.18:/INV/planos
                        74G    36G    37G    50%    /INV/planos
12.24.1.18:/INV/logs1
                        36G    30G   6.2G    83%    /INV/logs1
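
For the record, NFS entries like the two above can be pulled out of `df` output automatically, which helps when checking many servers. This is a rough sketch only: the function name nfs_mounts is hypothetical, and it assumes that a remote filesystem appears as host:/path in the first column and that long device names may wrap onto a second line, as they do in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: list remote (host:/path) filesystems found in df output.
# Input (stdin): output of a df command such as "df -k" or "df -h".
# Output: one "DEVICE MOUNTPOINT" line per remote filesystem.
nfs_mounts() {
    awk '
        # A remote filesystem has "host:/..." in the first column.
        index($1, ":/") > 1 {
            if (NF >= 6) {
                print $1, $NF        # whole entry on one line
            } else {
                dev = $1             # entry continues on the next line
                pending = 1
            }
            next
        }
        # Second half of a wrapped entry: the mount point is the last field.
        pending { print dev, $NF; pending = 0 }
    '
}

# Typical call:
#   df -k | nfs_mounts
```

If the NFS server itself is unreachable, `df` may block on the hung mount, so it is safer to run this from a session that can be abandoned. `nfsstat -m` should give similar information on systems that provide it.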