error : can not fork new process

Hi,
Today we came across the error "can not fork new process". When I checked, 400 ksh processes were running for one particular user (a kernel parameter setting restricts the number of processes to 400). The reason was that somebody had executed a shell script in which some lines contained only "*" and nothing else. (Sadly, there were several more scripts with a lone * line in them.)

To kill them, I logged in as root and ran ps -fu <user> | awk '{print $2}' | xargs kill -9
Even after that there were still 400 processes running (with new PIDs). The next step was to kill everything using the mount point, so I ran fuser -kuc <mp>. That didn't help either; I ran fuser at least 10-15 times and it was of no use. Then I called our sysadmin (I am not a sysadmin), who told me to rename ksh. I did that, but it still didn't help. I ended up rebooting the server.

I have some questions:

1) Even after renaming ksh, why were there still so many ksh processes running? After the rename there was no ksh on the server, so how is that possible?
2) What other ways are there of stopping this? It is not possible to reboot the server every time.

The * tried to execute everything in the current directory, probably /home/careless_user.
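To see what a line containing only * actually hands to the shell, here is a minimal sketch. The shell expands * to the sorted file names in the current directory, treats the first name as the command and passes the rest as its arguments. The scratch directory and file names are made up for the demonstration, and the expansion is loaded into the positional parameters with set rather than executed, so nothing is run.

```shell
# Hypothetical scratch directory, used only for this demonstration.
demo_dir="${TMPDIR:-/tmp}/star_demo.$$"
mkdir -p "$demo_dir" && cd "$demo_dir" || exit 1

# Three harmless files; only their names matter here.
touch cleanup.sh notes.txt report.sh

# Load the expansion of * into the positional parameters instead of running it.
set -- *
first_word=$1
shift
remaining_words=$*

echo "command the shell would look up: $first_word"
echo "arguments it would be given:     $remaining_words"

# Leave the scratch directory before deleting it.
cd / && rm -rf "$demo_dir"
```

Here the first name is cleanup.sh and the other two become its arguments. Because that name contains no slash, the shell searches PATH for it, so it is only found in the current directory when PATH includes . (or an empty entry). If the script that ends up running contains a * line of its own, each run starts another one.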

chmod -R 000 /home/careless_user will stop new process creation.
chmod 000 /usr/bin/ksh, or renaming it, will just cause a lot of other scripts to bomb.

Since the * tries to execute every file in the directory, the processes that were halfway through trying to run things will be short-circuited once the permissions are gone. No process can start any new children. A few seconds after you chmod 000 everything, you are in a position to kill the remaining processes, since they cannot create children.

kill -9 $(ps -fu careless_user | awk 'NR > 1 { print $2 }')

400 pids will not exceed ARG_MAX, so this should work without error.
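If you want to check the ARG_MAX claim on your own box, a rough sketch follows. It builds a worst-case list of 400 made-up 7-digit PIDs, measures its size and compares that with what getconf ARG_MAX reports. The PIDs are invented for the arithmetic; the environment also counts against ARG_MAX, so treat the comparison as an estimate.

```shell
# Build a worst-case argument list: 400 made-up PIDs of 7 digits each.
fake_pids=$(awk 'BEGIN { for (i = 0; i < 400; i++) printf "%d ", 1000000 + i }')

# Bytes that list would occupy on the kill command line.
pid_bytes=$(printf '%s' "$fake_pids" | wc -c | tr -d ' ')

# System limit on the combined size of arguments and environment.
arg_max=$(getconf ARG_MAX)

echo "PID list: $pid_bytes bytes, ARG_MAX: $arg_max bytes"
```

The list comes to 3200 bytes (400 entries of 7 digits plus a separating space), which is below even the POSIX minimum of 4096 for ARG_MAX.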

Things are also interesting when a user creates a file named * - rsync and backup scripts just love it.

Hi Zedex,

A fork can fail for many reasons. Which one applies depends on the error message you find in your syslog, so it would be useful if you could give us the exact message. It may be caused by nproc or maxuprc, or by a shortage of swap space, that kind of thing.

The simplest explanation for this message is that a program is spawning too many processes, possibly by calling itself.

I believe your issue is with maxuprc. Give us more information. Also, which OS release are you using?

-DB

Does he (your sysadmin) remember that most users on HP-UX have /usr/bin/ksh as their default shell (in /etc/passwd)?
What if root can log in only at the console and nobody is connected? Who will su to root?

When you say 400 processes max, which value are you talking about: nproc or maxuprc?
The difference? One is global, for the whole system; the other is per user.
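To tell which of the two limits you are running into, it helps to count processes per owner and compare the top entry with maxuprc and the grand total with nproc. A minimal sketch, assuming ps -ef prints the user name in the first column; count_by_user is a made-up helper name:

```shell
# Count processes per owner from `ps -ef`-style input (user name in column 1).
# The first input line is skipped because it is the ps header.
count_by_user() {
    awk 'NR > 1 { n[$1]++ } END { for (u in n) print n[u], u }' | sort -rn
}

# Live usage: the five users owning the most processes on this machine.
if command -v ps >/dev/null 2>&1; then
    ps -ef | count_by_user | head -5
fi
```

A user sitting exactly at 400 while the system total is well below its ceiling points at the per-user limit. On HP-UX the configured values can be read with kctune or, on older releases, kmtune; check the man page on your release for the exact options.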

I would start by removing . from that user's PATH. If that is not enough, change his login shell to rksh and perhaps use script to see what he is up to. This reminds me: have you looked in his .sh_history to see what he was doing? It may give you some clues.
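Removing . from a PATH value can be sketched as below. Empty entries (a leading or trailing colon, or ::) also mean the current directory, so the sketch drops those too. strip_dot_from_path and the example value are made up for illustration; the real fix is to edit wherever that user's PATH gets set, such as his .profile.

```shell
# Print the given PATH value with "." entries and empty entries removed.
strip_dot_from_path() {
    printf '%s\n' "$1" | tr ':' '\n' | grep -v -e '^\.$' -e '^$' | paste -s -d ':' -
}

# Example value with "." at both ends and an empty entry in the middle.
example_path=".:/usr/bin::/usr/local/bin:."
cleaned_path=$(strip_dot_from_path "$example_path")

echo "before: $example_path"
echo "after:  $cleaned_path"
```

For the example value this prints /usr/bin:/usr/local/bin as the cleaned PATH. With . gone, a stray * line can no longer pick up a script from the current directory by its bare name.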

You also mentioned scripts that may be the cause. I would chase them down, remove execute permission from them, and then try to understand their content.
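Chasing those scripts can be done with grep. The sketch below lists files that contain a line consisting only of * (optionally surrounded by whitespace) and removes execute permission from them. find_star_scripts, the scratch directory and the two sample scripts are made up for the demonstration, and it assumes your find supports -exec ... {} + and that the paths contain no whitespace.

```shell
# List files under a directory that contain a line consisting only of "*".
find_star_scripts() {
    find "$1" -type f -exec grep -l '^[[:space:]]*\*[[:space:]]*$' {} +
}

# Demonstration in a hypothetical scratch directory with one bad script.
scan_dir="${TMPDIR:-/tmp}/star_scan.$$"
mkdir -p "$scan_dir" || exit 1
printf 'echo start\n*\necho end\n' > "$scan_dir/bad.sh"
printf 'rm -f /tmp/old/*\necho done\n' > "$scan_dir/good.sh"
chmod 755 "$scan_dir/bad.sh" "$scan_dir/good.sh"

suspects=$(find_star_scripts "$scan_dir")
echo "suspect scripts: $suspects"

# Remove execute permission from each suspect.
for script in $suspects; do
    chmod a-x "$script"
done

# Record the result, then clean up the scratch directory.
if [ -x "$scan_dir/bad.sh" ]; then bad_still_runs=yes; else bad_still_runs=no; fi
echo "bad.sh still executable: $bad_still_runs"
rm -rf "$scan_dir"
```

Only bad.sh is reported; good.sh uses * inside a longer command and is left alone. On the real system you would point find_star_scripts at the user's home directory and read each hit before deciding what to do with it.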