Runaway process. Opinions needed

Not too long ago, I wrote a very short script to bring up 4 customized xterms. The script went completely haywire simply because of an error I had made in a while loop. It took control of the system and rendered everything useless. The system admin team, which I was part of, tried every command we could think of to kill this damn script, but nothing worked. We tried fuser -k, pkill, etc. fuser -k could have worked, but the script had spawned far too many PIDs for fuser to handle. As fuser was killing them, hundreds more were being born. The output from the uptime command was mind-blowing (a load average over 240).

Now, from reading countless books, I knew there was a way I could have killed every process belonging to the user who started this script, which was me, but I couldn't remember what the command was. To make a long story short, we had to reboot the server in order to get it working properly again.

My question is, could there have been any other way to kill this process besides rebooting the server? I'd hate to end up in the same situation again.

The first thing to do if you cannot find the correct PID to kill is to remove the ability to start another process.

If the process was running a certain program or script, change the permissions on it so it can't be executed. Then you at least won't have more and more processes being created while you are trying to find the one process you need to kill.
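As a rough sketch of that permission change (the temp file below is only a stand-in for the real script, whose path the thread never gives):

```shell
# Hypothetical stand-in for the script that keeps getting re-run.
script=$(mktemp)
printf '#!/bin/sh\necho spawned\n' > "$script"
chmod 755 "$script"

# Remove execute permission for owner, group and others,
# so no new copy can be started directly.
chmod a-x "$script"

# Confirm the execute bits are gone.
if [ -x "$script" ]; then
    echo "still executable"
else
    echo "execute permission removed"
fi
```

This only stops new copies from being launched; processes that are already running are not affected. Also note that a script started as "sh scriptname" only needs read permission, so in that case you would have to take away read permission as well.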

You simply have to keep a level head. Ask yourself:
How were all these processes created? How can I stop more processes from being created? How do I find the single process that will stop all of this?

You don't give enough info on your situation. I would think you could have picked any one of the processes and looked up its parent PID to kill off. Sometimes that is difficult when the system is getting slower and slower because of new processes being created.
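Looking up the parent of a process can be done with ps. A minimal sketch, where the background sleep is just a placeholder for one of the runaway PIDs:

```shell
# Hypothetical stand-in for one of the runaway processes.
sleep 30 &
child=$!

# Ask ps for the parent PID; "ppid=" prints the value without a header.
parent=$(ps -o ppid= -p "$child" | tr -d ' ')
echo "process $child was started by process $parent"

# Clean up the stand-in process.
kill "$child"
wait "$child" 2>/dev/null || true
```

In a real incident you would repeat the lookup on the parent until you reach the process that started it all, and kill that one.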

Another quick way that may or may not work in your situation is to touch /etc/nologin.

Give more details.

RTM, this sounds like a "while(1) fork();" situation. Every single process is trying to make more copies of itself.

You can su to the user in question. Then you can do: "kill -9 -1" which will kill every process owned by that user. That is the trick that I think you are looking for.

But: "kill -STOP -1" is more subtle. All of the user's processes will suddenly stop running. Then you can "kill -9" the bad guys. After the bad guys are gone, you can "kill -CONT" any good processes that you don't want to lose.
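Here is a small sketch of the stop, kill, continue sequence. To keep it safe to try, it targets one placeholder process by PID instead of using -1, and it assumes a ps that accepts "-o stat=" (as the Linux procps one does):

```shell
# Hypothetical stand-in for one runaway process.
sleep 30 &
pid=$!

# Freeze it. A stopped process cannot create new copies of itself.
kill -STOP "$pid"
sleep 1

# A stopped process shows a state starting with T.
stopped_state=$(ps -o stat= -p "$pid" | tr -d ' ')
echo "state after STOP: $stopped_state"

# Had this been a good process, CONT would let it carry on.
kill -CONT "$pid"

# Here it plays the bad guy, so finish it off.
kill -9 "$pid"
wait "$pid" 2>/dev/null || true
echo "process $pid is gone"
```

With "kill -STOP -1" the same freeze hits all of the user's processes at once, which gives you time to sort out which ones to kill.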

If your shell does not have kill as a built-in, there is a problem... you probably can't start another process to run the kill command. If that happens, use: "exec /usr/bin/kill -9 -1" or whatever.

To tell you guys the truth, I panicked when this happened because I had never been in a deadly situation like this before. I used to be that cautious admin who did only what he had to, until I got lazier and decided to write more and more scripts.

Anyway, the system was SunOS 5.8. Since I loved the bash shell and this particular script was being run in bash, I issued the command "fuser -k /bin/bash", which seemed to work, but the constant rebirth of these never-dying processes rendered the command ineffective.

Mind you, if it weren't for the fact that the system was getting slower and slower, I probably would have done something smarter, but I had to think quicker than the rest of the admins on my team. I'm the one who messed up here.

I think changing the permissions on the script that started this would have been the best way, but "kill -9 -1" wouldn't have hurt to try.

By the way, how do I use that command?

Is it kill -9 -1 (the username) or what?

It is just:
kill -9 -1
Nothing after the -1 except a carriage return.

And make sure that you su to the user first! If you run that kill command as root, you will kill everything.