mkitab problem with /etc/inittab respawning

jeffpas · July 21, 2008, 11:51am

Hi All,

This may be a dumb question to old AIX hacks; if so, I apologize.
I have worked with /etc/inittab on SCO, but apparently with AIX you should use the 'mkitab' command to add entries instead of just vi'ing the file.

I just need a daemon process (script called 'dpr_daemon') to kick off once and restart if it is ever killed.
After checking through documentation, the logical step appears to be (level 2, normal) add to inittab w/this line:

mkitab "dprdaemon:2:respawn:/dplogs/dpr_daemon"

However, when I do this, I immediately find 10-25 dpr_daemons running on the box. The shock and awe result. Live box, can't play around too much with this.

Is there something I am missing? Surely 'respawn' is as documented in man pages, respawn the process only if killed. 'once' would not restart the process if killed.

Thanks much for any advice---
jpass

zaxxon · July 22, 2008, 5:19am

The syntax is correct, but it seems init thinks that process is already dead and spawns a new one. If I find anything about it, I will post it.

jeffpas · July 22, 2008, 2:53pm

Someone mentioned that, since this program runs in the background, init may think the program has ended and so keeps respawning it again and again.

But I don't know how to write a daemon that runs in the foreground. Is there such a thing? Then it would become a program, and no longer be a daemon anymore.
Not sure, I don't completely understand.

johnf · July 23, 2008, 2:35am

I think the thing you need to ask is does the script run continuously or does it naturally finish (die). If it does in fact finish then it will respawn. What does the script do?

shockneck · July 26, 2008, 8:18am

I don't know SCO. As an AIX admin I'd probably create a subsystem and put it under control of the AIX System Resource Controller (SRC). Start with the mkssys man page if that is an option.
However, if you want to stick to your setup: what if your script checked whether it is already running, so that it refrains from starting again while it is active?
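
A minimal sketch of such a self-check, written in shell for illustration. The pid file location and the function names (already_running, claim_pidfile) are assumptions made for this example; nothing like them appears in the original script.

```shell
#!/bin/sh
# Sketch of a single-instance guard.
# PIDFILE is a hypothetical location chosen for this example.
PIDFILE=${PIDFILE:-/tmp/dpr_daemon.pid}

already_running() {
    # Succeeds only if the pid file names a process that still exists.
    [ -f "$PIDFILE" ] || return 1
    old_pid=$(cat "$PIDFILE")
    [ -n "$old_pid" ] && kill -0 "$old_pid" 2>/dev/null
}

claim_pidfile() {
    # Refuse to start while another copy is alive; otherwise record our pid.
    if already_running; then
        return 1
    fi
    echo "$$" > "$PIDFILE"
}
```

A script would call "claim_pidfile || exit 0" near the top. Two caveats: the check and the write are not atomic, so two copies started at the same instant could both pass, and under 'respawn' init would still relaunch the script every time a guarded copy exits. The guard only keeps copies from piling up; it does not stop the relaunching.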

jeffpas · September 22, 2008, 4:14pm

I would have to assume then that the command for doing this (in my instance) would be:

mkssys -p /dplogs/dpr_daemon -s dprdaemon -u 0 -a "-D" -e /dev/null -i /dev/null -o /dev/null -R -S -f 9 -n 15

you would then:

startsrc -s dprdaemon
stopsrc -s dprdaemon

To start and stop the process, and it would automatically restart if killed.

I have seen examples with a "-G tcpip" tacked on the end, but since this new little daemon is I assume not related to any group, I would think that would not be included.

I don't completely understand the references to /dev/null here but does anyone see any holes here? I assume this would make it permanent? I really don't have the option of shutting down/rebooting the box.
Just looking for help, thanks!

jeffpas

Perderabo · September 22, 2008, 5:52pm

When a daemon (such as init) runs a subprocess, it is also a daemon. No need to do anything else.

Let's say that init spawns process 1234 for your program. Now init expects pid 1234 to stay around. If it exits, init will respawn it. This is why your program must not try to redaemonize itself. That involves spawning a child and then exiting. init will not recognize the child of pid 1234 as a replacement. Pid 1234 died, so init thinks it needs another copy.
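
The effect can be reproduced with a toy model in plain shell. This is only a sketch: the launcher below stands in for init, and self_daemonizing is a hypothetical stand-in for a program that spawns a child and then exits.

```shell
#!/bin/sh
# Toy model only: this script plays the role of init, and
# self_daemonizing is a hypothetical stand-in for the real program.
pidfile=$(mktemp)

self_daemonizing() {
    # Start the long-running work as a separate background process...
    sleep 30 >/dev/null 2>&1 &
    echo "$!" > "$pidfile"
    # ...then exit. Call this only inside a subshell, since exit ends
    # whichever shell process runs it.
    exit 0
}

# The launcher starts the program and remembers the pid it got back.
( self_daemonizing ) &
tracked=$!

# The tracked pid terminates almost immediately.
wait "$tracked"
tracked_status=$?

# Yet the worker it left behind is still running.
worker=$(cat "$pidfile")
if kill -0 "$worker" 2>/dev/null; then
    worker_alive=yes
else
    worker_alive=no
fi

echo "tracked pid $tracked exited with status $tracked_status"
echo "worker pid $worker still alive: $worker_alive"

# Clean up the demo worker.
kill "$worker" 2>/dev/null
rm -f "$pidfile"
```

From the launcher's point of view the program is finished as soon as wait returns, even though the worker is alive. That is the situation a respawn entry reacts to.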

jeffpas · September 22, 2008, 6:01pm

Perderabo
Just clarifying-

So you are agreeing with Shockneck and saying that using the SRC would handle this problem better than an /etc/inittab entry, which is what I did at first?

You agree with the solution I gave below using mkssys as a safe alternative?

I already had runaway processes blow up on this box to the danger of livelihood and limb, would like to hear a few ayes from the crew on this new different approach before proceeding.

thanx!

Perderabo · September 22, 2008, 6:23pm

I don't know AIX, SRC, or mkssys well enough to comment on that.

jeffpas · September 23, 2008, 2:22pm

Does anyone else agree with my mkssys solution?

shockneck · September 23, 2008, 2:50pm

I don't know the app but the mkssys definition looks good. Give it a try.

jeffpas · September 25, 2008, 11:48am

Okay I created the dprdaemon subsystem with this command:

mkssys -p /dplogs/dpr_daemon -s dprdaemon -u 0 -a "-D" -e /dev/null -i /dev/null -o /dev/null -R -S -Q -f 9 -n 15

I specifically put in '-Q' to state that multiple instances of this program are not allowed, although it is the default anyway.

It responded that it had created the subsystem.

When I do a

lssrc -a | grep dpr

I see it as 'inoperative'.
dprdaemon inoperative

When I:

startsrc -s dprdaemon

I get:
0513-059 The dprdaemon Subsystem has been started. Subsystem PID is 831714.

All good.

BUT........
I immediately have 3 instances/processes running on the box, not 1:

# ps -ef | grep dpr
root 454674 1 0 15:42:04 - 0:00 /usr/bin/perl /dplogs/dpr/dpr_daemon -D
root 831720 1 0 15:42:04 - 0:00 /usr/bin/perl /dplogs/dpr/dpr_daemon -D
root 839896 1 0 15:42:04 - 0:00 /usr/bin/perl /dplogs/dpr/dpr_daemon -D

Also, when I do:

# lssrc -a | grep dpr

I see:

dprdaemon inoperative

Very strange!
Registering the subsystem seems to have worked. But why would activating the subsystem create 3 instances, and then the subsystem show as 'inoperative' ?

Puzzling.

shockneck · September 25, 2008, 4:05pm

What does
# odmget -q subsysname=dprdaemon SRCsubsys
return?

jeffpas · September 25, 2008, 5:12pm

shockneck:

SRCsubsys:
        subsysname = "dprdaemon"
        synonym = ""
        cmdargs = ""
        path = "/dplogs/dpr/dpr_daemon"
        uid = 0
        auditid = 0
        standin = "/dev/null"
        standout = "/dev/null"
        standerr = "/dev/null"
        action = 1
        multi = 0
        contact = 2
        svrkey = 0
        svrmtype = 0
        priority = 20
        signorm = 15
        sigforce = 9
        display = 1
        waittime = 20
        grpname = ""

jeffpas · September 25, 2008, 5:16pm

I wonder if this has something to do with the fact that the dprdaemon program runs in the background (using 'fork').

I have found this page:

FGA: Mistakes to avoid when designing Unix dæmon programs

Which seems to indicate that the 'fork' should be taken out of the daemon script.

Here is part of the daemon script, which I got off the 'Net:

## Define functions
sub daemonize {
    chdir '/' or die "Can't chdir to /: $!";
    open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
    open STDOUT, '>>/dev/null' or die "Can't write to /dev/null: $!";
    open STDERR, '>>/dev/null' or die "Can't write to /dev/null: $!";
    defined (my $pid = fork) or die "Can't fork: $!";
    exit if $pid;
    setsid or die "Can't start a new session: $!";
    umask 0;
}

## The following executable called by dpr_daemon
sub monitor_logfiles {
    `/dplogs/dpr/dpr_monitor`;
}

## initialize Perl
$[ = 1; # set array base to 1
$| = 1; # flush the buffer

## prepare dpr_daemon
daemonize();

## summon dpr_daemon
while(1) {
    monitor_logfiles;

    #wait for 20 seconds
    sleep(20);
}

The article suggests that yes this would work if run off the command line, but I think it is saying that if it is called by SRC, then it will automatically run in the background and therefore fork isn't necessary.

Hmmmm
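
For comparison, here is a rough sketch of the same polling loop with no fork and no exit in the parent, so the pid that the launcher starts is the pid that keeps running. It is written in shell rather than Perl purely for illustration. MONITOR, INTERVAL, MAX_RUNS and run_loop are names invented for this sketch, and MAX_RUNS exists only so the loop can be tried out without running forever.

```shell
#!/bin/sh
# Foreground worker loop: no fork, no setsid, no early exit.
# MONITOR, INTERVAL and MAX_RUNS are hypothetical knobs for this sketch.
MONITOR=${MONITOR:-/dplogs/dpr/dpr_monitor}
INTERVAL=${INTERVAL:-20}     # seconds between runs
MAX_RUNS=${MAX_RUNS:-0}      # 0 means loop forever

run_loop() {
    runs=0
    while :; do
        # Run the monitor and wait for it to finish.
        "$MONITOR"
        runs=$((runs + 1))
        # Optional stop condition, useful only for testing.
        if [ "$MAX_RUNS" -gt 0 ] && [ "$runs" -ge "$MAX_RUNS" ]; then
            break
        fi
        sleep "$INTERVAL"
    done
}
```

A real script would end with a call to run_loop. Because the loop never hands off to a child and exits, whatever started it sees one long-lived process.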

Perderabo · September 25, 2008, 5:37pm

That is a good link. Please reread this sentence: "The concepts of "foreground" and "background" don't apply to dæmons." a few times. There is no such thing as a daemon running in the "background". Foreground and background only apply to programs with a controlling terminal. By definition, a daemon has no controlling terminal. Yes, you seem to have a daemon that is superfluously re-daemonizing itself, but if done correctly this is a harmless waste of time. I doubt that it explains 3 instances of the daemon running. Still, why not remove that code and see what happens? A program launched by "cron" or "at" will be a daemon and this might be an easier way to test it.

jeffpas · September 25, 2008, 5:45pm

Yes all that is well and good and I agree. I was the one who proposed cron in the first place. But this company wants something that runs 'continuously'.
I won a compromise by creating a daemon that issues every 20 seconds (once a minute was not good enough).

As far as taking the fork out of the beast, I am tempted to remove these lines and simply re-run everything:

defined (my $pid = fork) or die "Can't fork: $!";
exit if $pid;

The article doesn't seem to say exactly what to take out.
However my instincts for job self-preservation have dictated that I should chew my nails and build up some courage for awhile, check the web and perhaps wait for a forum reply before attempting it. Especially since I have nothing but a root login to use and am not permitted to create a regular account without clearance.

Nothing would please this crew more than to string me up by the neck for inadvertently creating runaway processes or some mistake that caused a bottleneck on the box. Not much room for trial and error.
Did I mention I love my job?

I do!

jeffpas · September 25, 2008, 6:09pm

ALLLLRIGHT..........

I got bold and commented out these lines in the daemon script:

    ##defined (my $pid = fork) or die "Can't fork: $!";
    ##exit if $pid;
    ##setsid or die "Can't start a new session: $!";

Then I recreated the SRC subsystem definition, started the subsystem, and there is now only one instance of the daemon running.
The SRC "dprdaemon" also now shows as 'active'.

When I kill the dprdaemon process, it automatically restarts.

And, the program appears to be working also.

I think we have a solution here.............
knock on wood

shockneck · September 25, 2008, 6:09pm

The SRC seems to get the impression that your subsystem terminates abnormally. As you defined it to respawn in such a case, it will respawn twice within the waittime time limit. Hence the three processes. So handling of your program/daemon by SRC seems to work as designed. Probably the answer lies in the perl code. Can you activate some debugging mode in your code to trace what happens during start?
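
A toy simulation of that pile-up, in plain shell. To be clear, this is not how the SRC is implemented; it is only a sketch of a supervisor that relaunches a command whenever the pid it started exits and gives up after two retries. MAX_RESPAWNS and forking_daemon are names made up for the example.

```shell
#!/bin/sh
# Toy supervisor, NOT the real SRC logic: relaunch the program whenever
# the pid that was started exits, giving up after MAX_RESPAWNS retries.
MAX_RESPAWNS=2
workdir=$(mktemp -d)

forking_daemon() {
    # Hypothetical stand-in for a self-daemonizing program:
    # leave a worker behind, then exit (run it inside a subshell).
    sleep 30 >/dev/null 2>&1 &
    echo "$!" >> "$workdir/pids"
    exit 0
}

starts=0
while [ "$starts" -le "$MAX_RESPAWNS" ]; do
    ( forking_daemon ) &
    wait "$!"                 # returns at once: the tracked pid is gone
    starts=$((starts + 1))
done

# Count the workers that are still alive after the supervisor gave up.
alive=0
for pid in $(cat "$workdir/pids"); do
    if kill -0 "$pid" 2>/dev/null; then
        alive=$((alive + 1))
    fi
done
echo "supervisor started the program $starts times; $alive workers still running"

# Clean up the demo workers.
for pid in $(cat "$workdir/pids"); do
    kill "$pid" 2>/dev/null
done
rm -rf "$workdir"
```

With one initial start plus two retries, the sketch should leave three orphaned workers behind while the supervisor itself tracks none of them, which would match both the three processes in the ps listing and the 'inoperative' status.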

shockneck · September 25, 2008, 6:12pm

Congratulations