Solaris+Perl script to get process start date

Hi all,
after reading the post:

  • How to get process start date and time in SOLARIS?

I wrote my Perl script and it worked like a charm.
This script is called every 5 minutes from the monitoring server's crontab and runs on the remote network elements via ssh (the Perl script itself is located on the remote machines).

The script compares the current Unix time:

$ACTUAL_UNIX_TIME=time();

and the process start date and time:

$PID_PATH="/proc/8895";   #example
@d=localtime((stat($PID_PATH))[9]);
$LOCAL_TIME=sprintf( "Localtime PID %4d/%02d/%02d %02d:%02d:%02d\n", $d[5]+1900,$d[4]+1,$d[3],$d[2],$d[1],$d[0]);

if the difference:

$DELTA=$ACTUAL_UNIX_TIME-$PID_TIME;

is less than 300 seconds (5 min), then the restart flag in the output is set to 1.

For some time I had no problems at all, but lately I have been getting bogus values, especially during the night: once, twice or even three times in a row $DELTA comes out as 0 or 1 second, yet when I check the next day I find that the process did not restart and that its $PID_TIME dates from many hours before the fake restart!

To me it seems that the wrong value is returned by the stat/localtime functions.

Is it possible that under certain conditions the stat function doesn't work?

Here is our code:

#!/usr/bin/perl
################################################################################
#       subroutine: TR_HANDLER
#       print trace if the $DEB flag is true
################################################################################
sub TR_HANDLER {
        $TRACE = $_[0];

        chomp($TRACE);  
        if ($DEB) {
                print "$TRACE\n";   # print, not printf: a '%' in the trace text must not be treated as a format
        }
        
}

################################################################################
#       main program
#       
################################################################################
use Time::Local;

$argc = @ARGV;
$restart=0;
$proc_status="Service running...";
$PID_SERVICE="";

if ($argc > 3 || $argc < 1){
        print "Wrong number of arguments.\n";
        exit;
}

$CHECK_DELTA=$ARGV[1];

@PROCS=split(/\,/, $ARGV[0]);


if ($ARGV[2]) {
   $DEB=1;
} else {$DEB=0}

foreach $PROC_NAME (@PROCS) {
        $ACTUAL_UNIX_TIME=time();
        $ACTUAL_TIME_STR=sprintf("ACTUAL UNIX TIME = \t".$ACTUAL_UNIX_TIME."\n");
        &TR_HANDLER($ACTUAL_TIME_STR);

        &TR_HANDLER("ps -ef | grep $PROC_NAME | grep -v grep | grep -v $0 | awk '{print \$2}'");
        $PID_SERVICE=`ps -ef | grep "$PROC_NAME" | grep -v grep | grep -v $0 | awk '{print \$2}'`;

        if ($PID_SERVICE eq ''){   # string comparison: empty output means no matching process
                &TR_HANDLER("$PROC_NAME down");
                $proc_status="Service down!";
                $restart=1;
                $PID_TIME=$ACTUAL_UNIX_TIME;
        } else {
                &TR_HANDLER("PID_SERVICE=$PID_SERVICE");
                $PID_PATH=sprintf("/proc/%d/status",$PID_SERVICE);
                &TR_HANDLER("PID_PATH=$PID_PATH");

                @d=localtime((stat($PID_PATH))[9]);
                $LOCAL_TIME=sprintf( "Localtime PID %4d/%02d/%02d %02d:%02d:%02d\n", $d[5]+1900,$d[4]+1,$d[3],$d[2],$d[1],$d[0]);
                &TR_HANDLER($LOCAL_TIME);

                $sec=$d[0];
                $min=$d[1];
                $hours=$d[2];
                $day=$d[3];
                $month=$d[4];
                $year=$d[5]+1900;

                $PID_TIME = timelocal($sec,$min,$hours,$day,$month,$year);
                $PID_TIME_STR=sprintf("        PID_TIME = \t".$PID_TIME."\n");
                &TR_HANDLER($PID_TIME_STR);

                $DELTA=$ACTUAL_UNIX_TIME-$PID_TIME;
                $DELTA_STR=sprintf("           DELTA = \t".$DELTA." seconds\n");
                &TR_HANDLER($DELTA_STR);
                if ($DELTA > $CHECK_DELTA) {
                        &TR_HANDLER("No restart detected for $PROC_NAME service!");
                        $restart=0;
                        $proc_status="Service running...";
                } else {
                        &TR_HANDLER("$PROC_NAME service restarted!");
                        $restart=1;
                        $proc_status="Service restarted!";
                }
        }
        print "$PROC_NAME;$PID_TIME;$ACTUAL_UNIX_TIME;$proc_status;$restart\n";
}
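
A side note on the code above: the value returned by (stat($PID_PATH))[9] is already in epoch seconds, so the round trip through localtime and timelocal is not needed to compute $DELTA. A minimal sketch of the direct calculation follows; the helper name age_in_seconds is made up for illustration, and it returns undef when stat fails (for example because the process has already exited).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script):
# seconds between $now and the last modification time of $path,
# or undef if stat() fails.
sub age_in_seconds {
    my ($path, $now) = @_;
    my @st = stat($path);
    return unless @st;        # stat failed: return undef in scalar context
    return $now - $st[9];     # element 9 is the mtime in epoch seconds
}

# Usage sketch: $$ is this script's own PID, used only as an example.
# On a system without /proc the stat fails and the helper returns undef.
my $age = age_in_seconds("/proc/$$/status", time());
print defined $age ? "age = $age seconds\n" : "stat failed\n";
```

Checking for a failed stat separately keeps a missing /proc entry from being silently treated as a timestamp.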

The time/stat functions work very well and have been completely tested over and over.

Did this problem occur on the day when you went from standard time to daylight time?

Does your system run xntpd or ntpd?

Hi Jim, thanks for your reply!

This often happens in the evening, on different days, not only on the day we switched!
Unfortunately I activated the script after the standard/daylight switch, so I can't compare the two behaviours (before/after).

We have neither xntpd nor ntpd; our hosts use a proprietary ALU command (more Lucent than Alcatel, I guess) to sync. I don't have much information about this command since it's built into the application itself.
The command is scheduled to run every 10 minutes 24/7/365.

Thanks,
Evan

$PID_SERVICE=`ps -ef | grep "$PROC_NAME" | grep -v grep | grep -v $0 | awk '{print \$2}'`;

I can try most of the commands in the pipe with the exception of "grep -v $0" which I believe is the Perl program itself. So I stuck in 'http' where $PROC_NAME is and got back about 10 lines. There is no loop for multi-line output. Need debug information for how the script reacts to $PID_SERVICE under these conditions.
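
To illustrate the multi-line case, here is a minimal sketch of how the captured pipeline output could be split into individual PIDs and checked before use. The helper name parse_pids and the sample PIDs are made up for illustration; they are not from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: return every purely numeric token found in the
# text captured from the ps pipeline.
sub parse_pids {
    my ($output) = @_;
    return grep { /^\d+$/ } split /\s+/, $output;
}

# Made-up sample: two PIDs, as when the pattern matches two processes.
my @pids = parse_pids("8895\n9012\n");

if (@pids == 0) {
    print "no matching process\n";
} elsif (@pids > 1) {
    print "ambiguous match: @pids\n";
} else {
    print "PID_SERVICE=$pids[0]\n";
}
```

With more than one PID the script can report the ambiguity instead of quietly using whichever PID happens to come first.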

Attach the debug log, 'paperclip' icon on the message composer.

The only correction I can give definitively is the script runs every 10 minutes 24/7/52. :)

>>> "grep -v $0" which I believe is the Perl program itself
Yes that's right.

>>>> There is no loop for multi-line output.
No problem for this because the processes I monitor are single instance (but you're right: it would be good if the script could handle this scenario or at least send an error message, I'll work on this!).

I made another modification in the "query":

$PID_SERVICE=`ps -ef | grep -w "$PROC_NAME" | grep -v grep | grep -v $0 | awk '{print \$2}'`;

This way I grep only the exact name of the process.
I suppose that the problem was due to the processes TCPIPSCH and IPSCH.
They are two distinct processes whose names end in the same letters: since I started using the -w option everything works fine.
No fake results in the 5 days since I made the modification.
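
The difference between the two matches can be reproduced in Perl itself. This is only a sketch: the helper name matching_names is made up, and the \b word-boundary regex is an approximation of what grep -w does, not an exact equivalent.

```perl
use strict;
use warnings;

# Hypothetical helper: return the entries of @commands that contain $name,
# either anywhere (substring) or only as a whole word.
sub matching_names {
    my ($name, $whole_word, @commands) = @_;
    my $re = $whole_word ? qr/\b\Q$name\E\b/ : qr/\Q$name\E/;
    return grep { $_ =~ $re } @commands;
}

my @commands = ("TCPIPSCH", "IPSCH");   # the two process names mentioned above

my @substring = matching_names("IPSCH", 0, @commands);
my @word      = matching_names("IPSCH", 1, @commands);

print "substring: @substring\n";   # both names match
print "word:      @word\n";        # only IPSCH matches
```

The substring search for IPSCH also hits TCPIPSCH, which is how a second, unrelated PID can end up in $PID_SERVICE.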

BTW: What's 24/7/52? 52 weeks~365, what's the difference? :)

Thanks for your time!

Kind Regards,
Evan

Good evening. We run the portal http://www.openyoursource.com, which we started about two and a half weeks ago. It is a portal for articles, questions, tips, tricks and so on about Solaris 10 and OpenSolaris. We are not here to compete with Linux or with anyone else; we want to be a reference channel for OpenSolaris.

Interesting, never used the -w option, will keep that in mind.

24/7/52 is hours of day, days of week, weeks of year. Most people watch or read the ads for 24/7/365 which is hours of day, days of week, days of year and think this is a valid progression. This would be marked wrong in the sixth grade where I went to school, but that's marketing for you.