parsing currently running processes

Hey guys,
I'm writing a monitoring program that reads a command-line pattern plus the min and max number of instances for each process, and then checks the currently running processes against that pattern.

I just want to know how I should go about this. I'll give you an idea of the flow of the program:

  1. daemonise /this is yet to be implemented/
  2. read the process info file to build a list of the process pattern, min number and max number of instances and the rate of scanning for the pattern (in minutes).
  3. loop forever and compare the processes in the list to the currently running processes and throw the appropriate alerts (or whatever)
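For step 2, here is a rough sketch of reading the process info file. The thread never shows the file format, so the `pattern:min:max:rate` line layout, the `Rule` type and the `parse_rules` name are all made up for illustration.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    pattern: str       # substring to look for in a command line
    min_count: int     # fewest instances allowed
    max_count: int     # most instances allowed
    interval_min: int  # scan rate in minutes


def parse_rules(text):
    """Parse lines of the (assumed) form pattern:min:max:rate."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split from the right so the pattern itself may contain ':'.
        pattern, lo, hi, rate = line.rsplit(":", 3)
        rules.append(Rule(pattern, int(lo), int(hi), int(rate)))
    return rules
```

Splitting from the right is just one way to let a pattern contain colons; a real file format might need proper quoting instead.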

It is point #3 that has me a bit confused. Should I read /proc into another list and then, for each process pattern in the original list, loop over that proc list? Or should I not read /proc into a list at all, and just iterate over the pattern list and read /proc every time? (Since /proc is in memory, that shouldn't take too much time either.)

Currently the OSes that should work are Solaris and Linux, as I have easy access to both. I will probably extend this to HP-UX later.

Parse the process? Do you mean the cmdline that started the process?

Keep tabs on the pids, i.e. maintain a list (a hash table would be best).
If you always iterate through all of /proc, you might parse some long-running processes again and again.

  • On each pass over the list, check whether the process is still running. If not, remove the entry. If so, go ahead with your processing and keep the entry. I don't think you want to raise an alert for the same process again and again.
    On the next pass over /proc, collect only the new pids. The old ones are those you have already processed.
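A minimal sketch of that pid bookkeeping, assuming each scan yields a dict of pid to command line. The `update_tracked` name and the dict layout are only for illustration, and it ignores the case where a pid is reused by a different process between scans.

```python
def update_tracked(tracked, current):
    """Bring `tracked` (pid -> cmdline) in line with the latest scan.

    Returns (new_pids, gone_pids) so the caller only has to process
    what changed since the previous pass.
    """
    gone = [pid for pid in tracked if pid not in current]
    new = [pid for pid in current if pid not in tracked]
    for pid in gone:
        del tracked[pid]           # process exited: drop the entry
    for pid in new:
        tracked[pid] = current[pid]  # first time we see this pid
    return new, gone
```

For example, with `tracked = {1: "init", 50: "httpd"}` and a new scan of `{1: "init", 60: "httpd"}`, only pid 60 is reported as new and pid 50 as gone.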

I thought Solaris did not have the /proc file system. Hmm..

Yes. I don't mean parse as such, just verify that the processes that have the cmdline patterns specified in the input file are running (with the correct number of instances).

That is a good idea, but even though most of the processes would be long running (such as webservers, app servers and such), I still feel I should check against the command line.

In fact Solaris has a very 'pure' /proc. It only contains information for currently running processes.

This may help you: "In C Program, determine if job is running"

Hi Hitori,
I don't want to know how to read the process details. I am doing that already. My program will have a list of, let's say, command line patterns that it has to match against currently running processes. It will do this in an infinite loop, raising alerts if any of the process patterns are not found in the list of running processes or if the number of matches is too low or too high.

It may have to go through the list of running processes every minute (maybe, depending on the configuration). So, my question is whether it would be better to create a list of command line entries from /proc every minute and then go through my list and match (the entire list of current processes would be processed for every entry in my list), or whether I should go through all the /proc entries for each of the entries in my list.

The first method involves creating a list of an unknown size but issuing the /proc reading commands (readdir, open, read, stat) just once. The second method requires the readdir, open, read commands to be executed for the entire /proc filesystem for each process pattern in my list.

Why can't you use the 1st method without creating a list? You can test each /proc entry immediately against your list.
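One way to read that suggestion, as a rough sketch: walk the process entries once and bump a counter per pattern as each entry goes by, so nothing is stored apart from the counters. `count_matches` is a made-up name, plain substring matching stands in for whatever pattern matching the real program uses, and `cmdlines` can be any iterable (for instance a generator that reads /proc lazily).

```python
def count_matches(cmdlines, patterns):
    """Single pass over the command lines, one counter per pattern."""
    counts = {p: 0 for p in patterns}
    for cmdline in cmdlines:        # each entry is read exactly once
        for p in patterns:
            if p in cmdline:        # substring match as a placeholder
                counts[p] += 1
    return counts
```

The thresholds would then be checked once per pattern after the pass has finished.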

That becomes the same as the second method. Here's what I'm trying to say in pseudocode:
method 1 - read /proc into a list

while readdir of /proc returns valid entries; do
        read the /proc/<pid>/psinfo file into a struct
        append the struct to a list of such structs
done
for each pattern in list of patterns; do
        set count of pattern instances to 0
        for each struct in list of structs; do
                compare pattern against the ps cmd line from struct
                if match, increase count of pattern instances
        done
        if count of pattern instances
                below lower threshold
                        raise alert
                above upper threshold
                        raise alert
                between the correct bounds
                        do nothing
done
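A rough Python sketch of method 1, with some assumptions: it reads the Linux `/proc/<pid>/cmdline` file (NUL-separated arguments) rather than the Solaris `psinfo` struct from the pseudocode, it uses substring matching, and the `snapshot_cmdlines` and `check` names and the `(pattern, min, max)` rule tuples are invented for the example.

```python
import os


def snapshot_cmdlines(proc_dir="/proc"):
    """Read every numeric /proc entry once and return the command lines."""
    cmdlines = []
    for name in os.listdir(proc_dir):
        if not name.isdigit():
            continue  # not a pid directory
        try:
            with open(os.path.join(proc_dir, name, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited between listdir and open
        if raw:  # kernel threads have an empty cmdline
            args = raw.rstrip(b"\0").split(b"\0")
            cmdlines.append(b" ".join(args).decode(errors="replace"))
    return cmdlines


def check(rules, cmdlines):
    """Return (pattern, problem, count) for every rule out of bounds."""
    alerts = []
    for pattern, lo, hi in rules:
        count = sum(1 for c in cmdlines if pattern in c)
        if count < lo:
            alerts.append((pattern, "too few", count))
        elif count > hi:
            alerts.append((pattern, "too many", count))
    return alerts
```

Usage would be something like `check(rules, snapshot_cmdlines())` once per scan interval, so every pattern is judged against the same snapshot.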

method 2 - read /proc for each entry in the list

for each pattern in list of patterns; do
        set count of pattern instances to 0
        while readdir of /proc returns valid entries; do
                read the /proc/<pid>/psinfo file into a struct
                compare pattern against the ps cmd line from struct
                if match, increase count of pattern instances
        done
        if count of pattern instances 
                below lower threshold
                        raise alerts
                above upper threshold
                        raise alerts
                between the correct bounds
                        do nothing
done
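To put a number on the difference between the two methods, here is a small simulation that counts how many process entries get read. `CountingProc` is a stand-in for /proc invented for this sketch, not a real API, and matching is again a plain substring test.

```python
class CountingProc:
    """Fake /proc that counts how many entries have been read."""

    def __init__(self, cmdlines):
        self.cmdlines = list(cmdlines)
        self.reads = 0

    def scan(self):
        for c in self.cmdlines:
            self.reads += 1  # one readdir/open/read per entry
            yield c


def method1(proc, patterns):
    snapshot = list(proc.scan())  # read everything once
    return {p: sum(p in c for c in snapshot) for p in patterns}


def method2(proc, patterns):
    # a full rescan of the process table for every pattern
    return {p: sum(p in c for c in proc.scan()) for p in patterns}
```

With N processes and P patterns, method 1 does N reads and method 2 does N × P. On a static process table both give the same counts, but on a live system each rescan in method 2 may see a different set of processes.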

The 2nd method looks better, but it needs longer access to /proc, which may change while the scan is running. In any case, you can test them both.