How to check same process running 10 times?

Without knowing the job names in advance, how can I check at run time which jobs are running under a specific user, "ABCD"? If the same job is running more than 10 times, the script itself should send an alert message to the users with the process name and ID, so that they can kill the processes and avoid hung jobs. I tried the script below; please check it and let me know if you have any comments.

1) I am not able to add the line that checks whether a job is running more than 10 times.

 
ps -fu ABCD | awk '{print $1 $2,$8}' | \
     while read pid ppid
     do
           
           then                
             mail -s "Please kill the process " "kalia@gmail.com"
           fi
             
     done
 

We can't reply if you keep modifying your code; our posts risk being off topic. You have already modified it 3 times. I started to reply to the previous version and found the code changed again, making my reply completely incomprehensible.

Here is an approach using pgrep and awk that will check duplicate instances by process name:-

pgrep -l -U ABCD | awk '
        {
                ++p_counter[$2]
        }
        END {
                for ( k in p_counter )
                {
                        if ( p_counter[k] > 10 )
                        {
                                cmd = "echo \"Found " p_counter[k] " instances of " k " running\" | mailx -s \"Please kill process\" user@gmail.com"
                                system(cmd)
                        }
                }
        }
'

Assumption: You are using bash on Linux. Try something like:

while read count name
do
   if (( count >= 10 ))
   then
      : # email command goes here
   fi
done < <(ps -u ABCD -ocomm= | sort | uniq -c)

Untested.

Basically, don't use ps -f | awk '{print $x}' when you can use ps -o xxx
sort puts the commands into order; uniq -c counts them.
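
To make the sort | uniq -c step concrete, here is a small sketch in plain POSIX sh (so it should also run under ksh). The function name report_duplicates and the message wording are my own assumptions, not part of the loop above, and the demo feeds fixed names instead of live ps output:

```shell
# report_duplicates is a hypothetical helper: it reads one command name per
# line on stdin and reports names seen at least "limit" times (default 10).
report_duplicates() {
    limit=${1:-10}
    # sort groups identical names; uniq -c prefixes each name with its count
    sort | uniq -c | while read -r count name
    do
        if [ "$count" -ge "$limit" ]
        then
            printf 'The job "%s" is running %d times\n' "$name" "$count"
        fi
    done
}

# Demo on fixed input rather than live ps output
printf '%s\n' d.ksh pine d.ksh d.ksh | report_duplicates 3
```

On a real system you would presumably feed it with ps -u ABCD -o comm= | report_duplicates 10 and pipe the result to your mail command.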

Andrew


Note that ps -o xxx will not work on all systems, e.g. HP-UX.

You have to set the UNIX95 variable so that ps obeys XPG4 (the X/Open Portability Guide, Issue 4):

UNIX95=1 ps -u ABCD -ocomm=

Thanks for your reply.

I am attaching a sample input file so that you can clearly understand the requirement.

The output should send the alert messages below:

"The job "d.ksh" is running more than 10 times, please check the process to avoid hung issue"
"The job "a.ksh" is running more than 10 times, please check the process to avoid hung issue"
It would also be fine if the script could add the process ID along with the job name.

Below is the input file:

abcd       351 0.0  0.0  848    0 q9 IW   May  2  0:46 /usr/local/ksh/d.ksh
abcd       352 0.0  0.0  848    0 q9 IW   May  2  0:46 /usr/local/ksh/d.ksh
abcd       3533  0.0  0.0  848    0 q9 IW   May  2  0:46 /usr/local/ksh/d.ksh
abcd       3554  0.0  0.0  848    0 q9 IW   May  2  0:46 /usr/local/ksh/d.ksh
abcd       3655  0.0  0.0  848    0 q9 IW   May  2  0:46 /usr/local/ksh/d.ksh
abcd       356  0.0  0.0  848    0 q9 IW   May  2  0:46 /usr/local/ksh/d.ksh
abcd       357  0.0  0.0  848    0 q9 IW   May  2  0:46 /usr/local/ksh/d.ksh
abcd       36  0.0  0.0  848    0 q9 IW   May  2  0:46 /usr/local/ksh/d.ksh
abcd       359  0.0  0.0  848    0 q9 IW   May  2  0:46 /usr/local/ksh/d.ksh
abcd       3599  0.0  0.0  848    0 q9 IW   May  2  0:46 /usr/local/ksh/d.ksh
abcd       3499  0.0  0.0  848    0 q9 IW   May  2  0:46 /usr/local/ksh/d.ksh
abcd     24776  0.0  0.0  836    0 q0 IW   May  2  0:41 /usr/local/bin/pine3.96
abcd       1485  0.0  1.8  624  540 q2 S    18:33   0:07 /usr/local/bin/pine3.96
abcd       2280  0.0  0.0  692    0 p7 IW   18:51   0:05 /usr/local/bin/pine3.96
abcd      2737  0.0  0.7   32  204 q4 S    19:00   0:00 grep pine
abcd      1690  0.0  5.3  696 1616 p8 S    18:39   0:09 /usr/local/bin/pine3.96
abcd       1  0.0  848    0 q9 IW   May  2  0:46 /usr/local/kali/ksh/a.ksh
abcd       2 0.0  0.0  848    0 q9 IW   May  2  0:46 /usr/local/kali/ksh/a.ksh
abcd       3  0.0  0.0  848    0 q9 IW   May  2  0:46 /usr/local/kali/ksh/a.ksh
abcd       4  0.0  0.0  848    0 q9 IW   May  2  0:46 /usr/local/kali/ksh/a.ksh
abcd       5  0.0  0.0  848    0 q9 IW   May  2  0:46 /usr/local/kali/ksh/a.ksh
abcd       6  0.0  0.0  848    0 q9 IW   May  2  0:46 /usr/local/kali/ksh/a.ksh
abcd       7  0.0  0.0  848    0 q9 IW   May  2  0:46 /usr/local/kali/ksh/a.ksh
abcd       8  0.0  0.0  848    0 q9 IW   May  2  0:46 /usr/local/kali/ksh/a.ksh
abcd       9  0.0  0.0  848    0 q9 IW   May  2  0:46 /usr/local/kali/ksh/a.ksh
abcd       91  0.0  0.0  848    0 q9 IW   May  2  0:46 /usr/local/kali/ksh/a.ksh
abcd       95  0.0  0.0  848    0 q9 IW   May  2  0:46 /usr/local/kali/ksh/a.ksh
abcd       3599  0.0  0.0  848    0 q9 IW   May  2  0:46 /usr/local/ksh/d.ksh
abcd     24776  0.0  0.0  836    0 q0 IW   May  2  0:41 /usr/local/bin/pine3.96
abcd       1485  0.0  1.8  624  540 q2 S    18:33   0:07 /usr/local/bin/pine3.96
abcd       2280  0.0  0.0  692    0 p7 IW   18:51   0:05 /usr/local/bin/pine3.96
abcd      2737  0.0  0.7   32  204 q4 S    19:00   0:00 grep pine
abcd      1690  0.0  5.3  696 1616 p8 S    18:39   0:09 /usr/local/bin/pine3.96

Please use code tags for posting code fragments or data samples.

Replace comm with args in apmcd47's solution:

UNIX95=1 ps -u ABCD -oargs=

I am on AIX and new to ksh scripting. While running both scripts I get the errors below:

syntax error near unexpected token `-u'

awk: cmd. line:10:                                 cmd = "echo \"Found " p_counter[k] " instances of " k " running\"
awk: cmd. line:10:                                                                                       ^ unterminated string

---------- Post updated at 12:38 AM ---------- Previous update was at 12:07 AM ----------

I tried the script below and want to add one more step that adds the duplicated job name at run time:

cnt=$(ps -u abcd | )
if [[ $cnt -gt 1 ]]; then
  echo "job runing longtime"
  exit 1
fi;

Do you see it now?

cmd = "echo \"Found " p_counter[k] " instances of " k " running\" 
                                                                                             ^ unterminated string

The following reports all processes from user abcd with 10 or more instances

ps -u abcd -o comm= | awk '++s[$1]==10'

The following also filters for process name d.ksh or a.ksh

ps -u abcd -o comm= | awk '($1=="a.ksh" || $1=="d.ksh") && ++s[$1]==10'
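
A quick way to see what the ++s[$1]==10 condition does is to run it on fixed input with a smaller limit. This is only a sketch; first_to_reach is a hypothetical wrapper name and the limit is passed in as an awk variable:

```shell
# first_to_reach is a hypothetical wrapper around the awk idiom above.
# The condition is true exactly once per name: on the line where that
# name's counter reaches the limit.
first_to_reach() {
    awk -v limit="${1:-10}" '++s[$1] == limit'
}

# a.ksh occurs four times, b.ksh once; with limit 3 only a.ksh is printed,
# and it is printed a single time.
printf '%s\n' a.ksh b.ksh a.ksh a.ksh a.ksh | first_to_reach 3
```

So a name with 10, 11 or 50 instances appears once in the output, and a name with fewer than 10 does not appear at all.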

Thanks for the reply.

Yes, it is working fine. Please explain the syntax shown below.

2. Also, it would be good if you could add an alert message that lists the job names running more than 10 times together with their process IDs.

 -o comm=
 

You can test whether the output is empty.
For example, you can store it in a variable and test that variable with -z (empty) or, as here, -n (not empty).

jobs=$(ps -u abcd -o comm= | awk '($1=="a.ksh" || $1=="d.ksh") && ++s[$1]==10')
if [[ -n "$jobs" ]]
then
  printf "jobs runing longtime:\n%s\n" "$jobs"
  exit 1
fi

Basically the -o switch to ps allows you to select your own fields rather than use the defaults. This is particularly important in writing scripts that may be transferred to another platform, because the default selection of fields in ps -ef will be different if you move from Linux to Unix or MacOS X but ps -o will, with Yoda's caveats, most likely work. And if it doesn't, you will get an error rather than some odd misbehaviour that is hard to trace.

Multiple fields can be specified using a comma to separate each from the next. For instance, the default fields given by

ps

on my Ubuntu system are equivalent to

ps -opid,tty,time,comm

but if I wanted the parent-pid and effective user-id instead of the tty and the time I could type

ps -opid,ppid,euid,comm

And by changing the order of the field names you change their order in the output.

Appending the equals sign (=) to a field name suppresses its heading in the final output. Note, though, that you have to suppress the heading of every field, for example:

ps -opid=,ppid=,args=  # right
ps -opid,ppid,args=    # wrong, will only suppress the COMMAND heading

I hope that makes sense.

Getting the PIDs of the processes in the alert could be more tricky. If you use my original suggestion and have pgrep on your system you could use pgrep within the loop to inject the PIDs into your message.

Andrew


Thanks to all, and to MadeInGermany.

The script below matches my requirement.
One doubt: is it possible to print the job together with its process ID? If not, please ignore.

  
jobs=$(ps -u abcd -o comm= | awk '($1=="a.ksh" || $1=="d.ksh") && ++s[$1]==10')
if [[ -n "$jobs" ]]
then
  printf "jobs runing longtime:\n%s\n" "$jobs"
  exit 1
fi
  
 

---------- Post updated at 11:41 PM ---------- Previous update was at 07:42 PM ----------

Hi, I am almost at my requirement.
There is one small issue that needs correcting: the script below does not fetch the full names of the jobs currently running more than 10 times.

ps -u abcd -o comm= | awk '++s[$1]==10'

For example, jobs from 3 different paths are running more than 10 times:

/usr/local/ksh/d.ksh 
/usr/ka/ksh/ab.java 
/usr/na/ksh/abc.perl

My server runs more than 100 jobs from different locations/paths, on the AIX operating system.
When run, the script fetches only ksh, perl, java, c, pli and the like, not the full names of the jobs.

Sure, you can add another -o pid= column, and you can change -o comm= to -o args=
Rule: put the predictable columns first, and have the args last!

ps -u abcd -o pid= -o args=

The trailing = means "don't print a header for this column", and all columns = means "don't print a header line at all".
apmcd47 mentioned a shorter notation but I found it does not work everywhere.
Does it look more promising now?
Postprocessing of args is problematic because it contains spaces. In awk you cannot simply refer to column $2 because your process name might be in $3 ...
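
As a rough sketch of how the pid-first layout can be used, the following groups the PIDs by command and prints one line per job that exceeds the limit. The name report_with_pids and the message format are assumptions of mine. It also assumes the first word of args ($2) identifies the job; for entries such as "ksh /path/d.ksh" the script path would be in $3 and the key would need adjusting:

```shell
# report_with_pids is a hypothetical helper. It reads "PID ARGS..." lines,
# as printed by: ps -u abcd -o pid= -o args=
# and reports every command seen more than "limit" times (default 10).
report_with_pids() {
    awk -v limit="${1:-10}" '
        {
            count[$2]++                     # $2 = first word of args (assumed job name)
            pids[$2] = pids[$2] " " $1      # $1 = PID, always the first column
        }
        END {
            for (job in count)
                if (count[job] > limit)
                    printf "The job \"%s\" is running %d times, PIDs:%s\n", job, count[job], pids[job]
        }
    '
}

# Demo on fixed input: three d.ksh instances, limit 2
printf '%s\n' '351 /usr/local/ksh/d.ksh' '352 /usr/local/ksh/d.ksh' \
              '2737 grep pine' '3533 /usr/local/ksh/d.ksh' | report_with_pids 2
```

The output could then be stored in a variable and tested with -n before mailing it, as in the earlier script.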

Thanks a lot for your quick response. I am running the script below with 1 as the limit, because no duplicate jobs are running now, and it gives the correct output (AIX operating system).

My question: will it work when a job is running more than 10 times? Will it compare correctly? Please confirm.

awk '++s[$2]==10'
jobs=$(ps -u pintu -o pid= -o args=| awk '++s[$2]==1')
if [[ -n "$jobs" ]]
then
  printf "jobs runing longtime:\n%s\n" "$jobs"
  exit 1
fi

see the below output also

jobs runing longtime:
 2362 /usr/bin/gnome-keyring-daemon --daemonize --login
 2371 gnome-session
 2380 dbus-launch --sh-syntax --exit-with-session
 2381 /bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session
 2399 /usr/libexec/gconfd-2
 2406 /usr/libexec/gnome-settings-daemon
 2408 seahorse-daemon
 2415 /usr/libexec/gvfsd
 2423 metacity
 2434 /usr/bin/pulseaudio --start --log-target=syslog
 2435 gnome-panel
 2440 /usr/libexec/pulse/gconf-helper
 2441 nautilus
 2443 /usr/libexec/bonobo-activation-server --ac-activate --ior-output-fd=18
 2453 /usr/libexec/gvfs-gdu-volume-monitor
 2454 /usr/libexec/wnck-applet --oaf-activate-iid=OAFIID:GNOME_Wncklet_Factory --oaf-ior-fd=18
 2455 /usr/libexec/trashapplet --oaf-activate-iid=OAFIID:GNOME_Panel_TrashApplet_Factory --oaf-ior-fd=24
 2459 /usr/lib/vmware-tools/sbin32/vmtoolsd -n vmusr --blockFd 3
 2467 gpk-update-icon
 2470 gnome-power-manager
 2474 gnome-volume-control-applet
 2476 bluetooth-applet
 2480 /usr/sbin/restorecond -u
 2482 /usr/libexec/polkit-gnome-authentication-agent-1
 2485 python /usr/share/system-config-printer/applet.py
 2486 /usr/libexec/gdu-notification-daemon
 2489 nm-applet --sm-disable
 2494 /usr/libexec/gvfs-afc-volume-monitor
 2498 /usr/libexec/gvfsd-trash --spawner :1.7 /org/gtk/gvfs/exec_spaw/0
 2502 /usr/libexec/gvfs-gphoto2-volume-monitor
 2509 gnome-screensaver
 2518 /usr/bin/gnote --panel-applet --oaf-activate-iid=OAFIID:GnoteApplet_Factory --oaf-ior-fd=23
 2519 /usr/libexec/clock-applet --oaf-activate-iid=OAFIID:GNOME_ClockApplet_Factory --oaf-ior-fd=29
 2520 /usr/libexec/notification-area-applet --oaf-activate-iid=OAFIID:GNOME_NotificationAreaApplet_Factory --oaf-ior-fd=39
 2521 /usr/libexec/gdm-user-switch-applet --oaf-activate-iid=OAFIID:GNOME_FastUserSwitchApplet_Factory --oaf-ior-fd=35
 2562 /usr/libexec/gvfsd-burn --spawner :1.7 /org/gtk/gvfs/exec_spaw/1
 2569 /usr/libexec/gvfsd-metadata
 2575 /usr/bin/gnome-terminal -x /bin/sh -c cd '/home/pintu/Desktop' && exec $SHELL -l
 2576 gnome-pty-helper
 2577 /bin/bash -l
 2843 sh t2.sh
 2845 ps -u pintu -o pid= -o args=
 2846 awk ++s[$2]==1

The

awk '++s[$2]==10'

prints the line on which a value in column 2 occurs for the 10th time, so each name is reported once; if there are more than 10, the later occurrences print nothing and the line already printed stays :wink:
You can search for certain strings through the whole line like this

awk '(index($0, "a.ksh") || index($0, "d.ksh")) && ++s[$2]==10'

and this should work with args.
Regarding "jobs running longtime", perhaps etime (elapsed time) is more interesting:

ps -e -o pid= -o etime= -o args=

And now your task is to have the postprocessor (awk) compute a comparable number from the etime column.
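
One possible way to do that conversion, as a sketch only: etime is normally printed as [[dd-]hh:]mm:ss, so awk can split off the optional day part and then the colon-separated fields. The function name etime_to_seconds is hypothetical, and it expects the etime value in the first column of each input line:

```shell
# etime_to_seconds is a hypothetical helper: it converts an etime value of
# the form [[dd-]hh:]mm:ss (first column of each line) to seconds.
etime_to_seconds() {
    awk '{
        days = 0
        rest = $1
        # optional "dd-" prefix
        if (split(rest, d, "-") == 2) {
            days = d[1]
            rest = d[2]
        }
        # rest is now hh:mm:ss or mm:ss
        n = split(rest, t, ":")
        secs = t[n] + 60 * t[n - 1]
        if (n == 3)
            secs += 3600 * t[1]
        print secs + 86400 * days
    }'
}

# Demo on fixed values: 5 min 9 s, and 1 day 2 h 3 min 4 s
printf '%s\n' 05:09 1-02:03:04 | etime_to_seconds
```

With the number of seconds available, a condition such as "running longer than one hour" becomes a plain numeric comparison in awk.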