[BASH] Script to manage background scripts (running, finished, exit code)

Heyas,

Since similar questions come up every now and then, and since I had been thinking about this for the last week or two anyway, I started to write something :stuck_out_tongue:

The final push was the thread "How to run scripts parallely inside shell script? - Page 2", where I posted a screen-shot of my very first (and small) approach.

Now I want to write a script that will work for/with:

  • 'any' amount of scripts passed,
  • provides a limiter, so only X scripts are running simultaneously
  • Prints the status code of each of the scripts ran

So I had it running and working, but then realized I had only tested 4 scripts (each sleeping 5-30 seconds) while the LIMIT was 5. So I reduced the limit to 2, and now I can't figure out why it's not working.
Actually, it won't even run (properly) with the default LIMIT of 5 when only 4 scripts are passed :frowning:
And the previous code no longer works either, since I've renamed some of the variables so that they sort better.

My approach is:

  • scripts_remains = array of arguments (the scripts passed)
  • scripts_start = array that is filled only temporarily, as long as the counter scripts_running is less than the LIMIT.
  • While filling scripts_start, I remove the corresponding array element from scripts_remains.
  • scripts_todo = array that gets filled as soon as the scripts_start entry is started (the entry is then removed from scripts_start).
  • scripts_id = array which holds the PID of the script at the same index as in scripts_todo

When the script no longer finds the PID, it moves the element from scripts_todo/scripts_id to done_scripts and fills done_ret, both at the same array index.
(This part still works, or rather works again.)

But the starting array only ever gets the first script. It does list as many entries as the LIMIT allows, but all of them contain the first script, and only one entry actually shows a proper PID.
scripts_remains still holds all array elements, which confuses me. Furthermore, scripts_todo holds only one array element but lists as many entries as the LIMIT allows.

NOTE: This script is (or will be) part of my TUI (text user interface) package, so all those tui-* commands ARE available and 'valid', on my system at least.

Please see the screen-shot for the output and for the faulty code segment.
Note that the error referring to line 332 is caused by the wrongly set or missing array values.

I do have a small script that works, but it lacks the handlers and argument handling I want/need around it,
so it in no way reproduces the issues I'm experiencing.

I'm asking whether you can spot 'wrong' code in the visible part, or whether you could advise another approach/method to achieve my goal.
Thank you in advance

EDIT:
It seems that the first script (the only one loaded into the array) is started as many times as the limit is set to, without continuing to the next array element.. :confused:
But I can't pinpoint it.

If you need the full code, I'd happily share it; I just thought it wouldn't really be helpful, since it won't run unless you have TUI installed.

EDIT:
Done the 'no limit' part, which works:

			while [[ ! -z "${scripts_remains[@]}" ]]
				do	# Start the script:
					script="${scripts_remains[$counter_start]}"
					scripts_todo[$counter_start]="$script"
					
					# Generate the command
					[[ [./] = "${script:0:1}" ]] && PRE="" || PRE="./"
					cmd="${PRE}\"$script\" ; echo \$? > \"$TEMP/$script.ret\""
					#doLog "Executing: $cmd"
					
					# Get PID to array
					( eval $cmd ) &
					scripts_id[$counter_start]="$!"
					
					# Remove it from array, and move to next entry
					unset scripts_remains[$counter_start]
					((counter_start++))
				done

So I need to get this code segment working with a LIMIT, so that new scripts are only started while scripts_running is smaller than the LIMIT (see the main post).

Screen-shot *1 (small): the non-working part
Screen-shot *2 (small): the working no-limit part

EDIT3:
I'm now 100% sure I've screwed up somewhere with the arrays, but atm I'm too blind to see it...

This is bad:

while [[ ! -z "${scripts_remains[@]}" ]]

If there's more than one thing in that array, this will die with 'too many arguments' or 'unexpected argument' or the like for cramming [[ ! -z "a" "b" "c" "d" "..." ]] into your statement.

Try while [[ ! -z "${scripts_remains[*]}" ]]
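
Another way to ask "is anything left?", sketched here on the assumption that an empty array is the real stop condition, is to compare the element count ${#array[@]} against zero. The helper name drain_list and the sample arguments are made up for illustration.

```shell
#!/usr/bin/env bash
# drain_list is a hypothetical helper: it prints and removes entries until none are left.
drain_list() {
    local -a remains=( "$@" )
    local idx
    # ${#remains[@]} counts the elements that are currently set.
    while (( ${#remains[@]} > 0 )); do
        # Pick the lowest index that is still set.
        for idx in "${!remains[@]}"; do break; done
        echo "next: ${remains[$idx]}"
        unset "remains[$idx]"
    done
    echo "left: ${#remains[@]}"
}

drain_list "script1" "script 2" "script3"
```

Counting elements avoids joining the whole array into one string just to test it, and it keeps working when an entry contains spaces.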

I might be able to comment more on your code if you posted anything except screenshots.

If it's too big to post, trim it down! Make a minimal example that still shows the problem. There's probably good odds that doing so will actually find the problem, too.

I'd be tempted to just use strings and/or positional parameters instead of arrays here, too. They're much easier to deal with in a lot of circumstances.

# loop over array
for X in "${ARRAY[@]}" ; do ...  ; done

# loop over string
for X in $string ; do ... ; done

# rotating an array
TMP=${ARR[0]}
for((N=1; N<${#ARR[@]}; N++)) ; do ARR[N-1]=${ARR[N]} ; done ; ARR[${#ARR[@]}-1]=$TMP

# rotating a string
set -- $string ; set -- "$@" $1 ; shift
string="$*"

Redefinition of the problem:
It now starts script1 three times at once, while removing script2 (but only once), and then never takes any other script from the remains list.
It also adds another triple bunch of script1's as soon as the previous three have finished, appending them to the done list indefinitely..

Right, minimal example... working on...

EDIT:
In the screenshot *4_small* you again see the '4' within an error message, but labeled as /script1 (both a path and a name issue), whereas in the screenshot *3_small* the 4th script is not listed at all.

Ok, here's the smallest I could get...
But now script3 gets lost along the way; everything else seems to be working fine...

#!/bin/bash
#
#	Variables
#
	scripts_remains=( "${@}" )
	scripts_total=${#scripts_remains[@]}
	TMP_DIR=$HOME/.cache/$$
	
	# Filled in the process
	unset scripts_todo[@] scripts_id[@]
	unset done_scripts[@] done_ret[@]
	
	# Defaults
	LIMIT=5
	WAIT=5
	
	# Counters - Fixed
	counter_done=0
	
	# Counters - Dynamic
	counter_start=0
	counter_running=0
#
#	Environment check
#
	[[ -d "$TMP_DIR" ]] || mkdir -p "$TMP_DIR"
#
#	Display & Action --> limit 5, passing 4
#
	while [[ $counter_done -lt $scripts_total ]]
	do	# Loop the menu & Reset some values
		
		# Step 1
		# Check if there are files to be started
		echo "Scripts @ start"
# The Limit check is worthless, even when set to 2, all passed scripts gets executed on first loop...
		if [[ $counter_running -lt $LIMIT ]]
		then	# So we look in the script_remains for tasks
			num=0
			for S in ${scripts_remains[@]};do
				[[ -z "$S" ]] && break
				# Generate the command & save to new array
				[[ [./] = "${S:0:1}" ]] && PRE="" || PRE="./" 
				cmd="$PRE$S ; echo \$? > $TMP_DIR/$S.tmp"
				echo "Starting: $S"
				(eval $cmd) &
				scripts_id[$counter_start]="$!"
				scripts_todo[$counter_start]="$S"
				unset scripts_remains[$num]
				((counter_start++))
				((counter_running++))
				((num++))
			done
		fi
		
		# Step 2
		# Print status of already done scripts
		echo "Scrips @ done"
		C=0
		for D in "${done_scripts[@]}";do
			R=${done_id[$C]}
			echo "$D ended $R"
			((C++))
		done
		
		# Step 3
		# Show current tasks running -- now loops here endlessly...  because a script gets lost within the loop
		num=0
		echo "Scripts @ work"
		for W in "${scripts_todo[@]}";do
			# Only display if array element is not empty
			if [[ ! -z "$W" ]]
			then	val=${scripts_id[$num]}
				if [[ ! -z "$val" ]]
				then	if ps -ha | grep $val|grep -v -q grep
					then	echo "$W works : $val"
					else	echo "$W has ended..."
						# Unset this item now
						done_scripts[$counter_done]="$W"
						read R < $TMP_DIR/$W.tmp
						done_id[$counter_done]="$R"
						unset scripts_todo[$num] scripts_id[$num]
						((counter_done++))
						((counter_running--))
					fi
				fi
				((num++))
			fi
			
		done
		
		[[ $counter_done -lt $scripts_total ]] && \
			echo "wait for update: $WAIT" && \
			sleep $WAIT && \
			clear
	done
#
#	Clean up temp files
#
	rm -fr "$TMP_DIR"

BTW: These are my test scripts

grep -n sleep *
script1:2:sleep 30 
script2:2:sleep 9
script3:2:sleep 20
script4:2:sleep 15

What does 'lost along the way' mean?

It's no longer visible. The file still exists physically on the disk, but it gets lost somewhere within the arrays...

As in:

+ ~/tmp/9388 $ sh ../../psm-mini.sh *
Scripts @ start
Starting: script1
Starting: script2
Starting: script3
Starting: script4
Scrips @ done
Scripts @ work
script1 works : 19364
script2 works : 19365
script3 works : 19367
script4 works : 19369
wait for update: 5
...
some screen updates later
...
Scripts @ start
Scrips @ done
script2 ended 0
script4 ended 0
script1 ended 0
Scripts @ work
wait for update: 5

As soon as the first script ends, script3 gets 'lost'

EDIT:
Also, when i change LIMIT to 2, all 4 scripts are started at once.. :confused:
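
A possible reason for that, judging only from the minimal example above: the test `if [[ $counter_running -lt $LIMIT ]]` runs once, before the `for` loop, and the loop then starts everything that is left. Below is a minimal sketch that checks the limit before every single start. It assumes bash 4.3 or newer for `wait -n`, and run_limited is a made-up helper name, not part of the original script.

```shell
#!/usr/bin/env bash
# run_limited LIMIT cmd...  (hypothetical helper; needs bash >= 4.3 for 'wait -n')
# Starts each command in the background, never more than LIMIT at once,
# and prints the exit status of each one when it finishes.
run_limited() {
    local limit=$1; shift
    local running=0 cmd
    for cmd in "$@"; do
        # Re-check the limit for every start, not once per batch.
        if (( running >= limit )); then
            wait -n                      # returns once any one job has ended
            running=$(( running - 1 ))
        fi
        ( sh -c "$cmd"; echo "$cmd ended $?" ) &
        running=$(( running + 1 ))
    done
    wait                                 # let the last jobs finish
}

run_limited 2 "sleep 0.3; exit 0" "sleep 0.1; exit 3" "sleep 0.2; exit 1"
```

Each job reports its own status from inside its subshell, so no PID bookkeeping is needed for this simple case. The order of the "ended" lines depends on how long each command runs.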

I don't think "unset" really does what you think it does when deleting from the middle of an array like that. Whenever I do that I end up with "holes": gaps in the index sequence, because the remaining elements are not renumbered. Associative arrays in shell are pretty chancy anyhow; you often won't have a bash new enough, or have bash at all.
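
To make the "holes" concrete: in bash, unsetting one element does not renumber the others, so a hand-maintained counter that assumes indexes 0, 1, 2, ... stops lining up with the loop. A small sketch with made-up names and PIDs; iterating over "${!names[@]}" (the indexes that are still set) keeps two parallel arrays in step.

```shell
#!/usr/bin/env bash
# Made-up sample data: two parallel arrays, as in the script above.
names=( script1 script2 script3 script4 )
pids=( 101 102 103 104 )

# Remove the second entry from both arrays; indexes 0, 2 and 3 remain.
unset 'names[1]' 'pids[1]'

show_pairs() {
    local i
    # "${!names[@]}" expands to the indexes that are still set.
    for i in "${!names[@]}"; do
        echo "$i: ${names[$i]} -> ${pids[$i]}"
    done
}
show_pairs
```

With a separate counter starting at 0, the second loop pass would look up index 1, which no longer exists. That may be how script3 got "lost" in the minimal example.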

Why keep an array of arguments when you already have one, $@

A construct I often use:

#!/bin/bash

# This is a ring buffer.  Append at PIDS[$PW], read at PIDS[$PR].
# PW increments when a process is added, PR when a process dies,
# and both wrap at MAX.  Order is not important when removing
# since it shuffles the last process to whatever got deleted.
PIDS=()
PW=0
PR=0
MAX=2

running() {     # running "arrayindex"
        ps "${PIDS[$1]}" >/dev/null
}

add() {         # add "pid"
        PIDS[$PW]="$1"
        PW=$(( (PW+1)%MAX ))
        ((TOTAL++))
}

rem() {         # rem "arrayindex"
        # Take PID from the end, plunk it where the one to be deleted is
        PW=$(( (PW-1) % MAX ))
        [[ "$PW" -lt 0 ]] && PW=$((PW+MAX))

        PIDS[$1]="${PIDS[$PW]}"
        ((TOTAL--))
}

# Start each given program in turn
for X in "$@"
do
        while [[ "$TOTAL" -ge "$MAX" ]]
        do
        for((N=0, I=PR; (TOTAL>=MAX) && (N<TOTAL); N++, I=(I+1)%MAX ))
        do
                if ! running "$I"
                then
                        rem "$I"
                        break
                fi
        done
                [[ "$TOTAL" -ge "$MAX" ]] && sleep 0.1
        done

        # Refers to array indexes, i.e. /tmp/$$-0 for array index 0.
        $X >/tmp/$$-$PW &
        add "$!"
done

while [[ "$TOTAL" -ge "$MAX" ]]
do
        for((N=0, I=PR; N<$TOTAL; N++, I=(I+1)%MAX ))
        do
                if ! running "$I"
                then
                        rem "$I"
                        break
                fi
        done
done

wait
rm -f /tmp/$$-*

@sea On systems with /proc [ -d /proc/$PID ] is a nice and efficient alternative to using ps ... | grep $PID | grep -v grep

@Corona688 Interesting script. Is it up to the caller to ensure they don't exceed MAX jobs when calling? (I can't spot anything that limits the number of background jobs started.)

The while [[ "$TOTAL" -ge "$MAX" ]] loop waits until one quits.

Even on systems without /proc, you can just do "ps pid" and it will return true if it exists and false if it doesn't, no need to grep | awk | sed | kitchen | sink.

Oh yes, I see it now. I think I missed it due to the indenting :stuck_out_tongue:

If you save the PIDs when you fire up a background job:

job& pid=$!

an even faster way to find out if the job is still running is:

if kill -0 $pid 2>/dev/null
then    echo "$pid is still running"
else    echo "$pid is no longer running"
fi

First time I've seen this approach. Does this actually signal the running process in any way?

Do you know how portable this zero signal is?

kill is faster than ps? I think they're both externals..

Kill is a built-in on a lot of shells, probably because it needs to be aware of jobs. aka kill %1 .

The POSIX Standards and the Single UNIX Specifications specify that the function call:

kill(pid, 0)

shall not actually send a signal but shall perform normal error checking. So, if the process ID is valid and you have permission to send that process a signal, it will return 0; otherwise, it will return -1 with errno set to ESRCH if the process doesn't exist or EPERM if you don't have permission to send a signal to your child. The standards also require that the kill utility behave as though kill(pid, 0) were called when you invoke the command kill -0 pid . All UNIX Systems do this; I can't speak to whether or not Linux systems' kill utilities and system calls conform to these requirements.

As Chubler_XL said, any shell that handles job control that I've seen has kill as a shell built-in. Even if it doesn't, ps usually needs to do a lot more poking around in kernel memory than kill does; so it should be faster.
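
As a small illustration of the check described above (a sketch only, with is_running as a made-up helper name): `kill -0` tests whether a signal could be delivered without delivering one, and the redirection that hides the error message is spelled `2>/dev/null`.

```shell
#!/usr/bin/env bash
# is_running PID  (hypothetical helper)
# Succeeds if a signal could be sent to PID; no signal is actually delivered.
is_running() {
    kill -0 "$1" 2>/dev/null
}

sleep 30 &
pid=$!

if is_running "$pid"; then echo "$pid is still running"; fi

kill "$pid"                 # now really terminate it (SIGTERM)
wait "$pid" 2>/dev/null     # reap it, so the PID no longer exists

if ! is_running "$pid"; then echo "$pid is no longer running"; fi
```

The `wait` before the second check matters in this demo: a child that has exited but has not been reaped yet can still answer to `kill -0`.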

Hi.

Obviously a lot of work and thought has been put into this. As far as I know, the requirements:

    * 'any' amount of scripts passed,
    * provides a limiter, so only X scripts are running simultaneously
    * Prints the status code of each of the scripts ran

can be accomplished through utilities xargs and GNU parallel. Here's an example of a number of script names in a data file that will be "executed". First singly (not in parallel), then at most n=2 at a time. The script that "runs" the script name is run with ksh so that the number of processes can be more easily seen.

#!/usr/bin/env bash

# @(#) s1	Demonstrate parallel execution of scripts, GNU parallel.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C parallel

FILE=${1-data1}

pl " Input data file, names of scripts, $FILE:"
cat $FILE

pl " Master execution script s0:"
cat s0

pl " Results:"
parallel ./s0 < $FILE

pl " Results, limitation of 2 scripts simultaneously:"
parallel --jobs 2 ./s0 < $FILE

exit 0

producing:

$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian 5.0.8 (lenny, workstation) 
bash GNU bash 3.2.39
parallel GNU parallel 20111122

-----
 Input data file, names of scripts, data1:
foo
bar
baz
qux
quux
corge

-----
 Master execution script s0:
#!/usr/bin/env ksh

# @(#) s0	Demonstrate script to execute a script from argument 1.

printf " $0: would have executed script %s here, process $$\n" $1
printf " exit status of script %s is %d\n" $1 $?
printf " Current processes:\n"
ps
sleep 1

exit 0

-----
 Results:
 ./s0: would have executed script foo here, process 12774
 exit status of script foo is 0
 Current processes:
  PID TTY          TIME CMD
 3451 pts/13   00:00:01 bash
12670 pts/13   00:00:00 bash
12734 pts/13   00:00:00 parallel
12774 pts/13   00:00:00 ksh
12775 pts/13   00:00:00 ps
 ./s0: would have executed script bar here, process 12776
 exit status of script bar is 0
 Current processes:
  PID TTY          TIME CMD
 3451 pts/13   00:00:01 bash
12670 pts/13   00:00:00 bash
12734 pts/13   00:00:00 parallel
12776 pts/13   00:00:00 ksh
12777 pts/13   00:00:00 ps
 ./s0: would have executed script baz here, process 12781
 exit status of script baz is 0
 Current processes:
  PID TTY          TIME CMD
 3451 pts/13   00:00:01 bash
12670 pts/13   00:00:00 bash
12734 pts/13   00:00:00 parallel
12781 pts/13   00:00:00 ksh
12782 pts/13   00:00:00 ps
 ./s0: would have executed script qux here, process 12783
 exit status of script qux is 0
 Current processes:
  PID TTY          TIME CMD
 3451 pts/13   00:00:01 bash
12670 pts/13   00:00:00 bash
12734 pts/13   00:00:00 parallel
12783 pts/13   00:00:00 ksh
12784 pts/13   00:00:00 ps
 ./s0: would have executed script quux here, process 12785
 exit status of script quux is 0
 Current processes:
  PID TTY          TIME CMD
 3451 pts/13   00:00:01 bash
12670 pts/13   00:00:00 bash
12734 pts/13   00:00:00 parallel
12785 pts/13   00:00:00 ksh
12786 pts/13   00:00:00 ps
 ./s0: would have executed script corge here, process 12787
 exit status of script corge is 0
 Current processes:
  PID TTY          TIME CMD
 3451 pts/13   00:00:01 bash
12670 pts/13   00:00:00 bash
12734 pts/13   00:00:00 parallel
12787 pts/13   00:00:00 ksh
12788 pts/13   00:00:00 ps

-----
 Results, limitation of 2 scripts simultaneously:
 ./s0: would have executed script foo here, process 12815
 exit status of script foo is 0
 Current processes:
  PID TTY          TIME CMD
 3451 pts/13   00:00:01 bash
12670 pts/13   00:00:00 bash
12789 pts/13   00:00:00 parallel
12815 pts/13   00:00:00 ksh
12816 pts/13   00:00:00 ps
12817 pts/13   00:00:00 ksh
12818 pts/13   00:00:00 ps
 ./s0: would have executed script bar here, process 12817
 exit status of script bar is 0
 Current processes:
  PID TTY          TIME CMD
 3451 pts/13   00:00:01 bash
12670 pts/13   00:00:00 bash
12789 pts/13   00:00:00 parallel
12815 pts/13   00:00:00 ksh
12817 pts/13   00:00:00 ksh
12818 pts/13   00:00:00 ps
 ./s0: would have executed script baz here, process 12822
 exit status of script baz is 0
 Current processes:
  PID TTY          TIME CMD
 3451 pts/13   00:00:01 bash
12670 pts/13   00:00:00 bash
12789 pts/13   00:00:00 parallel
12822 pts/13   00:00:00 ksh
12823 pts/13   00:00:00 ps
12824 pts/13   00:00:00 ksh
12825 pts/13   00:00:00 ps
 ./s0: would have executed script qux here, process 12824
 exit status of script qux is 0
 Current processes:
  PID TTY          TIME CMD
 3451 pts/13   00:00:01 bash
12670 pts/13   00:00:00 bash
12789 pts/13   00:00:00 parallel
12822 pts/13   00:00:00 ksh
12824 pts/13   00:00:00 ksh
12825 pts/13   00:00:00 ps
 ./s0: would have executed script corge here, process 12828
 exit status of script corge is 0
 Current processes:
  PID TTY          TIME CMD
 3451 pts/13   00:00:01 bash
12670 pts/13   00:00:00 bash
12789 pts/13   00:00:00 parallel
12826 pts/13   00:00:00 ksh
12827 pts/13   00:00:00 ps
12828 pts/13   00:00:00 ksh
12829 pts/13   00:00:00 ps
 ./s0: would have executed script quux here, process 12826
 exit status of script quux is 0
 Current processes:
  PID TTY          TIME CMD
 3451 pts/13   00:00:01 bash
12670 pts/13   00:00:00 bash
12789 pts/13   00:00:00 parallel
12826 pts/13   00:00:00 ksh
12827 pts/13   00:00:00 ps
12828 pts/13   00:00:00 ksh
12829 pts/13   00:00:00 ps
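
For comparison, the same limit can be sketched with xargs alone, assuming an xargs that supports -P (the GNU and BSD versions do; POSIX does not require it). The command strings are made-up stand-ins for real scripts, and run_with_xargs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# run_with_xargs LIMIT  (hypothetical helper)
# Reads one command per line on stdin, runs at most LIMIT of them at a time,
# and prints each command's exit status when it finishes.
run_with_xargs() {
    xargs -P "$1" -I {} sh -c 'sh -c "$1"; echo "$1 ended $?"' _ {}
}

printf '%s\n' "exit 0" "sleep 0.1; exit 3" "exit 1" | run_with_xargs 2
```

Each input line is handed to the outer `sh -c` as a single argument, so the status message can name the command that produced it.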

My apologies if I missed the point ... cheers, drl

His original script would have been able to run entire, complex, arbitrary shell statements in parallel. He wasn't testing it with them, but it could. This isn't necessarily a good thing, mind you -- leaves the door open for a lot of unintended problems.
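
A tiny illustration of that risk, using a made-up "script name": once the name goes through eval, anything after a `;` in it runs as a second command, whereas running "$name" directly treats the whole string as one (here nonexistent) command. The function names via_eval and direct are hypothetical.

```shell
#!/usr/bin/env bash
# Made-up input: a "script name" that smuggles in a second command.
name='true; echo INJECTED'

# eval re-parses the string, so it runs 'true' and then 'echo INJECTED'.
via_eval() { eval "$name"; }

# Without eval the shell looks for one command literally named 'true; echo INJECTED'.
direct() { "$name" 2>/dev/null; }

via_eval
direct || echo "direct call failed with status $?"
```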

Thank you guys. I've tried to adapt some of your suggestions (/proc/$pid, kill -0 $pid, kill($pid,0)), but none of them worked as expected.
The hint about a "single id number" was great; that way I managed to make it even smaller than the previous example, with full functionality! :slight_smile:

Screenshot shows how it returns the number of successfully executed scripts, when passed -c ...

Below is the 'core' snippet; the full script can be seen at https://github.com/sri-arjuna/tui/blob/master/bin/tui-psm
And it is part of: GitHub - sri-arjuna/tui: A line based Text User Interface framework for scripts

Hope you like it, and thank you for your help!

#
#	Variable presets
#
	script_name=( "${@}" )		# Contains all files
	MAX=${#script_name[@]}		# Max amount of scripts
	script_status=( $(for s in "${script_name[@]}";do echo "3";done) )		# Status with same counter of that file	: done(0) failed(1) running(2) todo(3)
	script_pid=()			# PID with the same counter of that file

	# Counters
	RUN=0			# How many scripts are currently running
#
#	Display & Action
#
	while [[ $DONE -lt $MAX ]]
	do	# Loop the menu
		C=0
		DONE=0			# How many scripts are 'done' (regardless of status)
		GOOD=0			# How many scripts ended successfully
		if ! $QUIET
		then	clear
			tui-header "$TITLE ($script_version)" "$(date +'%F %T')"
			tui-title "Status"
		fi
		
		while [[ $C -lt $MAX ]]
		do	# Vars
			STATUS="${script_status[$C]}"	# Current status
			RET_FILE="$TEMP/$(basename ${script_name[$C]}).ret"
		
			# Do action according to current status
			case $STATUS in
			2)	# IS PID still available?
				pid=${script_pid[$C]} 
				if [[ ! -z "$(echo $pid)" ]]
				then	#if ! ls /proc/${script_pid[$C]}
					if ! ps $pid > /dev/null
					#if kill -0 $pid  2>&/dev/null
					then	# Its finished
						read RET < "$RET_FILE"
						[[ -z "$RET" ]] && RET=1
						script_status[$C]=$RET
						((RUN--))
					fi
				else 	tui-status 1 "This should not happen, empty pid while running"
				fi
				;;
			3)	# Its TODO, can we start it?
				if [[ $RUN -lt $LIMIT ]] || [[ $LIMIT -eq 0 ]]
				then 	script_status[$C]=2
					STATUS=2
					((RUN++))
					script="${script_name[$C]}"
					[[ "${script:0:1}" = [./] ]] && PRE="" || PRE="./"
					cmd="\"${PRE}$script\" ; echo \$? > \"$RET_FILE\""
					touch "$RET_FILE"
					( eval "$cmd" ) &
					script_pid[$C]=$!
				fi
				;;
			*)	((DONE++))
				[[ $STATUS -eq 0 ]] && ((GOOD++))
				;;
			esac

			# Display latest status
			if ! $QUIET
			then	case $STATUS in
				0|1)	tui-status $STATUS "Finished ${script_name[$C]}"  ;;
				2)	tui-status $STATUS "Running ${script_name[$C]}" "${script_pid[$C]}" ;;
				3)	tui-status $STATUS "Waiting ${script_name[$C]}" ;;
				127)	tui-status 1 "Typo in script: \"${script_name[$C]}\""	;;
				*)	tui-status 1  "Invalid STATUS ($STATUS) on $C ${script_name[$C]}"	;;
				esac
			fi
			((C++))
		done
		
		if ! $QUIET
		then	tui-echo
			tui-title "Summary"
			tui-echo "Scripts completed:" "$DONE/$MAX"
			tui-echo "Currently running:" "$RUN/$LIMIT"
			tui-echo "Successfully executed:" "$GOOD"
			
			tui-echo
			tui-wait $WAIT "Wait for update..."
			echo
		else	sleep $WAIT
		fi
	done
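
One possible simplification, offered as a sketch rather than as what tui-psm does: bash remembers the exit status of its background children, so `wait "$pid"` can return it even after the job has finished, which might replace the .ret files. start_job and job_status are made-up helper names, and STATUS is a variable introduced only for this example.

```shell
#!/usr/bin/env bash
script_pid=()

# start_job INDEX COMMAND...  (hypothetical helper)
# Starts COMMAND in the background and remembers its PID under INDEX.
start_job() {
    local idx=$1; shift
    "$@" &
    script_pid[$idx]=$!
}

# job_status INDEX  (hypothetical helper)
# Blocks until that job has ended and stores its exit status in STATUS.
# It must run in the main shell, not inside $(...), because a subshell
# cannot wait for the parent's children.
job_status() {
    wait "${script_pid[$1]}"
    STATUS=$?
}

start_job 0 sh -c 'exit 0'
start_job 1 sh -c 'sleep 0.2; exit 3'
job_status 0; echo "job 0 ended $STATUS"
job_status 1; echo "job 1 ended $STATUS"
```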

EDIT:
I just think it looks cool, thank you guys!
