[bash] wanted: function with a clean way for multiple return values

joker · July 12, 2016, 5:01pm

Hi,

I have a small part of a project that is written as a bash script. bash was chosen for portability, since it works out of the box. In this script I have an exec shell function, a wrapper around arbitrary commands. From it I want to get STDOUT, plus STDERR, and the EXIT-CODE of a specified command.

I'd like to have a clean wrapper, but my solution at the moment is ugly and does not produce clean code at the calling side. Maybe you have some hints to improve it.

By "clean" I mean side-effect-free programming, i.e. not getting into a mess with global variables, encapsulating everything within the _exec function, and only having to call that function. (In the current situation, I need an additional line in the calling function for every extra value I get back from _exec.) I'd also like to avoid eval ("eval" would be better named "evil") :).

That's what I have so far:

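function _dbg {
      # verbose logging here
      :   # no-op placeholder: bash rejects a function body that holds only a comment
}

function _fatal {
      # fatal error handling here
      :   # no-op placeholder, see above
}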

function _exec {
        local CMD=$1
        local OUT="$( $CMD 2>&1 ;echo "||$?" )"
        local EXIT_CODE="${OUT##*||}"

        # if no output is there at all [[:space:]]* does
        # not match. So do one with space and one without

        OUT="${OUT%[[:space:]]*||[0-9]*}"
        OUT="${OUT%||[0-9]*}"
        OUT="${OUT:-<EMPTY>}"

        if [ "$EXIT_CODE" != "0" ]; then
                _fatal $ESHELLERR "$EXIT_CODE" "$OUT" "$CMD"
        else
                _dbg "SHELL EXEC successful CMD: $CMD EXIT_CODE: $EXIT_CODE OUT: $OUT"
        fi
        echo "$EXIT_CODE||$OUT"
}

function _delete_compatible {

        local DIR="$1"
        local OUT="$(_exec "rm -rf $DIR")"
        local EXIT_CODE="${OUT%||*}"
        OUT="${OUT#*||}"
        if [ "$EXIT_CODE" == "0" ]; then
                _dbg "compatible delete successful"
                _success
        else
                echo "$OUT"
        fi
}

Not sure if this one will help me:
Returning Values from Bash Functions | Linux Journal

Don_Cragun · July 12, 2016, 7:35pm

Without knowing what arguments you intend to pass to _exec and what output you are hoping to produce, we can only make wild guesses about what might or might not work for you.

But using echo to print arbitrary text is dangerous (especially when the operating system you're using and the environment variables in place when your script is invoked are unspecified).
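To illustrate the point about echo, here is a minimal sketch. It assumes bash's builtin echo with default settings; the variable names are made up for the example.

```bash
#!/usr/bin/env bash
# Data that happens to look like an echo option.
text='-n'

# bash's builtin echo treats "-n" as its "no trailing newline" option,
# so none of the data is printed.
echo_result=$(echo "$text")

# printf with a fixed format string prints the data verbatim.
printf_result=$(printf '%s\n' "$text")

printf 'echo   gave: [%s]\n' "$echo_result"
printf 'printf gave: [%s]\n' "$printf_result"
```

With printf the data never ends up in the option or format position, so its content cannot change how it is printed.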

Is there a limited set of commands that will be passed to your _exec function as the 1st operand, or can the user specify any command available on your system? Will your _exec function be invoked with a command and parameters in a quoted string passed as the 1st operand; or just a command name that will be executed with no parameters?

How many terabytes of output might be produced by the command given as the 1st operand to your _exec function?

Where is the variable ESHELLERR assigned a value (or are you expecting _fatal to be called with three operands instead of four)?

joker · July 12, 2016, 8:00pm

_exec is my own wrapper to call any command. It's called only from other functions within a larger bash script. I'm clear that it's solely my responsibility to check what I feed into that _exec function, and I'm taking care that any data from outside is carefully validated, which needs extra care in bash.

It is always a complete command line as the first argument, as shown in the _delete_compatible example (no redirection within the given command line, no variables to be substituted; ready-to-run command lines).

It should be generic, but I assume I'll never process more than 50 Kilobytes.

That _fatal thing is working completely fine. But if you're curious: ESHELLERR is a global variable containing an integer that serves as an index into an associative array holding a descriptive message (a format string) for that error. _fatal can be called with a variable number of arguments. Here are some supplements to _fatal:

ESHELLERR=64
declare -A ERRMSG
ERRMSG[$ESHELLERR]="Shell execution error. EXITCODE: %s ERRMSG: %s COMMAND: %s"
export ESHELLERR ERRMSG

function _fatal {
        ERROR_CODE=$1
        if [[ -n "$ERROR_CODE" && -n "${ERRMSG[$ERROR_CODE]}" ]] ;then
                if [ -n "$2" ] ; then
                        shift
                        MSG="$(printf "${ERRMSG[$ERROR_CODE]}" "$@")"
                else
                        MSG="${ERRMSG[$ERROR_CODE]}"
                fi
                _log "FATAL: $MSG"
                echo "LC_SYSTEM:ERROR $MSG"
                exit $ERROR_CODE
        else
                ERROR_CODE=${ERROR_CODE:-$ERRUNKNWN}
                _log "FATAL: Unknown Error occurred"
                echo "LC_SYSTEM:ERROR Unknown Error occurred"
                exit $ERROR_CODE
        fi
}

bakunin · July 13, 2016, 5:43am

OK, some "theory of programming 101" seems to be in order:

Subroutines in general come in two forms: procedures and functions. PASCAL had this difference laid down in keywords, while C and all its children blurred this fact by only having functions.

The difference is: functions have one (exactly one!) return value, procedures have none. So one could treat procedures as special functions with a return value of <void> (this in fact is what C does).

In "normal" programming languages the return value of a function could be anything: a number, a string or any other data type the language is capable of defining. For instance:

mylog=calc_natural_log( 5 )

Will pass the number (constant) 5 as argument to the function "calc_natural_log" and the function will return something (perhaps a floating point number), which in turn gets stored in the variable "mylog".

In shell programming a function can have only one sort of return value: an unsigned 8-bit integer (0 to 255), equivalent to the "return code" (or "error level") the OS knows. In fact it is treated exactly like one, as this shows:

if myfunc "arg1" "arg2" ; then
     echo "myfunc() returned 0"
else
     echo "myfunc() returned non-zero"
fi

The if-keyword treats the return code of the function the same way it would treat the RC of any external command:

if /some/command ; then
     echo "command returned 0"
else
     echo "command returned non-zero"
fi
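A small sketch of how narrow that return value is, assuming bash; big_return is a made-up function that tries to return a number outside the range a return code can hold.

```bash
#!/usr/bin/env bash
# Hypothetical function: tries to "return" a value larger than 255.
big_return() {
    return 300
}

big_return
status=$?
echo "status: $status"   # bash keeps only the low 8 bits: 300 % 256 = 44
```

So the return code is only suitable for signalling success or a small error number, not for carrying data.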

That leaves the question: how to manipulate data across functions/procedures? First we need to differentiate between data the function only needs to know and data the function needs to change.

Data the function needs to know can be set as globals. It is good practice to use - as a rule of thumb - globals only for constants. You can manipulate such constants via the dot-execution method, but in most cases this is a bad idea because it indeed introduces a side effect of the execution of the function.

In general it is a good idea to pass all the information a function needs to know as a parameter to it. I do usually create a "function outline" before i even write it by simply asking what the function needs to know (the parameter set) and what it needs to give back. This way, when i start actually writing the function, i already have the "interface" of what it has to look like to the outside ready.

Here is an example: i once wrote a function for distributed execution of commands (via ssh-sessions). Now i thought: what do i need to know?

1) the host on which to execute
2) the command to execute
3) the user to use for the connection
4) the user under which to execute

What do i need to give back to the calling routine?

1) The error level of the command executed
2) any output it might produce (stdout and stderr)
3) a return value of the function itself in case something goes wrong (host refused connection, etc.)

So i could write a function outline like that:

function fExecRemote
{
typeset chHost="$1"
typeset chConnUser="$2"
typeset chRemUser="$3"
typeset chCommand="$4"

ssh -nqo 'BatchMode = yes' "$chConnUser"@"$chHost" "sudo su - $chRemUser $chCommand"

return $?
}

# main()

fExecRemote "host1" "myself" "myself" "ls -l"
echo $?

exit 0

Things to do (obviously): catch the output, check if a connection is possible (system can be pinged, ssh-keys are exchanged, ...) and so on ... but this is implementation, not design.
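The three things to hand back (error level of the command, its output, and a return value of the function itself) could be sketched roughly like this, with a local command standing in for ssh. This is only one possible approach and assumes bash 4.3 or later for namerefs (local -n); run_capture and all variable names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper, requires bash 4.3+ (namerefs).
# Usage: run_capture OUTVAR STATUSVAR command [args...]
run_capture() {
    if (( $# < 3 )); then
        return 2                 # the function's own error: bad usage
    fi
    local -n _rc_out=$1          # nameref to the caller's output variable
    local -n _rc_status=$2       # nameref to the caller's status variable
    shift 2
    _rc_out=$("$@" 2>&1)         # stdout and stderr of the command
    _rc_status=$?                # exit status of the command itself
    return 0
}

# main()
if run_capture cmd_out cmd_status sh -c 'echo hello; echo oops >&2; exit 3'; then
    printf 'command status: %s\n' "$cmd_status"
    printf 'command output: %s\n' "$cmd_out"
else
    echo "run_capture itself failed" >&2
fi
```

The caller needs one line per call and names the variables it wants filled. The caller's variable names must differ from the nameref names used inside the function, which is why those carry a prefix here.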

For things a function has to manipulate (instead of just know) there are two ways: first, you can catch the <stdout> of a function:

function fSub
{

echo "bla"
echo "foo"
echo "bar"

return 0
}

# main()
typeset OUTPUT=""

fSub | while read OUTPUT ; do
  ....
done
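One caveat with the pipe form, at least in bash with default settings: the loop runs in a subshell, so variables set inside it are gone once the loop ends. A possible variant, assuming bash, feeds the loop through `< <(fSub)` instead; count and line are names made up for this sketch.

```bash
#!/usr/bin/env bash
fSub() {
    echo "bla"
    echo "foo"
    echo "bar"
}

count=0
# Reading from <(fSub) instead of piping keeps the loop in the current
# shell, so count is still set after the loop ends.
while IFS= read -r line; do
    count=$(( count + 1 ))
    printf 'line %d: %s\n' "$count" "$line"
done < <(fSub)

echo "lines read: $count"
```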

Second, there is command substitution:

function fSub
{
echo "bla"

return 0
}

# main()
typeset var=$(fSub)
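When the return code of the function is needed as well, it matters in bash whether declaration and assignment happen in one step. The sketch below assumes bash; demo and the variable names are made up for illustration.

```bash
#!/usr/bin/env bash
fSub() {
    echo "bla"
    return 3
}

demo() {
    # Declaration and assignment in one step: $? is the status of
    # "local" (0), not of fSub.
    local combined=$(fSub)
    local rc_combined=$?

    # Declaring first and assigning afterwards keeps fSub's status.
    local separate
    separate=$(fSub)
    local rc_separate=$?

    echo "combined: out=$combined rc=$rc_combined"
    echo "separate: out=$separate rc=$rc_separate"
}
demo
```

With the separate form, output and return code are both available to the caller without packing the code into the output string.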

I hope this helps.

bakunin

Don_Cragun · July 13, 2016, 3:16pm

In addition to the sage advice bakunin provided, the following code in your _exec function seems to demonstrate a misunderstanding of how parameter expansions work in the shell:

        local CMD=$1
        local OUT="$( $CMD 2>&1 ;echo "||$?" )"
        local EXIT_CODE="${OUT##*||}"

        # if no output is there at all [[:space:]]* does
        # not match. So do one with space and one without

        OUT="${OUT%[[:space:]]*||[0-9]*}"
        OUT="${OUT%||[0-9]*}"
        OUT="${OUT:-<EMPTY>}"

In the four shell parameter expansions:

${var%pattern}
${var%%pattern}
${var#pattern}
${var##pattern}

pattern is a filename matching pattern; not a regular expression. To get rid of the last || followed by any string of characters from the end of the variable var, you just need:

output=${out%||*}

or (given the way you assign values to the variable out):

output=${out%||$EXIT_CODE}

should produce identical results.
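For reference, a short sketch of what the four expansions do with a value that contains || more than once. The sample string is invented for the example.

```bash
#!/usr/bin/env bash
out='first||second||7'

echo "${out%||*}"     # shortest matching suffix removed -> first||second
echo "${out%%||*}"    # longest matching suffix removed  -> first
echo "${out#*||}"     # shortest matching prefix removed -> second||7
echo "${out##*||}"    # longest matching prefix removed  -> 7
```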

You then seem to be trying to strip off trailing whitespace characters from the output produced by running $CMD 2>&1 and, after doing that, to change an empty string to the non-empty string <EMPTY>. But, I have no idea why that is something you would want to do??? And, if that is something you want to do, that is not the way to do it.
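If stripping all trailing whitespace really is the goal, one common idiom uses two parameter expansions. This is a sketch, not necessarily the only or best way; the sample value and the names trailing and trimmed are made up.

```bash
#!/usr/bin/env bash
out=$'some output \t\n\n'

# ${out##*[![:space:]]} is whatever follows the last non-whitespace
# character, i.e. the trailing whitespace itself...
trailing=${out##*[![:space:]]}
# ...which is then removed as a literal (quoted) suffix.
trimmed=${out%"$trailing"}

printf '[%s]\n' "$trimmed"               # -> [some output]
printf '[%s]\n' "${trimmed:-<EMPTY>}"    # same here, since trimmed is not empty
```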

joker · July 14, 2016, 4:31pm

Thanks for your effort to help me. I'm not a programmer, but a senior system administrator with experience in a dozen different scripting languages and some mainly school-only experience with C/C++/Java. I read your posts in full but did not discover any knowledge that I do not already have. But again, thanks for your kindness in writing such extensive explanations for me.

pattern is a filename matching pattern; not a regular expression. To get rid of the last || followed by any string of characters from the end of the variable var, you just need:

output=${out%||*}

Of course that is correct. I wrote ${out%||[0-9]*} because I wanted a numerical value (the exit code) to be matched. That is not meant as a regex, where [0-9]* could also match zero characters. As a shell pattern it means a digit followed by anything, so the match can never be zero characters long.
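The difference between the two readings of [0-9]* can be checked with a few lines of bash. This is only an illustrative sketch; the sample strings are invented.

```bash
#!/usr/bin/env bash
# As a shell pattern, [0-9]* means: one digit, then any characters.
[[ '7abc' == [0-9]* ]] && echo "pattern matches '7abc'"
[[ '' == [0-9]* ]] || echo "pattern does not match the empty string"

# As a regular expression, [0-9]* means: zero or more digits.
re='^[0-9]*$'
[[ '' =~ $re ]] && echo "regex matches the empty string"

# Consequence for the expansion: the suffix is only removed when a
# digit follows the ||.
out='text||x'
echo "${out%||[0-9]*}"    # -> text||x (no digit after ||, nothing removed)
out='text||12'
echo "${out%||[0-9]*}"    # -> text
```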

The reason was the not very likely case that the program output contains ||. I'm realizing now that this cannot cause a problem, because ${..%||*} matches only the last occurrence of the pattern, which must be the one I appended myself. So my construction does not add any extra value.

OUT="${OUT:-<EMPTY>}"

Setting OUT to the string "<EMPTY>" is exactly what I'm accomplishing here. The reason is to explicitly point out in the logfile that the command did not output any result. I want it that way because it's a clearer message than just an empty string, which may have other reasons to occur.

---

Thanks for all hints so far. Any hints on the main question asked? ...which is: ideas and hints for getting nicer bash code that is easier to use and read on the calling side, outside of the _exec function.