Note: environment variables within awk

This is not a question, just a short note: I've been here for some time and have never read about awk accessing environment variables. So here's my use case and a demonstration of how to use the ENVIRON array. My operating environment is Ubuntu 18.04 / Docker / GNU awk 4.1.4. ENVIRON seems to be POSIX compatible, as written here: awk

I wrote an awk script that checks the mysql output to see whether the MySQL replication is fine. The script only returns an exit code, because that's all the container health check needs.

There are two pieces: the awk script itself, and a small shell script that generates the status text and pipes it into awk for parsing.

This is the calling script:

#!/bin/sh

mysql -e 'SHOW SLAVE STATUS \G' | mysql_slave_check

But from time to time, when I'm debugging, I want to see details of the state. So I'd like to have a -v option for verbose output. Since parsing command line arguments within awk seems more difficult to me, I put that into the calling script.

#!/bin/sh

test "$1" = "-v" && DEBUG="DEBUG=1"
mysql -e 'SHOW SLAVE STATUS \G' | mysql_slave_check $DEBUG

Within awk I have to check whether that option is set and switch on the internal debug mode like this:

BEGIN {
        DEBUG = (ENVIRON["DEBUG"]!="") ? ENVIRON["DEBUG"] : 0
}

And that's all. Now I have a debug option usable on demand.
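
To see the fallback in isolation, here is a minimal sketch. It reuses the BEGIN line from above and only adds a print; the shell variable names `prog`, `without_env` and `with_env` are made up for this illustration. It assumes any POSIX awk, not specifically gawk.

```shell
# The BEGIN block from the post, plus a print so the result is visible.
prog='BEGIN { DEBUG = (ENVIRON["DEBUG"] != "") ? ENVIRON["DEBUG"] : 0; print "DEBUG is " DEBUG }'

unset DEBUG                          # make sure nothing is inherited from the caller
without_env=$(awk "$prog")           # ENVIRON["DEBUG"] is empty, so the fallback 0 is used
with_env=$(DEBUG=1 awk "$prog")      # assignment in front of the command: placed in awk's environment

echo "$without_env"
echo "$with_env"
```

The important detail is the position of the assignment: `DEBUG=1 awk ...` puts the variable into the environment of that one awk process, which is what ENVIRON reads.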

And for completeness here's the full awk script:

#!/usr/bin/gawk -f
# (path assumed; on Linux "#!/usr/bin/env awk -f" hands "awk -f" to env as one word and fails)
#
# check if the mariadb-slave is connected and synced with master
#
#       expected text from stdin is the following command
#
#               LC_ALL=C mysql -e 'SHOW SLAVE STATUS \G' 
#
#       what to check:
#
#               "Master_Host"           -> not empty
#               "Slave_IO_Running"      -> "Yes"
#               "Slave_SQL_Running"     -> "Yes"
#               "Seconds_Behind_Master" -> <= 30
#

function debug(msg) {if(DEBUG==1) { printf "%s",msg }}

/Master_Host: (.+)/                             { MASTER_HOST_SET=1                                     }
/Slave_IO_Running: Yes/                         { SLAVE_IO_RUNNING=1                                    }
/Slave_SQL_Running: Yes/                        { SLAVE_SQL_RUNNING=1                                   }
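# 3-argument match() is a gawk extension; "+0" forces a numeric comparison, and the lag is kept for the debug output even when it is too high
match($0,/Seconds_Behind_Master: ([0-9]+)/,res) { REPL_SECS=res[1]; if(REPL_SECS+0 <= 30) { REPL_STATE_OK=1 }   }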

BEGIN {
        DEBUG = (ENVIRON["DEBUG"]!="") ? ENVIRON["DEBUG"] : 0
}

END {

        debug(sprintf("Master_Host is set       : %10s\n",(MASTER_HOST_SET   == 1)?"OK":"FAILED"))
        debug(sprintf("Slave IO Running         : %10s\n",(SLAVE_IO_RUNNING  == 1)?"OK":"FAILED"))
        debug(sprintf("Slave SQL Running        : %10s\n",(SLAVE_SQL_RUNNING == 1)?"OK":"FAILED"))
        debug(sprintf("Replication Lag          : %10s(%s secs behind)\n",(REPL_STATE_OK     == 1)?"OK":"FAILED", REPL_SECS))

        if ( MASTER_HOST_SET && SLAVE_IO_RUNNING && SLAVE_SQL_RUNNING && REPL_STATE_OK ) {
                GLOBAL_STATUS="PASSED"
        } else {
                GLOBAL_STATUS="FAILED"
        }
        debug(sprintf("Overall DB Slave status  : %10s\n",GLOBAL_STATUS))
        exit (GLOBAL_STATUS=="PASSED")?0:1
 }
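
For awks without the three-argument match(), the same checks can be sketched with field splitting. This is only an illustration under assumptions, not a replacement for the script above: the function name `check`, the variable names and the sample status text (including the host name `db-master`) are invented, and the debug output is left out.

```shell
# Hypothetical excerpt of 'SHOW SLAVE STATUS \G' output, reduced to the checked fields.
sample='          Master_Host: db-master
     Slave_IO_Running: Yes
    Slave_SQL_Running: Yes
Seconds_Behind_Master: 5'

# Split each line at ": " and test the four conditions from the header comment.
check() {
    awk -F': ' '
        $1 ~ /Master_Host$/           && $2 != ""                        { host = 1 }
        $1 ~ /Slave_IO_Running$/      && $2 == "Yes"                     { io   = 1 }
        $1 ~ /Slave_SQL_Running$/     && $2 == "Yes"                     { sql  = 1 }
        $1 ~ /Seconds_Behind_Master$/ && $2 ~ /^[0-9]+$/ && $2 + 0 <= 30 { lag  = 1 }
        END { exit !(host && io && sql && lag) }   # 0 = healthy, 1 = failed
    '
}

printf '%s\n' "$sample" | check
healthy=$?                                   # exit code for the good sample
printf '%s\n' "$sample" | sed 's/: 5$/: 120/' | check
lagging=$?                                   # same sample with 120 seconds of lag

echo "healthy sample: $healthy, lagging sample: $lagging"
```

The exit code is the only result, which is all a container health check looks at.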

---

What is strange is that I made an error in the calling script:

mysql -e 'SHOW SLAVE STATUS \G' | mysql_slave_check $DEBUG

This was an error: I wanted to set an environment variable for the awk program call, but such an assignment has to be placed in front of the command, not after it. I wonder why this works anyway?

The correct call should be this one:

 test "$1" = "-v" && DEBUG=1 || DEBUG=0
 mysql -e 'SHOW SLAVE STATUS \G' | DEBUG=$DEBUG mysql_slave_check
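
As for why the first version worked anyway, a likely explanation: awk treats an operand of the form name=value as a variable assignment and carries it out when it reaches that operand, which is after the BEGIN block and before the input is read. So BEGIN set DEBUG to 0 from the empty ENVIRON entry, and the operand then set it to 1. The sketch below shows that order with a made-up two-rule program; `prog` and `result` are names invented for the illustration.

```shell
# BEGIN sets DEBUG to 0, as the fallback in the real script does when ENVIRON["DEBUG"] is empty.
prog='BEGIN { DEBUG = 0; print "BEGIN: " DEBUG } { print "main: " DEBUG }'

# DEBUG=1 after the program text is an awk operand, not a shell environment assignment.
result=$(printf 'one line\n' | awk "$prog" DEBUG=1)

echo "$result"
```

By the time the first input line is processed, DEBUG holds the value from the operand, so the debug() calls in END see it as 1 without ENVIRON being involved at all.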

The DEBUG=$DEBUG prefix ensures that DEBUG exists in the environment of the awk process.
But it's pure overhead if the variable is already in the shell's environment.
You can achieve the same with an exported variable (exported = placed in the shell's environment):

export DEBUG
test "$1" = "-v" && DEBUG=1 || DEBUG=0
mysql -e 'SHOW SLAVE STATUS \G' | mysql_slave_check

Because DEBUG is in the shell's environment, all commands run from that shell inherit it.
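
The difference between a plain shell variable and an exported one can be checked with a short sketch like this (the names `probe`, `not_exported` and `exported` are invented here; any POSIX awk should do):

```shell
# Report whether awk finds DEBUG in its environment.
probe='BEGIN { s = ("DEBUG" in ENVIRON) ? "seen" : "missing"; print s }'

unset DEBUG
DEBUG=1                          # plain shell variable, not part of the environment
not_exported=$(awk "$probe")

export DEBUG                     # now it is passed on to every command the shell runs
exported=$(awk "$probe")

echo "before export: $not_exported, after export: $exported"
```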

The use of ENVIRON[ ] in an awk script is interesting. Thanks for sharing!
It is not used often. The reason: when awk is run from the shell, there are easier ways to pass variables.
The POSIX style:

test "$1" = "-v" && DEBUG=1 || DEBUG=0
mysql -e 'SHOW SLAVE STATUS \G' | awk -v DEBUG=$DEBUG -f mysql_slave_check

Or the (a bit dirty) Unix awk style:

test "$1" = "-v" && DEBUG=1 || DEBUG=0
mysql -e 'SHOW SLAVE STATUS \G' | awk -f mysql_slave_check DEBUG=$DEBUG

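Either invocation sets the DEBUG variable in awk, so there is no need to pull it from ENVIRON[ ]. The BEGIN block that reads ENVIRON should then be removed: with -v the variable is already set when BEGIN runs, so that block would overwrite it with 0.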
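
One thing to watch when switching to -v: the assignment is made before BEGIN runs, so a BEGIN block that still assigns DEBUG from ENVIRON overrides it. A minimal sketch of that interaction, with variable names invented for the illustration:

```shell
unset DEBUG                      # nothing in the environment

# The BEGIN line from the original script is still present: it replaces the -v value with 0.
with_environ_block=$(awk -v DEBUG=1 'BEGIN { DEBUG = (ENVIRON["DEBUG"] != "") ? ENVIRON["DEBUG"] : 0; print DEBUG }')

# Without that line, the value passed with -v is what the program sees.
without_environ_block=$(awk -v DEBUG=1 'BEGIN { print DEBUG }')

echo "with ENVIRON block: $with_environ_block, without: $without_environ_block"
```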


The most important thing for me here was to have a command with an option that is as short as possible, so it's convenient to use.

And on second thought, I think environment variables are not such a good mechanism in general. I appreciate explicit parameter passing more, since external dependencies and data access are then not hidden in the code but directly visible at the program call.

And I would rather write such a script as a single file now:

#!/bin/sh

[ "$1" = "-q" ] && DEBUG=0 || DEBUG=1
mysql -e 'SHOW SLAVE STATUS \G' | awk -v DEBUG=${DEBUG} '

#
# awk program here
#
...
'