Parse for 2 numbers in large single line

hburnswell · August 25, 2018, 7:58pm

Hi All,

I am writing a script in which I need to gather 2 numbers for 'total' and 'successful'. The goal is to compare the two numbers and if they are not equal, rerun the task until all are successful. I'm thinking the best way will be with awk or sed, but I really don't know where to begin with this one.

The task outputs a very long single line with no spaces. However, the beginning of the line appears to be consistent. Here is the output from my 4 manual runs:

{"_shards":{"total":1717,"successful":1712,"failed":5}
{"_shards":{"total":1717,"successful":1715,"failed":2}
{"_shards":{"total":1717,"successful":1714,"failed":3}
{"_shards":{"total":1717,"successful":1717,"failed":0}

As mentioned, it looks like the easiest way to test the need for rerunning is comparing the total and successful numbers.

Can anyone provide any guidance as to how to gather the 2 numbers and then do a loop to rerun the task until there are no errors?

Any help is greatly appreciated.

Thanks in advance,

HB

Scrutinizer · August 26, 2018, 3:35am

Does the task not use an exit code? You can test for it until it is successful:

until task
do
  :
done

This will run indefinitely if the task is never successful.

Alternatively (bash/ksh) you can limit the number of attempts and report about them.

for i in {1..10}
do
  if task
  then
    echo "successful after ${i} attempt(s)"
    break
  fi
  if (( i >= 10 )); then
    echo "Max number of attempts exceeded; task was unsuccessful"
    exit 1
  fi
done

--
If there is no return code and testing the output is the only option, then I suggest testing for the number of failures:

until [[ $result =~ ^\{\"_shards\":\{\"total\":[0-9]+,\"successful\":[0-9]+,\"failed\":0\} ]]
do
  result=$(task)
done

Likewise, you can use

result=$(task)
if [[ $result =~ ^\{\"_shards\":\{\"total\":[0-9]+,\"successful\":[0-9]+,\"failed\":0\} ]]

in the second example...
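If the two counts themselves are wanted, as in the original question, the same kind of pattern can capture them. This is only a minimal sketch, assuming bash: `task` below is a hypothetical stand-in that prints one of the sample lines, and `parse_shards` is an illustrative name, not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real task; prints one of the sample lines.
task() {
  printf '%s\n' '{"_shards":{"total":1717,"successful":1712,"failed":5}'
}

# Keep the regex in a variable so nothing has to be quoted inside [[ ]].
re='^\{"_shards":\{"total":([0-9]+),"successful":([0-9]+),"failed":([0-9]+)\}'

# parse_shards LINE: set total, successful and failed from LINE.
# Returns non-zero if LINE does not start with the expected prefix.
parse_shards() {
  [[ $1 =~ $re ]] || return 1
  total=${BASH_REMATCH[1]}
  successful=${BASH_REMATCH[2]}
  failed=${BASH_REMATCH[3]}
}

if parse_shards "$(task)"; then
  if (( total == successful )); then
    echo "all $total shards successful"
  else
    echo "$(( total - successful )) of $total shards not successful"
  fi
fi
```

The pattern is anchored at the start of the line, so only the leading `_shards` object is read even when more objects follow it.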

RudiC · August 26, 2018, 6:08am

Like Scrutinizer, I'd go for the "failed" count:

while IFS=":}" read -a INARR  <<< $(task) && [ 0 -ne "${INARR[${#INARR[@]}-1]}" ]; do :; done

Don_Cragun · August 26, 2018, 12:02pm

If task doesn't return a useful exit code (i.e., always returns a zero exit status), one could still skip parsing counts and just loop until success is found:

while line=$(task)
do	if [ "$line" = "${line%:0\}}" ]
	then
		echo "One or more tests failed: $line"
	else
		echo "All tests passed: $line"
		break
	fi
done

hburnswell · August 26, 2018, 1:21pm

Thank you for the responses.

Yes, unfortunately the exit code is always zero no matter the failures. I should have given more information, sorry.

The task is actually a 'curl' command for use with an Elasticsearch environment:

curl -u elastic -X POST "localhost:9200/_flush/synced"

As mentioned, the output is a very long single line, e.g. (truncated):

{"_shards":{"total":1717,"successful":1717,"failed":0},"metricbeat-2018.08.01":{"total":2,"successful":2,"failed":0},"metricbeat-2018.08.02":{"total":2,"successful":2,"failed":0},"metricbeat-2018.08.03":{"total":2,"successful":2,"failed":0},....

The command will receive an HTTP 200 response and return a zero exit code.

I agree Scrutinizer & RudiC, the 'failed' count is what I will use.

I am trying a couple of variations now and will report back.

Thanks for the guidance.

HB

RudiC · August 26, 2018, 1:56pm

Go for

$(curl ... |  grep -Eo "shards[^}]*failed\":[0-9]*")
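To see what that pattern extracts, it can be run against the start of the response posted above instead of the live cluster. This is a minimal sketch, assuming a grep that supports `-o` (GNU and BSD grep do); `shards_summary` is a hypothetical helper name.

```shell
# Hypothetical helper: keep only the leading _shards summary of a response.
shards_summary() {
  grep -Eo 'shards[^}]*failed":[0-9]*'
}

# Start of the response as posted above, cut off after two indices.
sample='{"_shards":{"total":1717,"successful":1717,"failed":0},"metricbeat-2018.08.01":{"total":2,"successful":2,"failed":0},"metricbeat-2018.08.02":{"total":2,"successful":2,"failed":0}'

summary=$(printf '%s\n' "$sample" | shards_summary)
echo "$summary"          # the matched text, from "shards" up to the failed count
echo "${summary##*:}"    # only the number after the last colon
```

Because `[^}]*` cannot cross a closing brace, the match stops inside the first object and the per-index counts further along the line are ignored.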

hburnswell · August 26, 2018, 6:32pm

RudiC thank you for that suggestion of: 'grep -Eo'... It allowed me to get it working with:

synced_flush () {

        $curl -s -u ${creds} -X POST "localhost:9200/_flush/synced" | \
        grep -Eo "shards[^}]*failed\":[0-9]*" | \
        sed -e 's/.*://'

}

until [[ $(synced_flush) == "0" ]]

        do

                synced_flush

        done

This appears to be doing the trick. If there is something really wrong with this approach please let me know. Like most of the scripts I write, they start off 'working' and I improve the efficiency with time ...
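One detail in the loop above: `synced_flush` runs once in the `until` test and again in the loop body, so every iteration sends two requests, and the second one also prints its count. The following is a minimal sketch that sends one request per attempt and gives up after a fixed number of attempts, in the spirit of Scrutinizer's attempt limit. The stub `synced_flush`, the `max_attempts` value, and the `flush_until_clean` name are assumptions for illustration; the real function would wrap the curl pipeline.

```shell
# Stub standing in for the real synced_flush (assumption for illustration).
# It prints the failed counts seen in the four manual runs: 5, 2, 3, 0.
# The argument only drives the simulation; the real function would ignore it.
synced_flush() {
  case $1 in
    1) echo 5 ;;
    2) echo 2 ;;
    3) echo 3 ;;
    *) echo 0 ;;
  esac
}

max_attempts=10   # assumed limit

# Call synced_flush once per attempt; stop at the first run with 0 failures.
flush_until_clean() {
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    failed=$(synced_flush "$attempt")
    if [ "$failed" = "0" ]; then
      echo "no failed shards after $attempt attempt(s)"
      return 0
    fi
    attempt=$(( attempt + 1 ))
  done
  echo "still $failed failed shard(s) after $max_attempts attempts" >&2
  return 1
}

flush_until_clean
```

An empty result (for example, if curl printed nothing) is not equal to "0", so it counts as a failed attempt and is retried rather than treated as success.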

Thanks again scrutinizer, Don Cragun, and RudiC.. Much appreciated.

HB

bakunin · August 26, 2018, 7:32pm

hburnswell:

If there is something really wrong with this approach

synced_flush () {

   $curl -s -u ${creds} -X POST "localhost:9200/_flush/synced" | \
   grep -Eo "shards[^}]*failed\":[0-9]*" | \
   sed -e 's/.*://'

}

Not "really wrong", but you can optimise a bit. Change:

        $curl -s -u ${creds} -X POST "localhost:9200/_flush/synced" | \
        grep -Eo "shards[^}]*failed\":[0-9]*" | \
        sed -e 's/.*://'

to

        $curl -s -u ${creds} -X POST "localhost:9200/_flush/synced" | \
        sed -n ' /^{"_shards":{[^}]*"failed":[0-9]/ {
                            s/^{"_shards":{[^}]*"failed"://
                            s/\([0-9]*\).*/\1/p
                   }'

sed can do everything grep is able to do, so you need only one of them. Usually this doesn't make a big difference, but if the pipeline is called many times over, the one saved process per call adds up to noticeably less run time.
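A single-sed extraction can be checked without touching the cluster by feeding it a sample line. This is a minimal sketch, assuming the response always begins with the `_shards` object as in the output shown earlier; the sample line and its index names are made up for illustration, and `failed_shards` is a hypothetical name. The expression is tied to the leading `_shards` object because the per-index objects later in the line carry their own `failed` fields.

```shell
# Hypothetical helper: read a synced-flush response on stdin and print the
# "failed" count from the leading _shards object only.
failed_shards() {
  sed -n 's/^{"_shards":{[^}]*"failed":\([0-9]*\).*/\1/p'
}

# Made-up sample: 3 failures overall, 0 failures in the last index listed.
sample='{"_shards":{"total":6,"successful":3,"failed":3},"idx-a":{"total":4,"successful":1,"failed":3},"idx-b":{"total":2,"successful":2,"failed":0}}'

printf '%s\n' "$sample" | failed_shards   # prints 3
```

With this sample an expression that matched the last `failed` field on the line would print 0, so the two readings are easy to tell apart.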

I hope this helps.

bakunin

hburnswell · August 28, 2018, 11:38pm

bakunin - thank you for the suggestion..

As mentioned, these are the small tweaks that I look for as my scripts mature ;-)..

I appreciate the guidance..