awk script to search output for a value and print


GOODNUMBERS="1 2 3 4 5 6 3 3 34 34 5 66 12"
BADNUMBERS="7 3 12 5 66"

for eachnum in `echo ${GOODNUMBERS}`
do
        echo ${BADNUMBERS} | gawk -v threshold=${eachnum} '$1 != threshold'
done

What I'm trying to do with the above is print the numbers that are in the GOODNUMBERS variable if and only if they are not in the BADNUMBERS variable.

How can I do this in awk?

Why awk? Try:

for i in $GOODNUMBERS
do
  for j in $BADNUMBERS
  do
    if [ $i = $j ]; then
      continue 2
    fi
  done
  echo $i
done

The list of numbers in the GOODNUMBERS variable has the potential to be very big, so I was worried that using a for loop could be very slow.

How big could that list in the GOODNUMBERS variable become then?

awk -v g="${GOODNUMBERS}" -v b="${BADNUMBERS}" 'BEGIN{n=split(g, a); for(i=1; i<=n; i++) {if(b !~ "(^| )" a[i] "($| )") print a[i]}}'

I don't know about the variable limitation in awk, but if it cannot hold as much data as the shell does, you could use the following:

awk -v b="${BADNUMBERS}" '{n=split($0, a); for(i=1; i<=n; i++) {if(b !~ "(^| )" a[i] "($| )") print a[i]}}' <<<"${GOODNUMBERS}"
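
A variation on the same idea avoids one regular-expression match per number: split the bad list once into an array indexed by value, then test membership with "in". This is only a sketch and has not been benchmarked; the array names bad and tmp and the variable result are illustrative, not from the posts above.

```shell
GOODNUMBERS="1 2 3 4 5 6 3 3 34 34 5 66 12"
BADNUMBERS="7 3 12 5 66"

result=$(awk -v g="$GOODNUMBERS" -v b="$BADNUMBERS" 'BEGIN {
    nb = split(b, tmp)                 # tmp[1..nb] holds the bad values
    for (i = 1; i <= nb; i++)
        bad[tmp[i]]                    # referencing the element creates the index
    ng = split(g, a)                   # a[1..ng] holds the good values in order
    for (i = 1; i <= ng; i++)
        if (!(a[i] in bad))
            print a[i]
}')
echo "$result"
```

Duplicates in GOODNUMBERS are kept, and values come out in their original order.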

This will reduce the number of elements to be searched by every hit:

G=($GOODNUMBERS)
for i in $BADNUMBERS
   do for j in ${!G[@]}
      do [ $i -eq 0"${G[j]}" ] && unset G[j]
      done
   done
echo ${G[@]}
1 2 4 6 34 34

@Srinishoo:
The variable limitation is not in awk, nor is it a limitation of the shell, but rather an OS limitation determined by the configuration variable ARG_MAX (getconf ARG_MAX).
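
To get a feel for how close the two lists come to that limit, their combined length can be compared with the value getconf reports. A rough sketch: the names arg_max and needed are only illustrative, and the real limit also has to hold the environment and every other argument, so treat the comparison as an estimate.

```shell
GOODNUMBERS="1 2 3 4 5 6 3 3 34 34 5 66 12"
BADNUMBERS="7 3 12 5 66"

# Upper bound, in bytes, on arguments plus environment passed to a new process.
arg_max=$(getconf ARG_MAX)

# Characters used by the two lists (same as bytes for plain digits and spaces).
needed=$(( ${#GOODNUMBERS} + ${#BADNUMBERS} ))

if [ "$needed" -lt "$arg_max" ]; then
    echo "lists fit: $needed of $arg_max bytes"
else
    echo "lists too long for the command line; feed them to awk on stdin instead"
fi
```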

Some versions of awk limit the length of strings held in variables (as well as the maximum length of lines to be read) to LINE_MAX bytes (the awk on OS X is one example of this). The value of LINE_MAX on your system can be found using getconf LINE_MAX, but is frequently 2048.

If $GOODNUMBERS and $BADNUMBERS expand to lists shorter than LINE_MAX bytes, running the loop in Scrutinizer's suggested shell script could well be faster than exec'ing awk and interpreting the equivalent loop(s) in an awk script.

Your requirements aren't clear as to whether or not duplicated values in $GOODNUMBERS are supposed to be trimmed to a list of unique numbers, or if duplicated input numbers are to be duplicated in the output. If you want only unique numbers in your output and your list of numbers is too long for your version of awk to accept as a variable, you could try something like:

#!/bin/ksh
GOODNUMBERS="1 2 3 4 5 6 3 3 34 34 5 66 12"
BADNUMBERS="7 3 12 5 66"

printf '%s\n' $BADNUMBERS 'EOF' $GOODNUMBERS | awk '
/EOF/ {	good = 1; next }
!good {	b[$0]; next }
!($0 in b) && !($0 in g) {
	g[$0]; print
}'

which produces:

1
2
4
6
34

Note that this output is sorted only because of the sample input. The script does not sort: it prints each value in the order in which it first appears in $GOODNUMBERS.

This was tested using the Korn shell, but will also work with bash or any other shell that uses POSIX standard shell syntax.

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
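
If duplicates in $GOODNUMBERS should be kept rather than trimmed, the same kind of pipeline works without the g[] bookkeeping. A minimal sketch, assuming that the literal string EOF never occurs as a value; the result variable is only there for illustration.

```shell
GOODNUMBERS="1 2 3 4 5 6 3 3 34 34 5 66 12"
BADNUMBERS="7 3 12 5 66"

# Bad values first, then a separator line, then the good values, one per line.
result=$(printf '%s\n' $BADNUMBERS 'EOF' $GOODNUMBERS | awk '
$0 == "EOF" { good = 1; next }   # separator: everything after it is a good value
!good       { b[$0]; next }      # remember each bad value as an array index
!($0 in b)                       # print good values that are not bad, duplicates included
')
echo "$result"
```

With the sample lists this prints 1, 2, 4, 6, 34 and 34, one per line.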

This is failing with the following error:

Bad substitution

The code I'm using is:

GOODNUMBERS="1 2 3 4 5 6 3 3 34 34 5 66 12"
BADNUMBERS="7 3 12 5 66"

G=${GOODNUMBERS}
for i in $BADNUMBERS
   do for j in ${!G[@]}
      do [ $i -eq 0"${G[j]}" ] && unset G[j]
      done
   done
echo ${G[@]}

The shell I'm using is:

/bin/sh

The ${!name[@]} expansion (list of array keys) may be a bashism not available in sh.

Note that

G=${GOODNUMBERS}

should be

G=($GOODNUMBERS)

With RudiC's example you should use bash rather than sh.

---

Arrays are not officially available in /bin/sh either. (A particular /bin/sh may happen to support them, but one should not count on it.)
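
For a plain /bin/sh without arrays, the membership test can be done with a case pattern against the space-padded list instead. A sketch, assuming the values are separated by single spaces and contain no glob characters; the variable out is hypothetical and only collects the result.

```shell
GOODNUMBERS="1 2 3 4 5 6 3 3 34 34 5 66 12"
BADNUMBERS="7 3 12 5 66"

out=
for i in $GOODNUMBERS
do
    case " $BADNUMBERS " in
        *" $i "*) ;;                       # $i is a bad number: skip it
        *) out="${out:+$out }$i" ;;        # otherwise append it to the result
    esac
done
echo "$out"
```

Padding both the list and the value with spaces keeps 6 from matching inside 66. Duplicates in GOODNUMBERS are kept.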