awk to compare diff output by fields

Diff output as follows:


< AAA BBB CCC DDD EEE 123
> PPP QQQ RRR SSS TTT 111
> VVV WWW XXX YYY ZZZ 333
> AAA BBB CCC DDD EEE 124

How can I use awk to compare the last field and determine whether the counter has increased? The fields before the counter must have the same values on both lines. In the example above, the counter has increased by 1 for the "AAA BBB CCC DDD EEE" entry.

Please advise. Thank you in advance.

So if the diff output is like so -

< AAA BBB CCC DDD EEE 123
> PPP QQQ RRR SSS TTT 111
> VVV WWW XXX YYY ZZZ 333
> AAA BBB CCC DDD EEE 124
> AAA BBB CCC DDD EEE 125
> PPP QQQ RRR SSS TTT 222
> PPP QQQ RRR SSS TTT 250
> AAA BBB CCC DDD EEE 129
> GGG HHH III JJJ KKK 111
> VVV WWW XXX YYY ZZZ 334

then should the counter be -

(a) 3, because "AAA BBB CCC DDD EEE" incremented 3 times?, or
(b) 2, because "PPP QQQ RRR SSS TTT" incremented 2 times?, or
(c) 1, because "VVV WWW XXX YYY ZZZ" incremented once?, or
(d) 0, because "GGG HHH III JJJ KKK" did not increment at all?

tyler_durden

In your example, each entry will have its own counter:

counter (a) = 3, because "AAA BBB CCC DDD EEE" incremented 3 times
counter (b) = 2, because "PPP QQQ RRR SSS TTT" incremented 2 times
counter (c) = 1, because "VVV WWW XXX YYY ZZZ" incremented once
counter (d) = 0, because "GGG HHH III JJJ KKK" did not increment at all

And based on the counter value, some actions will be triggered.

nawk '{idx=$2 FS $3 FS $4 FS $5 FS $6;a[idx]++}END {for(i in a) print qq i qq " incremented " a[i]-1 " times"}' qq='"' diffOutputFile
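Note that the one-liner above counts how many times each entry appears, not how many times its counter actually went up. A minimal sketch that counts only real increases, assuming the diff marker is field 1, the key is fields 2-6, and the counter is the last field. Plain awk is used in place of nawk, and the function name count_increases is made up for the example:

```shell
# count_increases FILE: hypothetical helper, not part of the original thread.
count_increases() {
  awk '
    {
      idx = $2 FS $3 FS $4 FS $5 FS $6
      if (!(idx in n)) {
        n[idx] = 0                      # first sighting: zero increases so far
      } else if ($NF + 0 > last[idx] + 0) {
        n[idx]++                        # counter went up since the previous sighting
      }
      last[idx] = $NF                   # remember the latest counter for this entry
    }
    END { for (i in n) print qq i qq " incremented " n[i] " times" }
  ' qq='"' "$1" | sort                  # sort, because "for (i in n)" order is unspecified
}

# usage: count_increases diffOutputFile
```

With this version an entry whose counter goes down between two sightings is not counted as an increment.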

Thanks for the reply. I think I've confused some of us including myself. Let me try this again:

Diff output has this:

AAA BBB CCC DDD EEE 10
AAA BBB CCC DDD EEE 11
PPP QQQ RRR SSS TTT 5
PPP QQQ RRR SSS TTT 4
VVV WWW XXX YYY ZZZ 2

need some awk help to:

  • scan the lines and compare the first 5 fields of each line
    • and when all 5 fields match, then compare the 6th field (i.e. counter) to check if the value has increased or decreased
      • if the counter value has increased, do something
      • else, do nothing
    • if scan cannot find line with matching first 5 fields, then do something
  • end of scan

Thanks again in advance.. my apologies for the confusion.

#!/bin/ksh

nawk '
  # create a variable "idx" which is a concatenation of the first 5 fields in a record/line
  # as we read the file line by line
  {idx=$1 FS $2 FS $3 FS $4 FS $5}
  {
    # see if "idx" is in array "a" - array "a" is indexed by the value of "idx"
    if (idx in a)
      # if "idx" is already in "a", check if the stored value (a[idx]) is less than the
      # last field ($NF) of the current record/line. If it is, we see the "increase" in value
      # and output the current line (print $0)
      if (a[idx]<$NF)
         print $0
    # store the last field ($NF) of the current record in array "a" indexed by "idx"
    a[idx]=$NF
  }' myDiffFile | while read line
do
  # read the output of "nawk" and do "doSomething" with the read line
  echo "doSomething with [$line]"
done

wow.. i think it's working per your tip. do you mind explaining the code as i am trying to learn how to fish?

Thanks again.

no problem - the 'fishing rod' is the posted code.
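For anyone else reading along, a trace on the five sample lines may help. This sketch is the same logic written as awk pattern-action rules, with +0 added to force a numeric comparison and the sample data fed in through a here-document. awk is used instead of nawk, and the function name show_increases is made up for the example:

```shell
# show_increases [FILE...]: hypothetical helper; reads stdin when no file is given.
show_increases() {
  awk '
    # key = first 5 fields of the current line
    { idx = $1 FS $2 FS $3 FS $4 FS $5 }
    # key seen before AND the stored counter is smaller than the current one: print the line
    (idx in a) && a[idx] + 0 < $NF + 0 { print }
    # always remember the latest counter for this key
    { a[idx] = $NF }
  ' "$@"
}

show_increases <<'EOF'
AAA BBB CCC DDD EEE 10
AAA BBB CCC DDD EEE 11
PPP QQQ RRR SSS TTT 5
PPP QQQ RRR SSS TTT 4
VVV WWW XXX YYY ZZZ 2
EOF
# prints: AAA BBB CCC DDD EEE 11
```

Line 1 only stores 10. Line 2 finds the key, sees 10 < 11, and prints. The PPP pair goes from 5 to 4, so nothing is printed, and VVV is seen only once.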

How can I modify this so that I can "dosomething else" when no line is found? Please advise, and thank you in advance.

I've tried to use this (see below), but it doesn't handle multiple lines read :(

[[ ! -z $line ]] && { dosomething_else ; } || { dosomething ; }

if [[ ! -z $line ]]; then
     cmd1
     cmd2 
     ...
else
     cmd3
     cmd4
     ...
fi

#!/bin/ksh

nawk '
  # create a variable "idx" which is a concatenation of the first 5 fields in a record/line
  # as we read the file line by line
  {idx=$1 FS $2 FS $3 FS $4 FS $5}
  {
    # see if "idx" is in array "a" - array "a" is indexed by the value of "idx"
    # if it is not there yet, tag the line with 0
    if (!(idx in a))
      print 0 ":" $0
    else
      # if "idx" is already in "a", check if the stored value (a[idx]) is less than the
      # last field ($NF) of the current record/line. If it is, we see the "increase" in value
      # and output the current line tagged with 1
      if (a[idx]<$NF)
         print 1 ":" $0
    # store the last field ($NF) of the current record in array "a" indexed by "idx"
    a[idx]=$NF
  }' myDiffFile | while IFS=: read smth line
do
  # read the output of "nawk" and do "doSomething" with the read line
  if [ "${smth}" -eq 1 ]; then
     echo "doSomething with [$line]"
  else
     echo "doSomethingElse with [$line]"
  fi
done

Thanks again.. but i couldn't get the "IFS=:" to work, and i ended up using a flag instead to track the lines, i.e.:

flag=0
...
.. | while read line
do
flag=1
echo "doSomething with [$line]"
done
if [ "$flag" -eq 0 ] ; then
   echo "doSomethingElse with [noline]"
fi
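One caveat with the flag approach: ksh runs the last stage of a pipeline in the current shell, so flag=1 survives the loop, but bash (by default) and dash run the while loop in a subshell and the flag is lost there. A sketch that sidesteps this by letting awk itself report that nothing matched, so no variable has to survive the pipeline. The NOLINE sentinel and the function name report_increases are assumptions for the example:

```shell
# report_increases [FILE...]: hypothetical helper, not part of the original thread.
report_increases() {
  awk '
    { idx = $1 FS $2 FS $3 FS $4 FS $5 }
    # counter increased for a key seen before: tag the line with 1
    (idx in a) && a[idx] + 0 < $NF + 0 { print "1:" $0; found = 1 }
    { a[idx] = $NF }
    # nothing increased at all: emit a single sentinel record tagged with 0
    END { if (!found) print "0:NOLINE" }
  ' "$@" |
  while IFS=: read -r smth line
  do
    if [ "$smth" -eq 1 ]; then
      echo "doSomething with [$line]"
    else
      echo "doSomethingElse with [noline]"
    fi
  done
}

# usage: report_increases myDiffFile
```

Because both branches are handled inside the loop, the result does not depend on which shell runs the script.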

another question: how can i doSomething when it's the first occurrence in the diff file before the increment counter comparison?

strange - I get the following for the sample file you provided:

doSomethingElse with [AAA BBB CCC DDD EEE 10]
doSomething with [AAA BBB CCC DDD EEE 11]
doSomethingElse with [PPP QQQ RRR SSS TTT 5]
doSomethingElse with [VVV WWW XXX YYY ZZZ 2]

---------- Post updated at 06:29 PM ---------- Previous update was at 06:26 PM ----------

something like this?

#!/bin/ksh

nawk '
  # create a variable "idx" which is a concatenation of the first 5 fields in a record/line
  # as we read the file line by line
  {idx=$1 FS $2 FS $3 FS $4 FS $5}
  {
    # see if "idx" is in array "a" - array "a" is indexed by the value of "idx"
    # a new "idx" on any line after the first one is tagged with 0
    if (!(idx in a) && FNR>1)
      print 0 ":" $0
    else
      # if "idx" is already in "a", check if the stored value (a[idx]) is less than the
      # last field ($NF) of the current record/line. If it is, we see the "increase" in value
      # and output the current line tagged with 1; the first line of the file is tagged with 1 too
      if (a[idx]<$NF || FNR==1)
         print 1 ":" $0
    # store the last field ($NF) of the current record in array "a" indexed by "idx"
    a[idx]=$NF
  }' myDiffFile| while IFS=: read smth line
do
  # read the output of "nawk" and do "doSomething" with the read line
  if [ "${smth}" -eq 1 ]; then
     echo "doSomething with [$line]"
  else
     echo "doSomethingElse with [$line]"
  fi
done

Attempted the revised awk script with the inputfile:

Input file:

< AAA BBB CCC DDD EEE 123
> PPP QQQ RRR SSS TTT 111
> VVV WWW XXX YYY ZZZ 333
> AAA BBB CCC DDD EEE 124

Results:

doSomethingElse with [< AAA BBB CCC DDD EEE 123]
doSomethingElse with [> PPP QQQ RRR SSS TTT 111]
doSomethingElse with [> VVV WWW XXX YYY ZZZ 333]
doSomethingElse with [> AAA BBB CCC DDD EEE 124]

Expecting:

doSomething with [> PPP QQQ RRR SSS TTT 111]  # first occurrence
doSomething with [> VVV WWW XXX YYY ZZZ 333]  # first occurrence
doSomething with [> AAA BBB CCC DDD EEE 124]  # last counter field increased from 123 to 124

you changed your input file format - you added '<>' in the first field.
If that's going to be the format, then change

{idx=$1 FS $2 FS $3 FS $4 FS $5}

to

{idx=$2 FS $3 FS $4 FS $5 FS $6}

New input file:

< AAA BBB CCC DDD EEE 123
< GGG HHH III JJJ KKK 100
> PPP QQQ RRR SSS TTT 111
> VVV WWW XXX YYY ZZZ 333
> AAA BBB CCC DDD EEE 124
> GGG HHH III JJJ KKK 99

Expected Results:

doSomething with [> PPP QQQ RRR SSS TTT 111]  # first occurrence
doSomething with [> VVV WWW XXX YYY ZZZ 333]  # first occurrence
doSomething with [> AAA BBB CCC DDD EEE 124]  # last counter field increased from 123 to 124
doSomethingElse with [> GGG HHH III JJJ KKK 99] # last counter field decreased from 100 to 99

---------- Post updated at 01:59 PM ---------- Previous update was at 01:41 PM ----------

Updated the idx variable assignment, and we have this:

doSomethingElse with [< AAA BBB CCC DDD EEE 123]
doSomethingElse with [< GGG HHH III JJJ KKK 100]
doSomethingElse with [> PPP QQQ RRR SSS TTT 111]
doSomethingElse with [> VVV WWW XXX YYY ZZZ 333]
doSomething with [> AAA BBB CCC DDD EEE 124]

Expecting:

doSomething with [> PPP QQQ RRR SSS TTT 111]  # first occurrence
doSomething with [> VVV WWW XXX YYY ZZZ 333]  # first occurrence
doSomething with [> AAA BBB CCC DDD EEE 124]  # last field increment
doSomethingElse with [> GGG HHH III JJJ KKK 99]  # last field decrement

Thanks again for your help.

---------- Post updated at 03:51 PM ---------- Previous update was at 01:59 PM ----------

Hello, i am able to do this via regular scripting (non-awk version):

Inputfile:

< SSS TTT UUU VVV WWW 456
< AAA BBB CCC DDD EEE 123
< GGG HHH III JJJ KKK 100
> PPP QQQ RRR SSS TTT 111
> VVV WWW XXX YYY ZZZ 333
> AAA BBB CCC DDD EEE 124
> GGG HHH III JJJ KKK 99

non-awk version:

#!/bin/ksh

# each line is: diff marker ("<" or ">"), five name fields, then the counter
while read diffvar f2 f3 f4 f5 f6 counter ; do

       linevar="$f2 $f3 $f4 $f5 $f6"

       if [[ $diffvar = ">" ]] ; then

              # search for recurring pattern in inputfile
              if [ "$(grep -c "${linevar}" inputfile)" -eq 2 ] ; then
                     # when the pattern is found twice, check whether the last field increased
                     new=$(grep "${linevar}" inputfile | grep ">" | awk '{print $7}')
                     old=$(grep "${linevar}" inputfile | grep "<" | awk '{print $7}')
                     if [ "$new" -gt "$old" ] ; then
                            echo "doSomething with [${linevar}]"
                     else
                            echo "doSomethingElse with [${linevar}]"
                     fi
              else
                     # doSomething with first occurrence
                     echo "doSomething with [${linevar}]"
              fi

       elif [[ $diffvar = "<" ]] ; then

              # search for non-recurring pattern in inputfile
              if [ "$(grep -c "${linevar}" inputfile)" -eq 1 ] ; then
                     echo "doSomethingElse with [${linevar}]"
              fi

       fi

done < inputfile

exit

results:

doSomethingElse with [SSS TTT UUU VVV WWW]
doSomething with [PPP QQQ RRR SSS TTT]
doSomething with [VVV WWW XXX YYY ZZZ]
doSomething with [AAA BBB CCC DDD EEE]
doSomethingElse with [GGG HHH III JJJ KKK]
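
For comparison, roughly the same result can be had from a single awk invocation that reads the file twice, which avoids running grep several times per line. This is only a sketch under the assumptions that field 1 is the diff marker, fields 2-6 are the key, field 7 is the counter, and each key appears at most once per marker. awk is used instead of nawk, and classify_diff is a made-up name:

```shell
# classify_diff FILE: hypothetical helper, not part of the original thread.
classify_diff() {
  awk '
    { key = $2 FS $3 FS $4 FS $5 FS $6 }

    # pass 1 (NR == FNR): remember the counter seen on each side of the diff
    NR == FNR {
      if ($1 == "<") old[key] = $7
      else if ($1 == ">") cur[key] = $7
      next
    }

    # pass 2: a "<" line with no ">" partner is an entry that disappeared
    $1 == "<" {
      if (!(key in cur)) print "doSomethingElse with [" key "]"
      next
    }

    # pass 2: a ">" line is either a first occurrence, an increase, or not an increase
    $1 == ">" {
      if (!(key in old) || $7 + 0 > old[key] + 0)
        print "doSomething with [" key "]"
      else
        print "doSomethingElse with [" key "]"
    }
  ' "$1" "$1"     # the same file is passed twice to get two passes
}

# usage: classify_diff inputfile
```

On the seven-line inputfile above this prints the same five lines, in the same order, as the non-awk version.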