Convert a shell script to an awk script

Hello guys,

I have a script like:

echo "Errores 0x01 `cat errores.log | grep 0x00000001 | wc -l` " > class_total
echo "Errores 0x0B `cat errores.log | grep 0x0000000B | wc -l` " >> class_total
echo "Errores 0x45 `cat errores.log | grep 0x00000045 | wc -l` " >> class_total
echo "Errores 0x58 `cat errores.log | grep 0x00000058 | wc -l` " >> class_total
echo "Errores 0x64 `cat errores.log | grep 0x00000064 | wc -l` " >> class_total
echo "Errores 0x66 `cat errores.log | grep 0x00000066 | wc -l` " >> class_total
echo "T O T A L `cat errores.log | grep 0x000 | wc -l` " >> class_total

and when I run it I get a 'class_total' file like:

      Errores 0x01     188 
      Errores 0x0B     127 
      Errores 0x45       0 
      Errores 0x58       0 
      Errores 0x64      38 
      Errores 0x66     140 
         T O T A L     493 

but the file 'errores.log' that I'm counting has 5,000,000 lines and this process takes a few seconds. How do I do the same thing in awk to improve the response time?

nawk -f lest.awk errores.log

here's lest.awk - not tested

BEGIN {
  codesN=split("0x00000001 0x0000000B 0x00000045 0x00000058 0x00000064 0x00000066", codesA, " ");
  split("0x01 0x0B 0x45 0x58 0x64 0x66", codesAs, " ");
}

{
   for(i=1; i <= codesN; i++)
     if ( $0 ~ codesA[i] )
        arr[codesAs[i]]++
}

END {
  for( i in arr ) {
    printf("Errores %s %d\n", i, arr[i])
    tot+=arr[i]
  }
  printf("TOTAL %d\n", tot)
}

I don't know much about awk; I just know that awk is as powerful as any other tool. When I run the above script, I have this problem:

awk: syntax error near line 1
awk: bailing out near line 1

What can I do?

use nawk instead of awk.

Not yet, I still have problems:

example# nawk prueba.awk borrar

nawk: syntax error at source line 1
context is
>>> prueba. <<< awk
nawk: bailing out at source line 1

vgersh99's solution is good, but it might be possible to refine it a bit. Assuming nawk or gawk (likewise untested):

k = match($0,/0x000000(01|0B|45|58|64|66)/) {
    arr[substr($0, k + 8, 2)]++
}

END {
    for (i in arr)
        print "Errores", "0x" i, arr[i]
}

What are the differences?

  • Direct match against all the patterns you seek at one time, i.e., no iteration over codesA on every line
  • No lookup in codesAs for each item found

[Old awk solution deleted --- no alternation ("|") available.]

The disadvantage here is that nawk and awk don't "capture" the thing matched, so you still do the substr call on every match (though it's in machine language and therefore fast). (I don't use gawk, but it may have a remedy for this.)

A ruby solution would work like this (and the perl solution would be similar):

$hsh = { "01" => 0, "0B" => 0, "45" => 0, "58" => 0, "64" => 0, "66" => 0 }
ARGF.each do |line|
    next unless m = %r/0x000000(01|0B|45|58|64|66)/.match line
    $hsh[m[1]] += 1
end
$hsh.keys.sort.each do |k|
    print "Errores 0x", k, " ", $hsh[k], "\n"
end

Pay more attention to the postings...
nawk -f prueba.awk borrar

I got it...

nawk -f prueba.awk borrar

Thanx my friend vgersh99!!!!!!!!

Thanks criglerj, your solution works too, but now I have a new problem: the file borrar.log has 2,537,051 lines and this process takes 30 seconds, and I need this information to be shown every 5 seconds, 10 at most... how can I do it better?

Post a sample data file, please. If the "0x000000??" string appears in a predictable position, I think the "substr" call can be deferred to the END{} part.

If the data you're looking for always shows up in the same awk field, e.g., $4, and it's the only thing in that field, then you can speed it up as r2007 suggested, by only checking that field and by using the whole field as the index of arr, then deferring the substring operation to the END block:

$4 ~ /^0x000000(01|0B|45|58|64|66)$/ {
    arr[$4]++
}

END {
    for (i in arr)
        print "Errores", "0x" substr(i,9,2), arr[i]
}

My next line of attack would be ruby or perl. Ruby is easier to read and write, but it works by interpreting the AST at runtime. Perl runs faster because it compiles to bytecode. And I believe perl is installed by default on Solaris 8 (usually an old version, though sysadmins frequently install an updated version). Anyhow, a perl version would look like this:

while (<>) {
    next unless /0x000000(01|0B|45|58|64|66)/;
    $a{$1}++;
}
while (($k, $v) = each %a) {
    print "Errores 0x", $k, " ", $v, "\n"
}
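As a side note on the "every 5 seconds" requirement: any of the one-pass counters can simply be re-run from a loop such as `while :; do ...; sleep 5; done`. A minimal sketch of one pass, using a portable variant of criglerj's match() approach (RSTART is standard awk) and a made-up three-line sample file; the errores.log and class_total names are taken from the thread:

```shell
# Build a tiny sample log so the sketch is self-contained.
printf '%s\n' \
  '[cmd_status: 0x00000001]' \
  '[cmd_status: 0x00000066]' \
  '[cmd_status: 0x00000001]' > errores.log

# One pass: count each code and the total, then write class_total.
awk 'match($0, /0x000000(01|0B|45|58|64|66)/) {
         arr[substr($0, RSTART + 8, 2)]++   # RSTART = where the match began
         tot++
     }
     END {
         for (i in arr) print "Errores", "0x" i, arr[i]
         print "T O T A L", tot
     }' errores.log > class_total
```

`cat class_total` then shows the same kind of report as the original script, though unordered, since `for (i in arr)` makes no order guarantee.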

Part of my 'borrar' file:

[2005-06-10 07:28:11]{12}: [R-> PBRX1] DELIVER_SM_RESP [seqno: 79774][trans_id: 23914495 ][cmd_status: 0x00000064]
[2005-06-10 07:28:11]{13}: [R-> PBRX2] DELIVER_SM_RESP [seqno: 79775][trans_id: 23914496 ][cmd_status: 0x00000001]
[2005-06-10 07:28:11]{7}: [R-> PBRX2] DELIVER_SM_RESP [seqno: 79777][trans_id: 23914498 ][cmd_status: 0x00000066]
[2005-06-10 07:28:11]{12}: [R-> PBRX2] DELIVER_SM_RESP [seqno: 79776][trans_id: 23914497 ][cmd_status: 0x00000045]
[2005-06-10 07:28:12]{8}: [R-> PBRX1] DELIVER_SM_RESP [seqno: 79778][trans_id: 23914499 ][cmd_status: 0x00000000]

I hope this helps you.

... in one of my earlier posts, it was shown that running awk to count lines was actually slower than running a "grep-wc" combination ... try this one and see if it's any quicker ... sometimes speed is achieved just by using the tools at hand properly ...

#! /bin/ksh

E01=`grep -c 0x00000001 errores.log`
E0B=`grep -c 0x0000000B errores.log`
E45=`grep -c 0x00000045 errores.log`
E58=`grep -c 0x00000058 errores.log`
E64=`grep -c 0x00000064 errores.log`
E66=`grep -c 0x00000066 errores.log`
TOTAL=`expr $E01 + $E0B + $E45 + $E58 + $E64 + $E66`

echo "Errores 0x01 $E01" > class_total
echo "Errores 0x0B $E0B" >> class_total
echo "Errores 0x45 $E45" >> class_total
echo "Errores 0x58 $E58" >> class_total
echo "Errores 0x64 $E64" >> class_total
echo "Errores 0x66 $E66" >> class_total
echo "T O T A L $TOTAL" >> class_total

exit 0

awk -F"0x000000" '{a[$2]++} END {for (i in a) print substr(i,1,2), a[i]}'

I tested this code with a 1,638,400-line file. It took about 5-6 seconds.
With Just_Ice's script, it only took about 1-1.5 seconds.
In this case there are 5 kinds of error code. With more types of error code, the "grep" method will take more time, but the "awk" method will not, theoretically speaking.
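For what it's worth, the two approaches are easy to cross-check on generated data before timing them. A hedged sketch (the file is synthetic, and timings vary by machine, so this only verifies that grep -c and a one-pass awk agree on the counts):

```shell
# Generate 1000 synthetic log lines: odd lines get code 01, even lines 0B.
awk 'BEGIN {
         for (n = 1; n <= 1000; n++)
             print "[cmd_status: 0x0000000" (n % 2 ? "1" : "B") "]"
     }' > errores.log

# Count code 01 both ways.
g01=$(grep -c 0x00000001 errores.log)
a01=$(awk 'match($0, /0x000000(01|0B)/) {
               a[substr($0, RSTART + 8, 2)]++
           }
           END { print a["01"] }' errores.log)

echo "grep: $g01  awk: $a01"    # both report 500
```

Timing each variant is then just `time grep -c ...` versus `time awk ...` on the same file.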