Convert a shell script to an awk script

Hello guys,

I have a script like:

echo "Errores 0x01 `cat errores.log | grep 0x00000001 | wc -l` " > class_total
echo "Errores 0x0B `cat errores.log | grep 0x0000000B | wc -l` " >> class_total
echo "Errores 0x45 `cat errores.log | grep 0x00000045 | wc -l` " >> class_total
echo "Errores 0x58 `cat errores.log | grep 0x00000058 | wc -l` " >> class_total
echo "Errores 0x64 `cat errores.log | grep 0x00000064 | wc -l` " >> class_total
echo "Errores 0x66 `cat errores.log | grep 0x00000066 | wc -l` " >> class_total
echo "T O T A L `cat errores.log | grep 0x000 | wc -l` " >> class_total

and when I run it I get a 'class_total' file like:

      Errores 0x01     188 
      Errores 0x0B     127 
      Errores 0x45       0 
      Errores 0x58       0 
      Errores 0x64      38 
      Errores 0x66     140 
         T O T A L     493 

but the file 'errores.log' that I'm counting has 5,000,000 lines and this process takes a few seconds. How do I do the same thing in awk to improve the response time?

nawk -f lest.awk errores.log

here's lest.awk - not tested

BEGIN {
  codesN=split("0x00000001 0x0000000B 0x00000045 0x00000058 0x00000064 0x00000066", codesA, " ");
  split("0x01 0x0B 0x45 0x58 0x64 0x66", codesAs, " ");
}

{
   for(i=1; i <= codesN; i++)
     if ( $0 ~ codesA[i] )
        arr[codesAs[i]]++
}

END {
  for( i in arr ) {
    printf("Errores %s %d\n", i, arr[i])
    tot+=arr[i]
  }
  printf("TOTAL %d\n", tot)
}

I don't know much about awk; I just know that awk is as powerful as any other tool. When I run the above script, I have this problem:

awk: syntax error near line 1
awk: bailing out near line 1

What can I do?

use nawk instead of awk.

Not yet, I still have problems:

example# nawk prueba.awk borrar

nawk: syntax error at source line 1
context is
>>> prueba. <<< awk
nawk: bailing out at source line 1

vgersh99's solution is good, but it might be possible to refine it a bit. Assuming nawk or gawk (likewise untested):

k = match($0,/0x000000(01|0B|45|58|64|66)/) {
    arr[substr($0, k + 8, 2)]++
}

END {
    for (i in arr)
        print "Errores", "0x" i, arr[i]
}

What are the differences?

  • Direct match against all the patterns you seek at one time, i.e., no iteration over codesA on every line
  • No lookup in codesAs for each item found

[Old awk solution deleted --- no alternation ("|") available.]

The disadvantage here is that nawk and awk don't "capture" the thing matched, so you still do the substr call on every match (though it's in machine language and therefore fast). (I don't use gawk, but it may have a remedy for this.)

A ruby solution would work like this (and the perl solution would be similar):

$hsh = { "01" => 0, "0B" => 0, "45" => 0, "58" => 0, "64" => 0, "66" => 0 }
ARGF.each do |line|
    next unless m = %r/0x000000(01|0B|45|58|64|66)/.match line
    $hsh[m[1]] += 1
end
$hsh.keys.sort.each do |k|
    print "Errores 0x", k, " ", $hsh[k], "\n"
end

Pay more attention to the postings...
nawk -f prueba.awk borrar

I got it...

nawk -f prueba.awk borrar

Thanx my friend vgersh99!!!!!!!!

Thanks criglerj, your solution works too, but now I have a new problem: the file borrar.log has 2,537,051 lines and this process takes 30 seconds, and I need this information to be shown every 5 seconds, 10 at most... how can I do it better?

Post a sample data file, please. If the "0x000000??" string appears in a predictable position, I think the "substr" call can be deferred to the END{} part.

If the data you're looking for always shows up in the same awk field, e.g., $4, and it's the only thing in that field, then you can speed it up as r2007 suggested, by only checking that field and by using the whole field as the index of arr, then deferring the substring operation to the END block:

$4 ~ /^0x000000(01|0B|45|58|64|66)$/ {
    arr[$4]++
}

END {
    for (i in arr)
        print "Errores", "0x" substr(i,9,2), arr[i]
}

My next line of attack would be ruby or perl. Ruby is easier to read and write, but it works by interpreting the AST at runtime. Perl runs faster because it compiles to bytecode. And I believe perl is installed by default on Solaris 8 (usually an old version, though sysadmins frequently install an updated version). Anyhow, a perl version would look like this:

while (<>) {
    next unless /0x000000(01|0B|45|58|64|66)/;
    $a{$1}++;
}
while (($k, $v) = each %a) {
    print "Errores 0x", $k, " ", $v, "\n"
}
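As a side note on the "every 5 seconds" requirement: any of the one-pass counters can simply be re-run from a loop such as `while :; do ...; sleep 5; done`. A minimal sketch of one pass, using a portable variant of criglerj's match() approach (RSTART is standard awk) and a made-up three-line sample file; the errores.log and class_total names are taken from the thread:

```shell
# Build a tiny sample log so the sketch is self-contained.
printf '%s\n' \
  '[cmd_status: 0x00000001]' \
  '[cmd_status: 0x00000066]' \
  '[cmd_status: 0x00000001]' > errores.log

# One pass: count each code and the total, then write class_total.
awk 'match($0, /0x000000(01|0B|45|58|64|66)/) {
         arr[substr($0, RSTART + 8, 2)]++   # RSTART = where the match began
         tot++
     }
     END {
         for (i in arr) print "Errores", "0x" i, arr[i]
         print "T O T A L", tot
     }' errores.log > class_total
```

`cat class_total` then shows the same kind of report as the original script, though unordered, since `for (i in arr)` makes no order guarantee.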

Part of my 'borrar' file:

[2005-06-10 07:28:11]{12}: [R-> PBRX1] DELIVER_SM_RESP [seqno: 79774][trans_id: 23914495 ][cmd_status: 0x00000064]
[2005-06-10 07:28:11]{13}: [R-> PBRX2] DELIVER_SM_RESP [seqno: 79775][trans_id: 23914496 ][cmd_status: 0x00000001]
[2005-06-10 07:28:11]{7}: [R-> PBRX2] DELIVER_SM_RESP [seqno: 79777][trans_id: 23914498 ][cmd_status: 0x00000066]
[2005-06-10 07:28:11]{12}: [R-> PBRX2] DELIVER_SM_RESP [seqno: 79776][trans_id: 23914497 ][cmd_status: 0x00000045]
[2005-06-10 07:28:12]{8}: [R-> PBRX1] DELIVER_SM_RESP [seqno: 79778][trans_id: 23914499 ][cmd_status: 0x00000000]

I hope this helps you.

... in one of my earlier posts, it was shown that running awk to count lines was actually slower than running a "grep-wc" combination ... try this one and see if it's any quicker ... sometimes speed is achieved just by using the tools at hand properly ...

#! /bin/ksh

E01=`grep -c 0x00000001 errores.log`
E0B=`grep -c 0x0000000B errores.log`
E45=`grep -c 0x00000045 errores.log`
E58=`grep -c 0x00000058 errores.log`
E64=`grep -c 0x00000064 errores.log`
E66=`grep -c 0x00000066 errores.log`
TOTAL=`expr $E01 + $E0B + $E45 + $E58 + $E64 + $E66`

echo "Errores 0x01 $E01" > class_total
echo "Errores 0x0B $E0B" >> class_total
echo "Errores 0x45 $E45" >> class_total
echo "Errores 0x58 $E58" >> class_total
echo "Errores 0x64 $E64" >> class_total
echo "Errores 0x66 $E66" >> class_total
echo "T O T A L $TOTAL" >> class_total

exit 0

awk -F"0x000000" '{a[$2]++} END {for (i in a) print substr(i,1,2), a[i]}'

I tested this code with a 1,638,400-line file. It took about 5-6 seconds.
With Just_Ice's script, it only took about 1-1.5 seconds.
In this case there are 5 kinds of error code. With more types of error code, the "grep" method will take more time, but the "awk" method will not, theoretically speaking.
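For what it's worth, the two approaches are easy to cross-check on generated data before timing them. A hedged sketch (the file is synthetic, and timings vary by machine, so this only verifies that grep -c and a one-pass awk agree on the counts):

```shell
# Generate 1000 synthetic log lines: odd lines get code 01, even lines 0B.
awk 'BEGIN {
         for (n = 1; n <= 1000; n++)
             print "[cmd_status: 0x0000000" (n % 2 ? "1" : "B") "]"
     }' > errores.log

# Count code 01 both ways.
g01=$(grep -c 0x00000001 errores.log)
a01=$(awk 'match($0, /0x000000(01|0B)/) {
               a[substr($0, RSTART + 8, 2)]++
           }
           END { print a["01"] }' errores.log)

echo "grep: $g01  awk: $a01"    # both report 500
```

Timing each variant is then just `time grep -c ...` versus `time awk ...` on the same file.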