and when I run it I get a 'class_total' file like:
Errores 0x01 188
Errores 0x0B 127
Errores 0x45 0
Errores 0x58 0
Errores 0x64 38
Errores 0x66 140
T O T A L 493
but the file 'errores.log' that I am counting has 5,000,000 lines, and this process takes a few seconds. How do I do the same in awk to improve the response time?
vgersh99's solution is good, but it might be possible to refine it a bit. Assuming nawk/gawk (likewise untested):
# the pattern is an assignment: match() returns the position of the
# match (0 if none), so the action runs only on lines that match
k = match($0, /0x000000(01|0B|45|58|64|66)/) {
    arr[substr($0, k + 8, 2)]++   # the two code digits follow the 8-character "0x000000"
}
END {
    for (i in arr)
        print "Errores", "0x" i, arr[i]
}
What are the differences?
Direct match against all the patterns you seek at one time, i.e., no iteration over the codes array on every line (see the sketch after this list)
No lookup in the codes array for each item found
[Old awk solution deleted --- no alternation ("|") available.]
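For contrast, the per-line loop being avoided would look roughly like this (a hypothetical reconstruction, not vgersh99's actual code; the codes array name is assumed):

BEGIN { split("01 0B 45 58 64 66", c); for (j in c) codes[c[j]] }
{
    # test every code against every line -- this is the overhead
    # that the single match() above avoids
    for (code in codes)
        if (index($0, "0x000000" code))
            arr[code]++
}
END {
    for (i in arr)
        print "Errores", "0x" i, arr[i]
}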
The disadvantage here is that nawk and awk don't "capture" the thing matched, so you still do the substr call on every match (though it's in machine language and therefore fast). (I don't use gawk, but it may have a remedy for this.)
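For what it's worth, gawk does have a remedy: its match() accepts a third array argument that receives the captured groups, so the substr call goes away (a gawk-only sketch, untested here):

# gawk extension: the third argument to match() captures groups
match($0, /0x000000(01|0B|45|58|64|66)/, m) {
    arr[m[1]]++   # m[1] is the two-digit code
}
END {
    for (i in arr)
        print "Errores", "0x" i, arr[i]
}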
A ruby solution would work like this (and the perl solution would be similar):
$hsh = { "01" => 0, "0B" => 0, "45" => 0, "58" => 0, "64" => 0, "66" => 0 }
ARGF.each do |line|
next unless m = %r/0x000000(01|0B|45|58|64|66)/.match line
$hsh[m[1]] += 1
end
$hsh.keys.sort do |k|
print "Errores 0x", k, " ", $hsh[k]
end
Thanks criglerj, your solution works too, but now I have a new problem: the file borrar.log has 2,537,051 lines and this process is taking 30 seconds, and I need this information to be shown every 5 seconds, 10 at most... how can I do it better?
If the data you're looking for always shows up in the same awk field, e.g., $4, and it's the only thing in that field, then you can speed it up as r2007 suggested, by only checking that field and by using the whole field as the index of arr, then deferring the substring operation to the END block:
$4 ~ /^0x000000(01|0B|45|58|64|66)$/ {
    arr[$4]++
}
END {
    for (i in arr)
        print "Errores", "0x" substr(i, 9, 2), arr[i]
}
My next line of attack would be ruby or perl. Ruby is easier to read and write, but it works by interpreting the AST at runtime. Perl runs faster because it compiles to bytecode. And I believe perl is installed by default on Solaris 8 (usually an old version, though sysadmins frequently install an updated version). Anyhow, a perl version would look like this:
while (<>) {
    next unless /0x000000(01|0B|45|58|64|66)/;
    $a{$1}++;
}
while (($k, $v) = each %a) {
    print "Errores 0x", $k, " ", $v, "\n";
}
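Saved as count.pl (a filename assumed here just for illustration), it would be run like this; note that each %a returns keys in perl's internal hash order, so pipe the output through sort if the codes should come out ordered:

perl count.pl errores.log | sort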
... in one of my earlier posts, it was shown that running awk to count lines was actually slower than running a "grep-wc" combination ... try this one and see if it's any quicker ... sometimes speed is achieved just by using the tools at hand properly ...
awk -F"0x000000" '{a[$2]++}END{for (i in a) print substr(i,1,2),a}'
I tested this code with a 1,638,400-line file. It took about 5~6 seconds.
With Just_Ice's script, it only took about 1~1.5 seconds.
In this case, there are 5 kinds of error code. With more types of error code, the "grep" method will take more time, but the "awk" method will not, theoretically speaking.
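For context, the "grep-wc" approach being compared presumably makes one pass over the file per error code, along these lines (a hypothetical sketch; the earlier post's exact command wasn't quoted), which is why its cost grows with the number of codes while the single-pass awk version's does not:

for c in 01 0B 45 58 64 66; do
    echo "Errores 0x$c $(grep -c "0x000000$c" errores.log)"
done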