Hi all,
I try to make a awk-script, which counts lines, summarized by pdf and xml.
So far it works, but for sorting reasons, I'd like to change the format from the field $1 from dd-mm-yyyy to yyyy-mm-dd.
This works find, but: split() and sprintf() prints its output (for no reason, the results looks like this:
22-09-2011 09:15:00 Doinggthings 49490388905_49490994.PDF
2011-09-22 09:15:00 Doinggthings 49490388905_49490994.PDF
22-09-2011 09:15:00 Doinggthings 49445688905_49499494.XML
2011-09-22 09:15:00 Doinggthings 49445688905_49499494.XML
23-09-2011 11:20:00 Doinggthings 49490312305_94689494.PDF
2011-09-23 11:20:00 Doinggthings 49490312305_94689494.PDF
23-09-2011 11:20:00 Doinggthings 49490388905_49378494.XML
2011-09-23 11:20:00 Doinggthings 49490388905_49378494.XML
Datum Total Files Total PDFs Total XMLs
2011-09-22 2 1 1
2011-09-23 2 1 1
Inputfile:
22-09-2011 09:15:00 Doinggthings 49490388905_49490994.PDF
22-09-2011 09:15:00 Doinggthings 49445688905_49499494.XML
23-09-2011 11:20:00 Doinggthings 49490312305_94689494.PDF
23-09-2011 11:20:00 Doinggthings 49490388905_49378494.XML
My actual script:
#!/usr/bin/nawk -f
BEGIN {
FS=" "
IGNORECASE = 1 }
split($1, d, "-")
$1 = sprintf("%s-%s-%s", d[3],d[2],d[1])
$1 != "" && $NF ~ /\.PDF/ {a[$1]++;b[$1]++}
$1 != "" && $NF ~ /\.XML/ {a[$1]++;c[$1]++}
END {
printf("%10s %12s %12s %12s\n", "Datum", "Total Files", "Total PDFs", "Total XMLs" )
for (i in a)
printf("%s %10.0f %10.0f %10.0f\n", i, a, b, c )}
#printf("%s %10.0f %10.0f %10.0f\n", i, a[i,t], b[i,p], c[i,x] )}
I have to use nawk instead of awk, because awk dosn't support split().
I run this on Solaris 10.
Does nawk behave differently?
Has anybody a good idea?
Thanks a lot.
Regis