NAWK: changing string-format with split

regisl67 · September 27, 2011, 11:13am

Hi all,
I am trying to write an awk script that counts lines per date, broken down into PDF and XML files.
It works so far, but for sorting reasons I'd like to change the format of field $1 from dd-mm-yyyy to yyyy-mm-dd.

This works fine, but split() and sprintf() also print their output for no apparent reason. The result looks like this:

22-09-2011 09:15:00 Doinggthings 49490388905_49490994.PDF
2011-09-22 09:15:00 Doinggthings 49490388905_49490994.PDF
22-09-2011 09:15:00 Doinggthings 49445688905_49499494.XML
2011-09-22 09:15:00 Doinggthings 49445688905_49499494.XML
23-09-2011 11:20:00 Doinggthings 49490312305_94689494.PDF
2011-09-23 11:20:00 Doinggthings 49490312305_94689494.PDF
23-09-2011 11:20:00 Doinggthings 49490388905_49378494.XML
2011-09-23 11:20:00 Doinggthings 49490388905_49378494.XML
     Datum  Total Files   Total PDFs   Total XMLs
2011-09-22          2          1          1
2011-09-23          2          1          1

Inputfile:

22-09-2011 09:15:00 Doinggthings 49490388905_49490994.PDF
22-09-2011 09:15:00 Doinggthings 49445688905_49499494.XML
23-09-2011 11:20:00 Doinggthings 49490312305_94689494.PDF
23-09-2011 11:20:00 Doinggthings 49490388905_49378494.XML

My actual script:

#!/usr/bin/nawk -f

BEGIN {
        FS=" "
        IGNORECASE = 1 }

split($1, d, "-")
$1 = sprintf("%s-%s-%s", d[3],d[2],d[1])

$1 != "" && $NF ~ /\.PDF/ {a[$1]++;b[$1]++}
$1 != "" && $NF ~ /\.XML/ {a[$1]++;c[$1]++}

END {
        printf("%10s %12s %12s %12s\n", "Datum", "Total Files", "Total PDFs", "Total XMLs" )
        for (i in a)
                printf("%s %10.0f %10.0f %10.0f\n", i, a[i], b[i], c[i] )}
                #printf("%s %10.0f %10.0f %10.0f\n", i, a[i,t], b[i,p], c[i,x] )}

I have to use nawk instead of awk, because awk doesn't support split().
I run this on Solaris 10.
Does nawk behave differently?

Does anybody have a good idea?

Thanks a lot.
Regis

Chubler_XL · September 27, 2011, 8:51pm

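You need to put your split code in a {} pair. Outside of braces, awk treats each of those two lines as a pattern with no action, and the default action for a true pattern is to print the record: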

#!/usr/bin/nawk -f
 
BEGIN {
        FS=" "
        IGNORECASE = 1 }
{
   split($1, d, "-")
   $1 = sprintf("%s-%s-%s", d[3],d[2],d[1])
}
 
$1 != "" && $NF ~ /\.PDF/ {a[$1]++;b[$1]++}
$1 != "" && $NF ~ /\.XML/ {a[$1]++;c[$1]++}
 
END {
        printf("%10s %12s %12s %12s\n", "Datum", "Total Files", "Total PDFs", "Total XMLs" )
        for (i in a)
                printf("%s %10.0f %10.0f %10.0f\n", i, a[i], b[i], c[i] )}
                #printf("%s %10.0f %10.0f %10.0f\n", i, a[i,t], b[i,p], c[i,x] )}
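
To see why the braces matter, here is a small sketch that runs the two statements both ways on one line of the sample input. It assumes a POSIX-style awk is available as `awk` (on Solaris 10 you would probably substitute nawk); the variable names `input`, `bare` and `braced` are only for illustration.

```shell
# One line of the sample input from the question.
input='22-09-2011 09:15:00 Doinggthings 49490388905_49490994.PDF'

# Without braces, each statement is a pattern with no action.
# split() returns 3 (true), so the unchanged record is printed.
# The assignment evaluates to a non-empty string (true), so the
# record is printed again, this time with the rewritten $1.
bare=$(printf '%s\n' "$input" | awk '
split($1, d, "-")
$1 = sprintf("%s-%s-%s", d[3], d[2], d[1])
')

# Inside braces, the same statements form an action: they run for
# every record, but nothing is printed unless print is called.
braced=$(printf '%s\n' "$input" | awk '
{
    split($1, d, "-")
    $1 = sprintf("%s-%s-%s", d[3], d[2], d[1])
}
')

printf 'without braces:\n%s\n' "$bare"
printf 'with braces: [%s]\n' "$braced"
```

The first variant should reproduce the doubled lines from the question (original record, then the reformatted one), while the second prints nothing on its own and leaves the output to the END block.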

regisl67 · September 28, 2011, 5:13am

Uh?! It was so simple ... and it works perfectly. Thank you very much for your help! Régis