Count of matched pattern occurrences by minute and date in a log file

Does anyone know how to use AWK to produce the following output?

Sun Feb 12 00:41:01-00:41:59 Success:2 Fail:2
Sun Feb 12 00:42:01-00:42:59 Success:1 Fail:2
Sun Feb 12 01:20:01-01:20:59 Success:1 Fail:2
Mon Feb 13 22:41:01-22:41:59 Success:1 Fail:1

log file:

[Sun Feb 12 00:41:17 2012] Success
[Sun Feb 12 00:41:19 2012] Success
[Sun Feb 12 00:41:50 2012] Fail
[Sun Feb 12 00:41:57 2012] Fail
[Sun Feb 12 00:42:03 2012] Fail
[Sun Feb 12 00:42:17 2012] Success
[Sun Feb 12 00:42:18 2012] Fail
[Sun Feb 12 01:20:10 2012] Fail
[Sun Feb 12 01:20:18 2012] Success
[Sun Feb 12 01:20:34 2012] Fail
[Mon Feb 13 22:41:13 2012] Success
[Mon Feb 13 22:41:18 2012] Fail

Have a go with this:

awk '
    {
        split( $4, a, ":" )
        base = sprintf( "%s %s %s %s:%s", substr( $1, 2 ), $2, $3, a[1], a[2] );
        if( !seen[base]++ )
        {
            order[++oidx] = base;
            time[base] = sprintf( "%s:%s", a[1], a[2] );
        }
        count[base " " $NF]++;
    }

    END {
        for( i = 1; i <= oidx; i++ )
            printf( "%s:00-%s:59 Success: %d  Fail: %d\n", order[i], time[order[i]], count[order[i] " Success"], count[order[i] " Fail"] );
    }' input-file 

 

This preserves the input order which I assumed was desired.
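
For anyone wondering why the extra order[] array is needed: awk's for( k in array ) loop visits keys in an unspecified order, so the script records each key the first time it shows up. A minimal sketch of just that idiom, run on made-up one-word input lines (the function name first_seen_order is hypothetical, not something from the script above):

```shell
# Sketch only: remember keys in first-seen order with the !seen[key]++ idiom.
# first_seen_order is a made-up name for this example.
first_seen_order() {
    awk '
        !seen[$1]++ { order[++n] = $1 }   # true only the first time a key appears
        { count[$1]++ }                   # tally every occurrence of the key
        END {
            # walk the keys in the order they were first seen
            for (i = 1; i <= n; i++)
                print order[i], count[order[i]]
        }'
}

# Demo on made-up input: keys come out in the order b, a, c.
printf '%s\n' b a b c a b | first_seen_order
```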

thanks mate!

Nice! What if I want the counts calculated by hour instead of by minute? Which part do I need to change?

Since the log entries are already in the right order, we could do without arrays:

awk -F'[][ \t:]+' '
      function pr(){
        # skip the call made before the first group exists; +0 prints empty counts as 0
        if(p!="") print p,"Success:"(s+0),"Fail:"(f+0)
      }
      {
         n=$2" "$3" "$4" "$5":"$6":00-"$5":"$6":59"
      }
      n!=p{
         pr()
         s=f=0
         p=n
      }
      $NF=="Success"{
         s++
      }
      $NF=="Fail"{
         f++
      }
      END{
         pr()
      }' infile

To group by hour, use:

n=$2" "$3" "$4" "$5":00-"$5":59"
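
Put together, the hourly variant might look like the sketch below. It assumes the log is in chronological order and uses the same bracketed timestamp layout as the sample; the wrapper name count_by_hour is made up for this example, and the here-document is only a few lines taken from the sample log.

```shell
# Sketch only: streaming count per hour, assuming chronologically ordered input.
# count_by_hour is a hypothetical wrapper name.
count_by_hour() {
    awk -F'[][ \t:]+' '
        function pr() {
            # nothing to print before the first group has been seen
            if (p != "") print p, "Success:" (s + 0), "Fail:" (f + 0)
        }
        # $2..$4 = day name, month, day of month; $5 = hour
        { n = $2 " " $3 " " $4 " " $5 ":00-" $5 ":59" }
        # a new hour starts: flush the previous group and reset the counters
        n != p { pr(); s = f = 0; p = n }
        $NF == "Success" { s++ }
        $NF == "Fail"    { f++ }
        END { pr() }' "$@"
}

# Demo on a few lines from the sample log.
count_by_hour <<'EOF'
[Sun Feb 12 00:41:17 2012] Success
[Sun Feb 12 00:42:03 2012] Fail
[Sun Feb 12 01:20:10 2012] Fail
[Mon Feb 13 22:41:13 2012] Success
EOF
```

With a real log, count_by_hour infile should behave the same way, since any arguments are passed straight through to awk.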

Yours doesn't seem accurate, though...

The output seems weird too.

Sat Feb 11 14:23:00-14:23:59 Success:4 Fail:2
Sat Feb 11 13:23:00-13:23:59 Success: Fail:
Sat Feb 11 14:23:00-14:23:59 Success:3 Fail:3
Sat Feb 11 13:23:00-13:23:59 Success: Fail:
Sat Feb 11 14:23:00-14:23:59 Success:1 Fail:2
Sat Feb 11 14:24:00-14:24:59 Success:2 Fail:2
Sat Feb 11 13:24:00-13:24:59 Success: Fail:
Sat Feb 11 14:24:00-14:24:59 Success:1 Fail:2
Sat Feb 11 13:24:00-13:24:59 Success: Fail:
Sat Feb 11 14:24:00-14:24:59 Success:6 Fail:2
Sat Feb 11 13:24:00-13:24:59 Success: Fail:
Sat Feb 11 14:24:00-14:24:59 Success:7 Fail:2
Sat Feb 11 13:24:00-13:24:59 Success: Fail:

Agama, would you mind sharing which part I need to change in order to calculate by hour?

@timmywong: It will only work if the log file is in chronological order, which I thought might be the case based on your sample. What OS is this?

The code below is changed to group by hour; the lines that group by minute are just commented out.

awk '
    {
        split( $4, a, ":" )
        #base = sprintf( "%s %s %s %s:%s", substr( $1, 2), $2, $3, a[1], a[2] ); # by minute
        base = sprintf( "%s %s %s %s", substr( $1, 2), $2, $3, a[1] );              # by hour
        if( !seen[base]++ )
        {
            order[++oidx] = base;
            #time[base] = sprintf( "%s:%s", a[1], a[2] );   # if by minute
            time[base] = sprintf( "%s", a[1] );             # if by hour
        }
        count[base " " $NF]++;
    }

    END {
        for( i = 1; i <= oidx; i++ )
            printf( "%s:00:00-%s:59:59 Success: %d  Fail: %d\n", order[i], time[order[i]], count[order[i] " Success"], count[order[i] " Fail"] );
            #printf( "%s:00-%s:59 Success: %d  Fail: %d\n", order[i], time[order[i]], count[order[i] " Success"], count[order[i] " Fail"] );
    }

' input-file

Works great, Agama. Thanks!

It's CentOS 5.7. Does that mean it won't work if the log is not in order?

That's right, otherwise one needs to use arrays, like in Agama's solution.
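
To illustrate the difference, here is a rough sketch of the array approach run on deliberately interleaved lines. It keys the counts on the minute, so it does not matter where in the file a line appears; groups are printed in the order they were first seen. The wrapper name count_by_minute_unordered is made up for this example, and the input is a shuffled handful of lines from the sample log.

```shell
# Sketch only: per-minute counts that do not depend on the order of the log lines.
# count_by_minute_unordered is a hypothetical wrapper name.
count_by_minute_unordered() {
    awk '
        {
            split($4, t, ":")                                  # t[1] = hour, t[2] = minute
            key = substr($1, 2) " " $2 " " $3 " " t[1] ":" t[2]
            if (!seen[key]++) order[++n] = key                 # remember first-seen order
            count[key, $NF]++                                  # $NF is Success or Fail
        }
        END {
            for (i = 1; i <= n; i++) {
                k = order[i]
                split(k, parts, " ")                           # parts[4] = HH:MM
                printf "%s:00-%s:59 Success:%d Fail:%d\n", k, parts[4], count[k, "Success"], count[k, "Fail"]
            }
        }' "$@"
}

# Demo: the 00:41 and 00:42 entries are interleaved on purpose.
count_by_minute_unordered <<'EOF'
[Sun Feb 12 00:41:17 2012] Success
[Sun Feb 12 00:42:03 2012] Fail
[Sun Feb 12 00:41:50 2012] Fail
[Sun Feb 12 00:42:17 2012] Success
[Sun Feb 12 00:41:19 2012] Success
EOF
```

Each minute still gets a single output line here, whereas the version without arrays would start a new line every time the minute changes.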