Need help with a shell script

I am a beginner in scripting and need some advice.

I have a log file with the following contents:

2009-11-30 01:33:00,710 Beginning a.gzip
2009-11-30 01:33:56,704 Completed a.gzip
2009-11-30 01:33:56,704 Beginning b.gzip
2009-11-30 01:34:45,828 Completed b.gzip
2009-11-30 02:20:00,018 Beginning c.gzip
2009-11-30 02:22:46,349 Finished c.gzip

I also have a directory that contains all these files; running "zcat a.zip | wc -l" on a file gives its number of rows.

I need to plot the number of rows on the y-axis and the time taken to process, if possible with an hourly range, on the x-axis.

I am wondering if I can automate this with a shell script whose output is name/value pairs like the following, so that it is easier to plot, or whether you have some other idea:

transactions = 2323, time taken = 5 minutes
transactions = 2123, time taken = 5 minutes
transactions = 3323, time taken = 6 minutes
transactions = 4323, time taken = 7 minutes

It may be a good idea to use a scripting language like Perl or Python if the date arithmetic is complex (e.g. intervals that span a change of day, month or year).

Given below is a solution in Perl. It uses Date::Calc, a small and elegant module for fast date arithmetic, available from CPAN (the public archive of Perl modules).

$                                                                            
$                                                                            
$ # show the contents of the log file                                        
$ cat test.log                                                               
2009-11-30 01:33:00,710 Beginning a.zip                                      
2009-11-30 01:33:56,704 Completed a.zip                                      
2009-11-30 01:33:56,704 Beginning b.zip                                      
2009-11-30 01:34:45,828 Completed b.zip                                      
2009-11-30 02:20:00,018 Beginning c.zip                                      
2009-11-30 02:22:46,349 Completed c.zip                                      
2009-12-31 23:58:19,518 Beginning d.zip                                      
2010-01-01 00:19:58,899 Completed d.zip                                      
2010-01-01 11:51:23,790 Beginning e.zip                                      
2010-01-01 13:05:09,791 Completed e.zip                                      
$                                                                            
$ # all zip files are in the "zipfiles" directory
$ # have a peek at the "zipfiles" directory      
$ find zipfiles -type f -name "*.zip"
zipfiles/c.zip                       
zipfiles/b.zip                       
zipfiles/e.zip                       
zipfiles/a.zip                       
zipfiles/d.zip                       
$                                    
$ # now show the contents of the Perl program
$ # the logic should be pretty obvious; a few inline script comments have been thrown in
$                                                                                     
$ cat -n processzip.pl
     1  #!/usr/bin/perl -w
     2  use Date::Calc qw(Delta_DHMS);
     3  $logfile = "test.log";
     4  $zipdir = "zipfiles";
     5  open (LF, $logfile) or die "Can't open $logfile: $!";
     6  while (<LF>) {
     7    if (/((\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})),\d+ Beginning (.*)$/) {
     8      # set the START date and time components, and the zip file name
     9      $y1=$2; $mon1=$3; $d1=$4;
    10      $h1=$5; $min1=$6; $s1=$7;
    11      $zfile1 = $8;
    12    } elsif (/((\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})),\d+ Completed (.*)$/) {
    13      # set the END date and time components, and the zip file name
    14      $y2=$2; $mon2=$3; $d2=$4;
    15      $h2=$5; $min2=$6; $s2=$7;
    16      $zfile2 = $8;
    17      # process further only if we found a pair of records for the same zip file
    18      if ($zfile1 eq $zfile2) {
    19        # find out the number of transactions
    20        chomp($num = `zcat $zipdir/$zfile1 | wc -l`);
    21        # find out the processing time
    22        ($d, $h, $m, $s) = Delta_DHMS($y1, $mon1, $d1, $h1, $min1, $s1,
    23                                      $y2, $mon2, $d2, $h2, $min2, $s2);
    24        # now print this information
    25        printf("file  = %10s, transactions =%10d, time taken =%3d days %2d hours %2d minutes %2d seconds\n",
    26               $zfile1,$num, $d, $h, $m, $s);
    27      }
    28    }
    29  }
    30  close (LF) or die "Can't close $logfile: $!";
    31
$
$ # now execute the Perl program
$
$ perl processzip.pl
file  =      a.zip, transactions =      8613, time taken =  0 days  0 hours  0 minutes 56 seconds
file  =      b.zip, transactions =       396, time taken =  0 days  0 hours  0 minutes 49 seconds
file  =      c.zip, transactions =      8591, time taken =  0 days  0 hours  2 minutes 46 seconds
file  =      d.zip, transactions =      3836, time taken =  0 days  0 hours 21 minutes 39 seconds
file  =      e.zip, transactions =     72067, time taken =  0 days  1 hours 13 minutes 46 seconds
$
$

HTH,
tyler_durden

Thanks. I was hoping to do this in bash, as the prod machines that hold the files have no access to extra modules, CPAN or yum.

Hi, if you have GNU date on your system you can do something like this:

# elapsed whole minutes between two timestamps (needs GNU date for -d)
mins_diff()
{
  echo $(( ($(date -d "$2" +%s) - $(date -d "$1" +%s)) / 60 ))
}

# the archives are assumed to be in the current directory
while read -r date time phase name; do
  case $phase in
    Beginning) start="$date $time" ;;
    Completed|Finished)
               end="$date $time"
               printf 'transactions = %s, ' "$(zcat "$name" | wc -l)"
               printf 'time taken = %s minutes\n' "$(mins_diff "$start" "$end")" ;;
  esac
done < logfile
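
Whole minutes may be too coarse for plotting: in the sample log, a.gzip takes under a minute, so it would show as 0. A minimal sketch of two helpers that report seconds and an hourly group key instead, assuming GNU date; secs_diff and hour_bucket are hypothetical names, not part of the script above:

```shell
# secs_diff START END: whole seconds between two "YYYY-MM-DD HH:MM:SS,mmm" stamps.
# ${1%,*} drops the ",mmm" milliseconds; -u reads both stamps as UTC so the
# result does not depend on the machine's time zone.
secs_diff() {
  echo $(( $(date -u -d "${2%,*}" +%s) - $(date -u -d "${1%,*}" +%s) ))
}

# hour_bucket STAMP: the "YYYY-MM-DD HH" prefix, usable as an hourly group key.
hour_bucket() {
  echo "${1%%:*}"
}

secs_diff "2009-11-30 01:33:00,710" "2009-11-30 01:33:56,704"   # prints 56
hour_bucket "2009-11-30 01:33:00,710"                            # prints 2009-11-30 01
```

In the loop, secs_diff "$start" "$end" could replace mins_diff, and hour_bucket "$start" could be printed as an extra field to group the points by hour.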

Thanks for the tips. Both are working, but I need a little more help as this is my first Perl program.

a) In the Perl script, instead of specifying test.log, how can we change it to go to, say, the /logs directory and pick up all server.log.<date> files?

b) Can you please explain lines 7, 9, 10 and 11?

     7    if (/((\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})),\d+ Beginning (.*)$/) {
     8      # set the START date and time components, and the zip file name
     9      $y1=$2; $mon1=$3; $d1=$4;
    10      $h1=$5; $min1=$6; $s1=$7;
    11      $zfile1 = $8;

You are always welcome.
And it is good that you have started trying to understand the program.

Perl is a TMTOWTDI ("there's more than one way to do it") language. Let me show you one way of achieving that.

$ cat t.pl 
foreach ( `/bin/ls /logs/test.log*` )
{
    print "file: $_";
}

It prints the following output:

$ perl t.pl
file: /logs/test.log
file: /logs/test.log.010110
file: /logs/test.log.311209
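
For the bash version posted earlier in the thread, the same thing can be done with a plain glob, with no need to parse ls output. A minimal sketch, assuming the files are named server.log.<date> as in the question; list_logs is a hypothetical helper name:

```shell
# list_logs DIR: print every server.log.<date> file in DIR, one per line.
list_logs() {
  for f in "$1"/server.log.*; do
    # when nothing matches, the pattern stays unexpanded; skip it
    [ -f "$f" ] || continue
    echo "$f"
  done
}

list_logs /logs
```

Each printed name could then be fed to the existing loop, for example by redirecting from "$f" instead of from logfile.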

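Line 7: that is a regular expression.
() - capture the matched text in a variable ($1, $2, $3, ...)
\d - match a digit
{4} - match the previous item 4 times (i.e. 4 digits here)

Lines 9-11: assign the captured values to variables.

$1 is the value captured by the first set of parentheses, $2 by the second set, and so on, counting opening parentheses from left to right. Here $1 is the whole date and time (the outer parentheses), $2 to $7 are the year, month, day, hour, minute and second, and $8 is the zip file name.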
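
The group numbering can also be tried out from the shell, which may help on a machine without Perl modules. A minimal sketch using sed -E; its extended regular expressions have no \d, so [0-9] is used instead, and \1 to \8 play the role of $1 to $8. parse_begin is a hypothetical helper name:

```shell
# parse_begin LINE: for a "Beginning" record, print groups 2..8
# (year month day hour minute second file); print nothing otherwise.
parse_begin() {
  printf '%s\n' "$1" | sed -nE \
    's/^(([0-9]{4})-([0-9]{2})-([0-9]{2}) ([0-9]{2}):([0-9]{2}):([0-9]{2})),[0-9]+ Beginning (.*)$/\2 \3 \4 \5 \6 \7 \8/p'
}

parse_begin '2009-11-30 01:33:00,710 Beginning a.zip'   # prints 2009 11 30 01 33 00 a.zip
```

Changing the replacement to \1 prints the whole timestamp, which shows that the outer parentheses are counted first.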

Hope that makes sense. There are also plenty of places to learn Perl; it is quite simple to learn and understand, and a web search will turn up many tutorials.

Thanks, folks. I promise to read up on regular expressions and more Perl, and to contribute to this awesome forum as I learn more. I have one last question to complete my other script.

Per the example and explanation, the code below works fine:

2009-11-30 01:33:00,710 Beginning a.zip
(/((\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})),\d+ Beginning (.*)$/)

Then what modification do I need if a line in the log file looks like this, say:

2009-11-20 00:25:23,481 DEBUG  [com.blah.blah] Beginning file loading :file1_complete.zip
Will the following work?
(/((\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})),\d+ Beginning file loading(.*)$/)

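No, it will not work.

After the timestamp you also have to match

DEBUG  [com.blah.blah]

So place a .* there to match it; if you want to capture that text, use parentheses, as in (.*). Note that the final (.*) will then capture " :file1_complete.zip", leading space and colon included, so add " :" to the pattern before that group if you want only the file name. Read, experiment, understand, read, test...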

$_ = '2009-11-20 00:25:23,481 DEBUG  [com.blah.blah] Beginning file loading :file1_complete.zip';

if ( (/((\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})),\d+ .* Beginning file loading(.*)$/) )  {
    print $_;
}
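
The same pattern can be checked from the shell before it goes into the Perl script. A minimal sketch using sed -E with [0-9] in place of \d; it places " :" before the capture group so that only the file name is returned. extract_loading_file is a hypothetical helper name:

```shell
# extract_loading_file LINE: print the file name from a
# "... Beginning file loading :NAME" record; print nothing otherwise.
extract_loading_file() {
  printf '%s\n' "$1" | sed -nE \
    's/^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]+ .* Beginning file loading :(.*)$/\1/p'
}

extract_loading_file '2009-11-20 00:25:23,481 DEBUG  [com.blah.blah] Beginning file loading :file1_complete.zip'
# prints file1_complete.zip
```

The " .* " between the milliseconds and "Beginning" is what absorbs the log level and the bracketed class name.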