Need help with a shell script

gubbu · December 17, 2009, 2:37pm

I am a beginner in scripting and need some advice.

I have a log file with the following contents:

2009-11-30 01:33:00,710 Beginning a.gzip
2009-11-30 01:33:56,704 Completed a.gzip
2009-11-30 01:33:56,704 Beginning b.gzip
2009-11-30 01:34:45,828 Completed b.gzip
2009-11-30 02:20:00,018 Beginning c.gzip
2009-11-30 02:22:46,349 Finished c.gzip

I also have a directory that contains all these files; running "zcat a.zip | wc -l" on one of them gives its number of rows.
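A minimal sketch of getting the row count for every file in the directory at once. It assumes the files are gzip-compressed despite the .zip extension (which is what zcat expects), and it uses gzip -dc, which decompresses to standard output like GNU zcat does. The directory name zipfiles and the function name count_rows are hypothetical.

```shell
#!/usr/bin/env bash
# count_rows DIR: print "<file> <rows>" for each *.zip file in DIR.
# Assumption: the files are gzip-compressed, not real zip archives.
count_rows() {
  local dir=$1 f
  for f in "$dir"/*.zip; do
    [ -e "$f" ] || continue            # no match: the glob stays literal, skip it
    # gzip -dc writes the decompressed data to stdout; wc -l counts lines.
    # tr strips the leading spaces some wc implementations print.
    printf '%s %s\n' "${f##*/}" "$(gzip -dc "$f" | wc -l | tr -d ' ')"
  done
}

count_rows zipfiles
```

Each output line pairs a file name with its row count, which can later be joined with the timings from the log.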

I need to plot the number of rows on the y axis against the processing time on the x axis, grouped into hourly ranges if possible.

Can I automate this with a shell script? Ideally the output would be name/value pairs like the following so that it is easy to plot, but I'm open to other ideas:

transactions = 2323, time taken = 5 minutes
transactions = 2123, time taken = 5 minutes
transactions = 3323, time taken = 6 minutes
transactions = 4323, time taken = 7 minutes

durden_tyler · December 17, 2009, 5:04pm

If the date arithmetic is complex (e.g. intervals that cross a day, month or year boundary), it may be a good idea to use a scripting language like Perl or Python.

Given below is a solution in Perl. Date::Calc, available from CPAN (the Comprehensive Perl Archive Network), is a small, elegant module for fast date arithmetic.

$                                                                            
$                                                                            
$ # show the contents of the log file                                        
$ cat test.log                                                               
2009-11-30 01:33:00,710 Beginning a.zip                                      
2009-11-30 01:33:56,704 Completed a.zip                                      
2009-11-30 01:33:56,704 Beginning b.zip                                      
2009-11-30 01:34:45,828 Completed b.zip                                      
2009-11-30 02:20:00,018 Beginning c.zip                                      
2009-11-30 02:22:46,349 Completed c.zip                                      
2009-12-31 23:58:19,518 Beginning d.zip                                      
2010-01-01 00:19:58,899 Completed d.zip                                      
2010-01-01 11:51:23,790 Beginning e.zip                                      
2010-01-01 13:05:09,791 Completed e.zip                                      
$                                                                            
$ # all zip files are in the "zipfiles" directory
$ # have a peek at the "zipfiles" directory      
$ find zipfiles -type f -name "*.zip"
zipfiles/c.zip                       
zipfiles/b.zip                       
zipfiles/e.zip                       
zipfiles/a.zip                       
zipfiles/d.zip                       
$                                    
$ # now show the contents of the Perl program
$ # the logic should be pretty obvious; a few inline script comments have been thrown in
$                                                                                     
$ cat -n processzip.pl
     1  #!/usr/bin/perl -w
     2  use Date::Calc qw(Delta_DHMS);
     3  $logfile = "test.log";
     4  $zipdir = "zipfiles";
     5  open (LF, $logfile) or die "Can't open $logfile: $!";
     6  while (<LF>) {
     7    if (/((\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})),\d+ Beginning (.*)$/) {
     8      # set the START date and time components, and the zip file name
     9      $y1=$2; $mon1=$3; $d1=$4;
    10      $h1=$5; $min1=$6; $s1=$7;
    11      $zfile1 = $8;
    12    } elsif (/((\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})),\d+ (?:Completed|Finished) (.*)$/) {
    13      # set the END date and time components, and the zip file name
    14      $y2=$2; $mon2=$3; $d2=$4;
    15      $h2=$5; $min2=$6; $s2=$7;
    16      $zfile2 = $8;
    17      # process further only if we found a pair of records for the same zip file
    18      if ($zfile1 eq $zfile2) {
    19        # find out the number of transactions
    20        chomp($num = `zcat $zipdir/$zfile1 | wc -l`);
    21        # find out the processing time
    22        ($d, $h, $m, $s) = Delta_DHMS($y1, $mon1, $d1, $h1, $min1, $s1,
    23                                      $y2, $mon2, $d2, $h2, $min2, $s2);
    24        # now print this information
    25        printf("file  = %10s, transactions =%10d, time taken =%3d days %2d hours %2d minutes %2d seconds\n",
    26               $zfile1,$num, $d, $h, $m, $s);
    27      }
    28    }
    29  }
    30  close (LF) or die "Can't close $logfile: $!";
    31
$
$ # now execute the Perl program
$
$ perl processzip.pl
file  =      a.zip, transactions =      8613, time taken =  0 days  0 hours  0 minutes 56 seconds
file  =      b.zip, transactions =       396, time taken =  0 days  0 hours  0 minutes 49 seconds
file  =      c.zip, transactions =      8591, time taken =  0 days  0 hours  2 minutes 46 seconds
file  =      d.zip, transactions =      3836, time taken =  0 days  0 hours 21 minutes 39 seconds
file  =      e.zip, transactions =     72067, time taken =  0 days  1 hours 13 minutes 46 seconds
$
$

HTH,
tyler_durden

gubbu · December 28, 2009, 12:59am

Thanks. I was hoping for this in bash, as the production machines that hold the files have no access to extra modules, CPAN or yum.
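Without Date::Calc, the days/hours/minutes/seconds breakdown that Delta_DHMS provides can be done with plain bash integer arithmetic, once the elapsed time is known in seconds (for example as the difference of two "date +%s" values). A minimal sketch; the function name dhms is hypothetical and the input is assumed to be a non-negative number of seconds.

```shell
#!/usr/bin/env bash
# dhms SECONDS: print a duration in the same layout as the Perl script's output.
dhms() {
  local t=$1
  local d=$(( t / 86400 ))          # whole days
  local h=$(( t % 86400 / 3600 ))   # remaining whole hours
  local m=$(( t % 3600 / 60 ))      # remaining whole minutes
  local s=$(( t % 60 ))             # remaining seconds
  printf '%3d days %2d hours %2d minutes %2d seconds\n' "$d" "$h" "$m" "$s"
}

# 11:51:23 -> 13:05:09 (the e.zip run above) is 4426 seconds
dhms 4426   # prints "  0 days  1 hours 13 minutes 46 seconds"
```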

Scrutinizer · December 28, 2009, 1:41am

Hi, if you have GNU date on your system, you can do something like this:

mins_diff()
{
  echo $((($(date -d "$2" +%s)-$(date -d "$1" +%s))/60))
}

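# read -r keeps backslashes literal; the last field gets the file name
while read -r date time phase name; do
  case $phase in
    Beginning) start="$date $time" ;;
    Completed|Finished)
               end="$date $time"
               # assumes the compressed files are in the current directory
               printf 'transactions = %s, ' "$(zcat "$name" | wc -l)"
               printf 'time taken = %s minutes\n' "$(mins_diff "$start" "$end")" ;;
  esac
done < logfile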
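The date +%s approach copes with day, month and year changes because both timestamps are converted to seconds since the epoch before subtracting. A minimal check using the d.zip entries from the Perl example, which cross midnight on New Year's Eve. It assumes GNU date; the function name secs_diff is hypothetical, and TZ=UTC is set only so that a daylight-saving change in the local time zone cannot skew the result.

```shell
#!/usr/bin/env bash
# secs_diff START END: seconds between two timestamps (requires GNU date -d).
secs_diff() {
  local start end
  start=$(TZ=UTC date -d "$1" +%s) || return 1
  end=$(TZ=UTC date -d "$2" +%s) || return 1
  echo $(( end - start ))
}

secs_diff "2009-12-31 23:58:19" "2010-01-01 00:19:58"   # prints 1299 (21 min 39 s)
```

Dividing by 60, as mins_diff does, gives 21 here, because bash arithmetic truncates to whole minutes.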

gubbu · January 2, 2010, 2:32am

Thanks for the tips; both work. I need a little more help, though, as this is my first Perl program:

a) In the Perl script, instead of hard-coding test.log, how can I make it read all the server.log.<date> files in, say, the /logs directory?

b) Can you please explain lines 7, 9, 10 and 11?

     7    if (/((\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})),\d+ Beginning (.*)$/) {
     8      # set the START date and time components, and the zip file name
     9      $y1=$2; $mon1=$3; $d1=$4;
    10      $h1=$5; $min1=$6; $s1=$7;
    11      $zfile1 = $8;

thegeek · January 2, 2010, 10:19am

You're welcome, and it's good that you have started trying to understand the program.

Perl is a TMTOWTDI ("There's More Than One Way To Do It") language. Let me show you one way of achieving that.

$ cat t.pl 
# glob expands the wildcard inside Perl, without running an external ls
foreach my $file ( glob("/logs/test.log*") )
{
    print "file: $file\n";
}

It prints the following output:

$ perl t.pl
file: /logs/test.log
file: /logs/test.log.010110
file: /logs/test.log.311209
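Since the original goal was plain bash, the same idea can be sketched for Scrutinizer's loop: a glob picks up every matching log file, and each one is fed to the while loop in turn. The /logs path and server.log.<date> naming come from the question; the function name process_logs is hypothetical, GNU date is assumed, and this version prints only the file name and the elapsed seconds.

```shell
#!/usr/bin/env bash
# process_logs DIR: for every DIR/server.log.* file, pair each "Beginning"
# line with the following "Completed"/"Finished" line and print
# "<file name> <elapsed seconds>".
process_logs() {
  local dir=$1 log date time phase name start
  for log in "$dir"/server.log.*; do
    [ -e "$log" ] || continue                       # glob matched nothing
    while read -r date time phase name; do
      case $phase in
        Beginning) start="$date ${time%%,*}" ;;     # drop the ,msec part
        Completed|Finished)
          echo "$name $(( $(date -d "$date ${time%%,*}" +%s) - $(date -d "$start" +%s) ))" ;;
      esac
    done < "$log"
  done
}

process_logs /logs
```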

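Line 7 is a regular expression match:
() - captures the matched text into a numbered variable ($1, $2, $3, ...)
\d - matches a digit
{4} - matches the previous item exactly 4 times (i.e. 4 digits here)
,\d+ - matches the comma and the milliseconds, which are not captured
(.*)$ - captures everything up to the end of the line (the zip file name)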

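Lines 9-11: assign the captured values to named variables.

$1 is the value captured by the first opening parenthesis, $2 by the second, and so on. Because the outer parentheses around the whole timestamp open first, $1 holds the full timestamp, $2 to $7 hold the year, month, day, hour, minute and second, and $8 holds the file name.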
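The same capture-group numbering can be seen in bash with the =~ operator, which stores the groups in the BASH_REMATCH array. A minimal sketch; bash uses POSIX extended regular expressions, which have no \d, so [0-9] stands in for it here.

```shell
#!/usr/bin/env bash
# Groups are numbered by their opening parenthesis, left to right:
# 1 = whole timestamp, 2-7 = year..second, 8 = file name.
re='^(([0-9]{4})-([0-9]{2})-([0-9]{2}) ([0-9]{2}):([0-9]{2}):([0-9]{2})),[0-9]+ Beginning (.*)$'
line='2009-11-30 01:33:00,710 Beginning a.zip'

if [[ $line =~ $re ]]; then
  echo "timestamp: ${BASH_REMATCH[1]}"   # 2009-11-30 01:33:00
  echo "year:      ${BASH_REMATCH[2]}"   # 2009
  echo "file:      ${BASH_REMATCH[8]}"   # a.zip
fi
```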

Hope that helps. There are plenty of places to learn Perl; it is quite easy to pick up, and a web search will turn up many good tutorials.

gubbu · January 3, 2010, 1:14am

Thanks, folks. I promise to read up on regexes and Perl, and to contribute to this awesome forum as I learn more. I have one last question to finish my other script.

Following the example and explanation, the code below works fine:

2009-11-30 01:33:00,710 Beginning a.zip
(/((\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})),\d+ Beginning (.*)$/)

What modification do I need if the log file looks like this instead?

2009-11-20 00:25:23,481 DEBUG  [com.blah.blah] Beginning file loading :file1_complete.zip
Will the following work?
(/((\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})),\d+ Beginning file loading(.*)$/)

thegeek · January 3, 2010, 1:44am

gubbu:

Then what modification do I need if the logfile is say

2009-11-20 00:25:23,481 DEBUG  [com.blah.blah] Beginning file loading :file1_complete.zip
 will the following work ?  
(/((\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})),\d+ Beginning file loading(.*)$/)

No, it will not work.

You also have to match the

DEBUG  [com.blah.blah]

part. Place a .* there to match it; if you want to capture it, use parentheses, as in (.*). Read, experiment, understand, read, test...

$_ = '2009-11-20 00:25:23,481 DEBUG  [com.blah.blah] Beginning file loading :file1_complete.zip';

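if ( /((\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})),\d+ .* Beginning file loading :(.*)$/ )  {
    # the " :" sits outside the parentheses, so it is not part of $8
    print "file: $8\n";    # file: file1_complete.zip
}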
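The same pattern can be written for the bash loop with =~. Here .* skips the "DEBUG  [com.blah.blah]" part, and matching " :" outside the last group keeps the leading space and colon out of the captured file name. A sketch only; the pattern is an assumption based on the single sample line above.

```shell
#!/usr/bin/env bash
line='2009-11-20 00:25:23,481 DEBUG  [com.blah.blah] Beginning file loading :file1_complete.zip'
# ([0-9-]+ [0-9:]+) captures the date and time; .* skips the level and class name.
re='^([0-9-]+ [0-9:]+),[0-9]+ .* Beginning file loading :(.*)$'

if [[ $line =~ $re ]]; then
  echo "start: ${BASH_REMATCH[1]}"   # 2009-11-20 00:25:23
  echo "file:  ${BASH_REMATCH[2]}"   # file1_complete.zip
fi
```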