Report Generation with Grep

bharath.gct · April 1, 2010, 9:59pm

All,

I am pretty new to the Unix environment, and I am not sure whether my requirement can be met in Unix. I did search this forum and others but could not find an answer. The requirement is explained below:

I have a set of files in a folder.

file1_unload
file2_unload
file3_unload
file1_load
file2_load
file3_load
file1_dat
file2_dat
file3_dat

Each file has a similar format. Here are example contents for one set of files:
file1_unload

Started at 22:24:32
593456 records unloaded
Ended at 23:08:40

file1_load

Started at 23:09:01
Loaded records 345298
Ended at 00:12:25

You can imagine similar values for the other unload and load files. The dat files contain the data in ASCII format. I want to create a report that looks like this:

file1 <file1_dat's size> 593456 22:24:32 23:08:40 00:44:08 345298 23:09:01 00:12:25 01:03:24
file2 <file2_dat's size> <Unld> <UnldSt> <UnldEd> <timeDF> <Load> <LoadSt> <LoadEd> <timeDF>
file3 <file3_dat's size> <Unld> <UnldSt> <UnldEd> <timeDF> <Load> <LoadSt> <LoadEd> <timeDF>

For better understanding, the fields in the report are:

1. Part of the file name (file1, file2, etc.), NOT the entire file name.
2. Size of the respective dat file.
3. Number of unloaded records from the respective unload file.
4. Start time from the respective unload file.
5. End time from the respective unload file.
6. Time difference between columns 4 and 5.
7. Number of loaded records from the respective load file.
8. Start time from the respective load file.
9. End time from the respective load file.
10. Time difference between columns 8 and 9.

Let me know if this can be done in Unix, and if so, please help me do it. Let me know if you have any questions about the inputs.

Thanks in advance,
Bharath

thegeek · April 1, 2010, 10:20pm

Yes, you can do it with the many utilities available.

But before that, what have you tried so far?

bharath.gct · April 1, 2010, 10:34pm

Hi thegeek,

Thanks for replying quickly. Here is what I am doing now: I have tried to do this one column at a time with grep. These are the commands I used.

For columns 1 and 2:
ls -lrt
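`ls -lrt` prints the whole long listing (permissions, owner, date and so on), not just the two wanted columns. A minimal sketch that prints only the file-name prefix and the dat file's size, assuming every data file ends in `_dat` and sits in the current directory; `dat_sizes` is a hypothetical helper name:

```shell
#!/bin/sh
# Print "<prefix> <size in bytes>" for every *_dat file in the current directory.
# dat_sizes is a hypothetical helper, not a standard command.
dat_sizes() {
    for f in *_dat; do
        [ -f "$f" ] || continue            # skip if the glob matched nothing
        size=$(wc -c < "$f" | tr -d ' ')   # some wc versions pad with spaces
        printf '%s %s\n' "${f%_dat}" "$size"   # file1_dat -> file1
    done
}
```

Running `dat_sizes` in the folder should print lines such as `file1 10100`.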

For Unload records: (Col 3)
grep -h " records " *_unload | awk '{print $1}'
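One drawback of `grep -h` across all files is that the output no longer says which file each count came from. As a sketch, assuming the count line always reads `NNN records unloaded`, awk can print the prefix from its built-in `FILENAME` variable next to the count; `unload_counts` is a hypothetical name:

```shell
#!/bin/sh
# Print "<prefix> <unloaded count>" for each unload file given as an argument.
# Assumes the count is the first word of the line "NNN records unloaded".
unload_counts() {
    awk '/ records unloaded$/ {
        name = FILENAME
        sub(/_unload$/, "", name)   # file1_unload -> file1
        print name, $1
    }' "$@"
}
```

Usage: `unload_counts *_unload`.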

For Unload Start and End Times: (Col 4, 5)
grep -h Started `ls -rt -1 *_unload`
grep -h Ended `ls -rt -1 *_unload`
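These two greps print the whole line (`Started at 22:24:32`), and because `ls -rt` sorts by modification time, the rows from separate commands are not guaranteed to line up by file. A sketch that reads just the two times from one file in a single awk pass, assuming the `Started at HH:MM:SS` / `Ended at HH:MM:SS` layout shown above; `start_end` is a hypothetical name:

```shell
#!/bin/sh
# Print "<start> <end>" for one unload or load file.
# Assumes lines of the form "Started at HH:MM:SS" and "Ended at HH:MM:SS".
start_end() {
    awk '$1 == "Started" && $2 == "at" { started = $3 }
         $1 == "Ended"   && $2 == "at" { ended = $3 }
         END { print started, ended }' "$1"
}
```

Usage: `start_end file1_unload` should print `22:24:32 23:08:40`.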

I have no idea how to get the time difference (col 6).
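The time difference can be computed by converting each `HH:MM:SS` to seconds since midnight (`h*3600 + m*60 + s`), subtracting, and converting back. A sketch in awk, assuming each job runs for less than 24 hours, so an end time earlier than the start time means the job crossed midnight (as in the file1_load example); `time_diff` is a hypothetical name:

```shell
#!/bin/sh
# Print the elapsed time between two HH:MM:SS clock times as HH:MM:SS.
# Assumption: the job ran for less than 24 hours.
time_diff() {
    echo "$1 $2" | awk '{
        split($1, a, ":"); split($2, b, ":")
        start = a[1] * 3600 + a[2] * 60 + a[3]
        stop  = b[1] * 3600 + b[2] * 60 + b[3]
        d = stop - start
        if (d < 0) d += 86400        # end is earlier: the job crossed midnight
        printf("%02d:%02d:%02d\n", int(d / 3600), int(d % 3600 / 60), d % 60)
    }'
}
```

With the sample data, `time_diff 22:24:32 23:08:40` gives `00:44:08` and `time_diff 23:09:01 00:12:25` gives `01:03:24`, matching the expected report.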

For Load records: (Col 7)
grep -h " records " *_load | awk '{print $3}'

For Load Start and End Times: (Col 8, 9)
grep -h Started `ls -rt -1 *_load`
grep -h Ended `ls -rt -1 *_load`

Again, I have no idea how to get the time difference (col 10).

As I said in my previous post, I am new to Unix and not sure how to use the utilities you mentioned. Pardon my ignorance.

Let me know if you need any other inputs.

Thanks
Bharath

thegeek · April 2, 2010, 2:05am

A solution using a scripting language would be better. Do you know one? If so, try that.

bharath.gct · April 2, 2010, 4:12pm

The only idea I have is to write a loop that gets the file names, runs the above grep commands for each file, and echoes the results.
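The loop described here could look roughly like the sketch below. It is written for POSIX sh and assumes each prefix has matching `<prefix>_unload` and `<prefix>_load` files laid out exactly as in the examples above; `report` and `hms_diff` are hypothetical names, and it has not been checked against AIX's tools specifically:

```shell
#!/bin/sh
# Elapsed HH:MM:SS between two clock times, wrapping past midnight.
# Assumption: each job runs for less than 24 hours.
hms_diff() {
    echo "$1 $2" | awk '{
        split($1, a, ":"); split($2, b, ":")
        d = (b[1]*3600 + b[2]*60 + b[3]) - (a[1]*3600 + a[2]*60 + a[3])
        if (d < 0) d += 86400
        printf("%02d:%02d:%02d\n", int(d/3600), int(d%3600/60), d%60)
    }'
}

# One report line per *_dat file in the current directory.
report() {
    for dat in *_dat; do
        [ -f "$dat" ] || continue
        p=${dat%_dat}                                # file1_dat -> file1
        size=$(wc -c < "$dat" | tr -d ' ')

        unld=$(grep -h ' records unloaded' "${p}_unload" | awk '{print $1}')
        ust=$(grep -h '^Started at' "${p}_unload" | awk '{print $3}')
        ued=$(grep -h '^Ended at' "${p}_unload" | awk '{print $3}')

        load=$(grep -h '^Loaded records' "${p}_load" | awk '{print $3}')
        lst=$(grep -h '^Started at' "${p}_load" | awk '{print $3}')
        led=$(grep -h '^Ended at' "${p}_load" | awk '{print $3}')

        echo "$p $size $unld $ust $ued $(hms_diff "$ust" "$ued")" \
             "$load $lst $led $(hms_diff "$lst" "$led")"
    done
}
```

Because every command works on one prefix at a time, the columns of a row always come from the same set of files, regardless of modification times.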

Is there a better and more efficient way of doing this?

Thanks

frans · April 2, 2010, 4:38pm

If the xxx_load and xxx_unload files always have the same structure, I think it would be better to read them sequentially.
I believe the best way is to write a script with a for loop like this:

for FILE in file*_dat
do
  {
    read X X UnldSt     # "Started at 22:24:32"
    read Unld X         # "593456 records unloaded"
    read X X UnldEd     # "Ended at 23:08:40"
  } < "${FILE%_dat}_unload"
  {
    read X X LoadSt     # "Started at 23:09:01"
    read X X Load       # "Loaded records 345298"
    read X X LoadEd     # "Ended at 00:12:25"
  } < "${FILE%_dat}_load"
  # Calculate the time differences in seconds (works if you have GNU date)
  UnldDT=$(( $(date -d "$UnldEd" +%s) - $(date -d "$UnldSt" +%s) ))
  ((UnldDT<0)) && ((UnldDT+=86400))   # the job ran past midnight
  LoadDT=$(( $(date -d "$LoadEd" +%s) - $(date -d "$LoadSt" +%s) ))
  ((LoadDT<0)) && ((LoadDT+=86400))
  Size=$(ls -l "$FILE" | awk '{print $5}')
  echo "${FILE%_dat} $Size $Unld $UnldSt $UnldEd $UnldDT $Load $LoadSt $LoadEd $LoadDT"
done

bharath.gct · April 2, 2010, 9:20pm

Thanks, Frans, for the quick response.

I think I don't have GNU date. I am using ksh on IBM AIX. Below is the error I got:

date: Not a recognized flag: d
Usage: date [-u] [+"Field Descriptors"]
date: Not a recognized flag: d
Usage: date [-u] [+"Field Descriptors"]

Let me know if I am missing something.
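AIX's `date` has no `-d` option, but for times within one day the seconds can be computed with shell arithmetic alone. A sketch, assuming the times are always `HH:MM:SS`; `secs_of_day` is a hypothetical helper, and it sets the global variables `h`, `m` and `s`:

```shell
#!/bin/sh
# Seconds since midnight for an HH:MM:SS string, using only shell arithmetic,
# as a stand-in for GNU date -d "$t" +%s. Sets globals h, m, s as a side effect.
secs_of_day() {
    old_ifs=$IFS
    IFS=:
    set -- $1                 # split "23:08:40" into $1 $2 $3
    IFS=$old_ifs
    # Strip one leading zero: in some shells "08" inside $(( )) is read as
    # an (invalid) octal number.
    h=${1#0} m=${2#0} s=${3#0}
    echo $(( h * 3600 + m * 60 + s ))
}
```

In the loop above, `$(date -d "$UnldEd" +%s)` would then become `$(secs_of_day "$UnldEd")`, and likewise for the other three calls; the existing `+86400` correction still handles jobs that cross midnight.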

Thanks,
Bharath

---------- Post updated at 09:20 PM ---------- Previous update was at 05:34 PM ----------

OK, I finally found a way! I wrote a simple awk function to do it (it only handles time differences within a day; at least that's what I needed):

Function:

TimeDiff ()
{
	# $1 is "HH:MM:SS HH:MM:SS" (start, then end); prints end - start as HH:MM:SS
	echo "$1" | awk 'BEGIN {
		split("3600 60 1", sec_calc)
		FS=":|[ ][ ]*"
	}
	{
		time1 = 0
		for (i = 1; i < 4; i++)
			time1 += $i * sec_calc[i]

		time2 = 0
		for (i = 4; i < 7; i++)
			time2 += $i * sec_calc[i-3]

		time_diff = time2 - time1
		if (time_diff < 0)          # end time is after midnight
			time_diff += 86400

		hour_diff = int(time_diff / 3600)
		min_diff = int((time_diff % 3600) / 60)
		sec_diff = time_diff % 60

		printf("%02d:%02d:%02d\n", hour_diff, min_diff, sec_diff)
	}'
}

I then called the function in my script like this:

UnldDT=`TimeDiff "$UnldSt $UnldEd"`
LoadDT=`TimeDiff "$LoadSt $LoadEd"`

Thanks all for your time.
Bharath

durden_tyler · April 4, 2010, 4:46pm

Here's a Perl solution.

$ 
$ # show file sizes and names
$ ls -l file* | awk '{printf("%8d %s\n", $5, $8)}'
   10100 file1_dat
      60 file1_load
      62 file1_unload
   24930 file2_dat
      61 file2_load
      62 file2_unload
$ 
$ # display contents of the files - "file1_load" and "file1_unload"
$ cat file1_load
Started at 23:09:01
Loaded records 345298
Ended at 00:12:25
$ 
$ cat file1_unload
Started at 22:24:32
593456 records unloaded
Ended at 23:08:40
$ 
$ # display contents of the files - "file2_load" and "file2_unload"
$ cat file2_load
Started at 23:47:55
Loaded records 6748392
Ended at 01:23:49
$ 
$ cat file2_unload
Started at 21:56:38
475839 records unloaded
Ended at 23:00:59
$ 
$ 
$ # show the Perl script
$ cat report.pl
#!/usr/bin/perl
use Date::Calc qw(This_Year Delta_DHMS);

foreach $i (glob "*_dat") {
  ($f = $i) =~ s/_dat$//;
  @s = stat($i);

  ## Unload
  $unload_file = $f."_unload";
  open(F,$unload_file) or die "Can't open $unload_file: $!";
  chomp(@unload = <F>);
  close(F) or die "Can't close $unload_file: $!";
  ($unloadcount = $unload[1]) =~ s/^(\d+) records unloaded$/$1/;
  ($usttime = $unload[0]) =~ s/^Started at (.*?)$/$1/;
  @x = split(/:/, $usttime);
  ($uedtime = $unload[2]) =~ s/^Ended at (.*?)$/$1/;
  @y = split(/:/, $uedtime);
  ($Dd,$Dh,$Dm,$Ds) = Delta_DHMS(This_Year(),1,1, @x, This_Year(),1,2, @y);
  $uelapsed = sprintf("%02d:%02d:%02d",$Dh,$Dm,$Ds);

  ## Load
  $load_file = $f."_load";
  open(F,$load_file) or die "Can't open $load_file: $!";
  chomp(@load = <F>);
  close(F) or die "Can't close $load_file: $!";
  ($loadcount = $load[1]) =~ s/^Loaded records (\d+)$/$1/;
  ($lsttime = $load[0]) =~ s/^Started at (.*?)$/$1/;
  @x = split(/:/, $lsttime);
  ($ledtime = $load[2]) =~ s/^Ended at (.*?)$/$1/;
  @y = split(/:/, $ledtime);
  ($Dd,$Dh,$Dm,$Ds) = Delta_DHMS(This_Year(),1,1, @x, This_Year(),1,2, @y);
  $lelapsed = sprintf("%02d:%02d:%02d",$Dh,$Dm,$Ds);

  ## print report line
  printf("%-8s %10s %10s %10s %10s %10s %10s %10s %10s %10s\n",
          $f, $s[7], $unloadcount, $usttime, $uedtime, $uelapsed, $loadcount, $lsttime, $ledtime, $lelapsed);
}
$ 
$ 
$ # run the Perl script
$ perl report.pl
file1         10100     593456   22:24:32   23:08:40   00:44:08     345298   23:09:01   00:12:25   01:03:24
file2         24930     475839   21:56:38   23:00:59   01:04:21    6748392   23:47:55   01:23:49   01:35:54
$ 
$

tyler_durden