[Help] PERL Script - grep multiple lines

miskin · December 12, 2008, 11:31am

Hi Gurus,

I need some help with the "grep" command, or whatever command you think is suitable. I'm about to write a Perl script to extract a report from the system and submit it to the end users. The input for the script will consist of 3 elements:

1) Generation ID
2) Month
3) Year

I have a text file that contains the following data:

It's actually quite long (from 2001 to 2008), but the format is the same throughout.

I need to get those 3 elements as input for my Perl script. The main element is Generation <ID#>. Say that, from the sample data above, I want the data dated Jan 2001 to Dec 2001: to get the Generation ID#s, I need to search for every Generation ID# from Jan to Dec 2001.

So the question is, how can I do that?

Thanks,

otheus · December 12, 2008, 1:15pm

#!/usr/bin/perl -w
my @data;
while (<>) { 
  push @data, $1 if /^\s*Generation (\d+)/;
  push @data, ($1, $2) if /imported at \w+ (\w+) .* (\d{4})\s*$/;
}

Now your data is in the @data array. Every three elements form one record (ID, month, year). To extract them, you can use, well, lots of things:

while ($#data >= 0) { 
   printf "%s\t%s\t%s\n", splice(@data,0,3);
}

There are about n + 1 other ways of doing such a thing.
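One of those other ways, as a rough sketch: collect each record into a hash instead of a flat list, so a missing field can't shift everything after it. The sample text is made up (the real data isn't shown in this thread); it only assumes a "Generation <ID>" line followed by an "imported at <weekday> <month> ... <year>" line, which is what the regexes above expect. The name parse_records is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn the report text into a list of
# { id, month, year } hashes, one per "Generation" entry.
sub parse_records {
    my ($text) = @_;
    my (@records, $id);
    for my $line (split /\n/, $text) {
        # Remember the ID until we see its "imported at" line.
        $id = $1 if $line =~ /^\s*Generation (\d+)/;
        if (defined $id && $line =~ /imported at \w+ (\w+) .* (\d{4})\s*$/) {
            push @records, { id => $id, month => $1, year => $2 };
            undef $id;
        }
    }
    return @records;
}

# Made-up sample in the assumed format.
my $sample = <<'END';
Generation 101
  imported at Mon Jan 15 10:22:33 2001
Generation 102
  imported at Tue Feb 20 08:01:10 2001
END

# Print one tab-separated line per record.
printf "%s\t%s\t%s\n", @{$_}{qw(id month year)} for parse_records($sample);
```

Keeping the fields under named keys also makes later filtering easier to read, e.g. grep { $_->{year} == 2001 } on the returned list.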

otheus · December 12, 2008, 1:19pm

If you really want to output this into another perl script, rather than just doing it all in perl, you can do this:

#!/usr/bin/perl -w
while (<>) { 
  print $1 
     if /^\s*Generation (\d+)/;
  print " ",$1, " ",$2,"\n"
     if /imported at \w+ (\w+) .* (\d{4})\s*$/;
}

Then you just pipe that script's output to the other perl script.
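On the receiving end, each line arrives as "ID Month Year", so the second script only needs to split on whitespace. A rough sketch, with parse_line as a hypothetical name; it rejects lines that don't have the three expected fields. The loop runs over a few made-up lines so it can be tried without any input file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one "ID Month Year" line into its fields.
# Returns an empty list if the line doesn't look right.
sub parse_line {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return unless @fields == 3
        && $fields[0] =~ /^\d+$/
        && $fields[2] =~ /^\d{4}$/;
    return @fields;
}

# Stand-in for piped input; in the real script, loop over <STDIN> instead.
for my $line ("101 Jan 2001\n", "102 Feb 2001\n", "garbage\n") {
    my ($id, $month, $year) = parse_line($line) or next;
    print "id=$id month=$month year=$year\n";
}
```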

miskin · December 12, 2008, 1:46pm

Wow... fast response, thanks. I hope I can make my request clearer.

Well, I have a data file containing the data from my first post; the file is called data.txt. The script will run and ask for input to search through data.txt. Example input required by the script:

1) From Month :
2) From Year :
3) To Month :
4) To Year :

and let's say the input looks like this:

1) From Month : Jan
2) From Year : 2001
3) To Month : Dec
4) To Year : 2007

So from the input, the script will search for the Generation ID#s for the period Jan 2001 to Dec 2007. Once it has the Generation ID#s and has dumped them to another plain text file (genid.txt), the script then runs the remaining commands to complete the process.

Firstly, I must say that I'm still a newbie in this area, so sorry about that, and sorry if I'm troubling you guys. Hope you can help me out.

Thanks in advance.

otheus · December 12, 2008, 3:39pm

Sounds pretty simple. But I'm not sure if there should be MULTIPLE generation IDs or not. Let's assume so. So let's say you pass the variables on the command line to your new script:

$ your-script.pl $from_month $from_year $to_month $to_year

So now you want a script to read these four arguments, run through the named file, and save it to another file, right? So:

$ your-script.pl $from_month $from_year $to_month $to_year <data.txt >genid.txt

Now comes the script:

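#!/usr/bin/perl -w

# Create lookup table for Month specification so we can 
# determine which month-name is before/after another month name
my %Month = ( Jan => 1, Feb => 2, Mar => 3, Apr => 4, May => 5, Jun => 6, Jul => 7, 
 Aug => 8, Sep => 9, Oct => 10, Nov => 11, Dec => 12 );

# Get arguments from command line. splice removes them from @ARGV, so the
# <> below reads STDIN instead of trying to open "Jan", "2001", ... as files.
my ($from_month, $from_year, $to_month, $to_year) = splice(@ARGV, 0, 4);

# Turn each end of the range into one comparable number (year * 12 + month),
# so ranges that cross a year boundary (e.g. Nov 2001 to Feb 2002) work too.
my $from = $from_year * 12 + $Month{$from_month};
my $to   = $to_year   * 12 + $Month{$to_month};

my $gen_id;

# loop through each input line
while (<>) { 
  # "remember" gen_id for next line of input. 
  $gen_id = $1
     if /^\s*Generation (\d+)/;

  # if the shoe fits...
  if (/imported at \w+ (\w+) .* (\d{4})\s*$/ && exists $Month{$1}) { 
     my $when = $2 * 12 + $Month{$1};
     print $gen_id,"\n"
        if $when >= $from && $when <= $to;
  }
}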
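One thing to watch with a month/year range test: checking the year and the month separately only works when the range runs from an early month to a late one, such as Jan 2001 to Dec 2007. For Nov 2001 to Feb 2002, no month is both >= Nov and <= Feb, so nothing would match. Folding year and month into a single number avoids that. A minimal sketch of the idea in isolation; month_key and in_range are hypothetical names, and both ends of the range are treated as included.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %Month = ( Jan => 1, Feb => 2, Mar => 3, Apr => 4, May => 5, Jun => 6,
              Jul => 7, Aug => 8, Sep => 9, Oct => 10, Nov => 11, Dec => 12 );

# Fold (month name, year) into one sortable number.
sub month_key {
    my ($month, $year) = @_;
    die "unknown month '$month'\n" unless exists $Month{$month};
    return $year * 12 + $Month{$month};
}

# True if (month, year) lies within [from, to], both ends included.
sub in_range {
    my ($month, $year, $from_month, $from_year, $to_month, $to_year) = @_;
    my $key = month_key($month, $year);
    return $key >= month_key($from_month, $from_year)
        && $key <= month_key($to_month,   $to_year);
}

# Probe two dates against the range Nov 2001 .. Feb 2002.
for my $probe (['Dec', 2001], ['Mar', 2002]) {
    my $verdict = in_range(@$probe, 'Nov', 2001, 'Feb', 2002) ? 'in' : 'out';
    print "@$probe: $verdict\n";
}
```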

summer_cherry · December 15, 2008, 2:05am

#!/usr/bin/perl
# Prompt for the range, then print the ID of every record inside it.
print "From year\n";
chomp($f_year=<STDIN>);
print "From month\n";
chomp($f_mon=<STDIN>);
print "To year\n";
chomp($t_year=<STDIN>);
print "To month\n";
chomp($t_mon=<STDIN>);
$f=sprintf("%s %s",$f_year,$f_mon);
$t=sprintf("%s %s",$t_year,$t_mon);
%hash=('Jan',1,'Feb',2,'Mar',3,'Apr',4,'May',5,'Jun',6,'Jul',7,'Aug',8,'Sep',9,'Oct',10,'Nov',11,'Dec',12);
# Each record ends with "kb.", so read the file one record at a time.
$/="kb.\n";
open FH,"<a.txt" or die "Cannot open a.txt: $!";
while(<FH>){
	my @tmp=split("\n",$_);
	# The first line of each record is expected to be "Generation <ID>".
	my @arr1=split(" ",$tmp[0]);
	$id=$arr1[1];
	# On the second line, the year is the last field and the month
	# is three fields before it.
	my @arr2=split(" ",$tmp[1]);
	$day=sprintf("%s %s",$arr2[$#arr2],$arr2[$#arr2-3]);
	# Inclusive range: from <= day <= to.
	print $id,"\n" if (_compare($day,$f)>=0 && _compare($day,$t)<=0);
}
close FH;
# Compare two "year month" strings; returns 1, 0 or -1, like <=>.
sub _compare{
	my($x,$y)=@_;
	my @arr1=split(" ",$x);
	my @arr2=split(" ",$y);
	return $arr1[0] <=> $arr2[0] || $hash{$arr1[1]} <=> $hash{$arr2[1]};
}
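For anyone unfamiliar with the $/ approach used above, a minimal sketch: setting $/ to "kb.\n" makes each read return a whole record (everything up to and including the next "kb." at the end of a line) rather than a single line. The sample text is invented, since the real file isn't shown here, and read_records is a hypothetical name; it reads from an in-memory filehandle so it can be run without a.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read "kb."-terminated records from a filehandle.
sub read_records {
    my ($fh) = @_;
    local $/ = "kb.\n";          # record separator, restored on return
    my @records = <$fh>;
    chomp @records;              # chomp strips the current $/ ("kb.\n")
    return @records;
}

# Made-up sample: two records, each ending in "kb."
my $sample = "Generation 101\n imported at Mon Jan 15 10:22:33 2001\n size 12 kb.\n"
           . "Generation 102\n imported at Tue Feb 20 08:01:10 2001\n size 34 kb.\n";

open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my @records = read_records($fh);
close $fh;

# Each record still contains its own newlines, so split it into lines.
for my $record (@records) {
    my @lines = split /\n/, $record;
    print "first line of record: $lines[0]\n";
}
```

Using local keeps the changed separator from leaking into other reads in the same program, such as prompts read from STDIN.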

otheus · December 15, 2008, 6:16am

SummerCherry,

I haven't tried your script, but I imagine that the line that loads the id ($id=$arr1[1]) will run on every line, when in fact it should only run on lines marked "Generation".