Perl - Grep open file more than once.

OldGaf · July 16, 2011, 10:55pm

Hi,

I am using File::Find to go through a very large tree.
I am looking for all XML files and want to open only those that contain an <Updated> tag. I then want to capture the contents of two tags, <Old> and <New>.

My problem is, after I open the file and do the first grep for <Updated> (which does work), I am unable to grep again unless I close the file and open it.

I did something like this:

find(\&check, $dir);

sub check {
          if ($_ =~ /.xml/){
          open(FILE,"$_");
                  if (grep{/Updated/} <FILE>){    # <-- works
                       my $old = grep /OLD/ <FILE>  #<-- does not - syntax may not be right, working from mem
                       my $new = grep /New/ <FILE>  #<--does not
                    } 
         }    
}

Print "File $_ has Old value of $old and new value of $new";

I gave up on using grep more than once and tried using while:

open(FILE,"$_");
if (grep{/Updated/} <FILE>){ 
    my $path = cwd;
    print "\n$path\n";
    print "$_ \n";
    while ($line = <FILE>) {
   if ($line =~ m/Old|New/) { print $line }
   } 
  } 
} 
close FILE;

This also did not work.
If I close the file and open it again before the while statement, it works.

So....
1) Is there a way to grep the open file more than once and create $vars?
2) If not, is it better (faster) to read the file into a @list and then grep that rather than keep opening and closing the file?

I would rather grep than use "while" as that causes other headaches with the rest of the script.

Most of the files are fairly small, but there will be many thousands of them.

Any thoughts / slaps to the head?

-OG-

matrixmadhan · July 16, 2011, 11:08pm

There is no need to open the file yourself for grepping; File::Grep does that automatically for you.

File::Grep - search.cpan.org

zedex · July 17, 2011, 11:57am

Try this, this should work. I am not sure about your goal here.

If it works, try to work out why changing the approach worked!

find(\&check, $dir);

sub check {
    if ($_ =~ /\.xml$/)    # file name must end in .xml
    {
         open(my $fh, '<', $_) or die "Failed to open $_ file, Error [$!]\n";
             chomp ( my @Contents = <$fh> );    # read the whole file once
         close($fh);

         my (@old, @new);

         if ( grep(/<Updated>/, @Contents) )
         {
              # grep in list context returns the matching lines;
              # in scalar context it returns only the number of matches
              @old = grep(/<Old>/, @Contents);
              @new = grep(/<New>/, @Contents);
          }
      }
}
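
As for why reading into an array works where the repeated grep did not: in list context, <FILE> returns every remaining line and leaves the handle at end-of-file, so a second <FILE> has nothing left to return. The sketch below shows this with core Perl only, and also shows that seek can rewind the handle without closing and reopening it. The temporary file and its contents are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a small sample file (hypothetical contents, for illustration only).
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "<Updated>yes</Updated>\n<Old>1.0</Old>\n<New>2.0</New>\n";
close($tmp) or die "close failed: $!";

open(my $fh, '<', $path) or die "Cannot open $path: $!";

# List context: <$fh> returns every remaining line and leaves the handle at EOF.
# grep in scalar context then yields the number of matching lines.
my $updated = grep { /<Updated>/ } <$fh>;

# Nothing is left to read, so this grep sees an empty list.
my $before = grep { /<Old>/ } <$fh>;

# Rewind to the start of the file; the same handle can be read again.
seek($fh, 0, 0) or die "seek failed: $!";
my $after = grep { /<Old>/ } <$fh>;

close($fh);
print "updated=$updated before_seek=$before after_seek=$after\n";
# prints: updated=1 before_seek=0 after_seek=1
```

For many small files, reading each file once into an array and grepping the array is probably the simpler choice; seek is mainly useful when the file is too large to hold in memory.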

drl · July 18, 2011, 4:17pm

Hi.

Indeed File::Grep will open files for you. Here is a driver that lists data files, the perl code, and the results of counting lines in the files twice, then searching for a pattern in those same files:

#!/usr/bin/env bash

# @(#) s1	Demonstrate File::Grep, test driver for.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C perl

p=./p1

pl " Input data files data[12]:"
for file in data[12]
do
  pe
  pe " File $file:"
  cat $file
done

pl " perl script:"
cat $p

pl " Results:"
$p hi data*

exit 0

producing:

% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
GNU bash 3.2.39
perl 5.10.0

-----
 Input data files data[12]:

 File data1:
hi
hi 2

 File data2:
hi 3
lo
hi 4
by

-----
 perl script:
#!/usr/bin/env perl

# @(#) p1	Demonstrate File::Grep, perl grep on a file.

use File::Grep qw( fgrep fmap fdo );
use warnings;
use strict;

$File::Grep::SILENT = 0;
my ($keyword) = shift || die " Must have at least a keyword.\n";
print " keyword is :$keyword:\n";
my (@f) = @ARGV;
print " files are :@f:\n";

# fdo
print "\n";
my ($count) = 0;
fdo { $count++ } @f;
print "Pass 1 (no open), total lines in :@f: -- $count\n";

print "\n";
$count = 0;
fdo { $count++ } @f;
print "Pass 2 (no open), total lines in :@f: -- $count\n";

# fmap (fgrep is complex).
my (@matches);
print "\n";
print " Matches for :$keyword: in :@f: (no open)\n";
fmap { push( @matches, $_ ) if /$keyword/; } @f;
print @matches;

exit;

-----
 Results:
 keyword is :hi:
 files are :data1 data2:

Pass 1 (no open), total lines in :data1 data2: -- 6

Pass 2 (no open), total lines in :data1 data2: -- 6

 Matches for :hi: in :data1 data2: (no open)
hi
hi 2
hi 3
hi 4

Best wishes ... cheers, drl
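
For the original goal of capturing what is inside <Old> and <New>, grep alone only returns whole matching lines (or a count). A regex capture over the file contents can pull out the text between the tags. The following is a rough sketch using core Perl only; tag_text is a hypothetical helper name, the sample string stands in for one file's contents, and it assumes simple tags without attributes or nesting. A real XML parser such as XML::LibXML would be more robust.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return the text inside the first <$tag>...</$tag> pair, or undef if absent.
# tag_text is a hypothetical helper, not part of any module in this thread.
sub tag_text {
    my ($xml, $tag) = @_;
    return $xml =~ m{<\Q$tag\E>(.*?)</\Q$tag\E>}s ? $1 : undef;
}

# Sample data standing in for the contents of one file.
my $xml = "<Record>\n<Updated>yes</Updated>\n<Old>1.0</Old>\n<New>2.0</New>\n</Record>\n";

# Only report files that carry an <Updated> tag.
if (defined tag_text($xml, 'Updated')) {
    my $old = tag_text($xml, 'Old');
    my $new = tag_text($xml, 'New');
    print "Old value is $old and new value is $new\n";
}
# prints: Old value is 1.0 and new value is 2.0
```

To fill $xml from a real file, read the handle with `local $/;` in effect so that <$fh> returns the whole file as a single string.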