Perl - grep issue in filenames with wildcards

Hi
I have two directories, t1 and t2, each with some files in it. I need to check whether each file present in t1 also exists in t2. Currently, both directories contain the same files, as shown below:

$ABC.TXT
def.txt

Now, when I run the script below, it reports that def.txt is found but $ABC.TXT is not. Since the filename itself contains a regex metacharacter ($), grep seems to have trouble with it:

#!/usr/bin/perl
 
 
my @src=`ls /home/t1/`;
my @dest=`ls /home/t2/`;
 
foreach my $file (@src){
   @match=grep(/$file/, @dest);
   if (@match){
      print "Found $file";
   }else{
      print "Not found $file";
   }
}

Output:

Found def.txt
Not found $ABC.TXT

Please advise. I have shown 2 files as samples. I also have files with names like $$ABC.TXT and $$$ABC.TXT.

Guru

How about...

#!/usr/bin/perl
 
 
my @src=`ls /home/t1/`;
my @dest=`ls /home/t2/`;
 
SRC: foreach my $src (@src){
   DEST: foreach my $dest (@dest) {
        if ( $src eq $dest ){
          print "Found $src";
          next SRC;
       }
   }
   print "Not found $src";
}

Hi Jerry
Thanks for your reply. The source and destination directories each contain around 50k files, so a nested for loop will cause performance issues.

Can this be achieved using grep itself?

Guru

Hi,

Here is another solution, also using Perl. I hope it is useful to you:

$ cat script.pl
use strict;
use warnings;

my %src = ();
my %dest = ();
map { $src{$_} = 1 } qx{ ls -1 /home/t1 };
map { $dest{$_} = 1 } qx{ ls -1 /home/t2 };

map { print "Found $_"; delete $src{$_} } grep { $dest{$_} } keys %src;
for (keys %src) {
        print "Not found $_";
}
$ perl script.pl
(...output suppressed...)

Regards,
Birei
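
A variation on the hash approach above, offered as a sketch rather than a tested replacement: it reads the directories with opendir/readdir instead of shelling out to ls, so the names carry no trailing newline and characters like "$" never pass through a shell. The helper names list_dir and split_by_presence are made up for illustration, and the demo runs on temporary directories created with File::Temp rather than on /home/t1 and /home/t2.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Return the names of the plain files in a directory (hypothetical helper).
sub list_dir {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names = grep { -f "$dir/$_" } readdir($dh);
    closedir($dh);
    return @names;
}

# Split the source names into those present in dest and those missing.
# One hash lookup per name, so no inner loop over the destination list.
sub split_by_presence {
    my ($src, $dest) = @_;
    my %in_dest = map { $_ => 1 } @$dest;
    my (@found, @missing);
    for my $name (@$src) {
        if ($in_dest{$name}) { push @found, $name } else { push @missing, $name }
    }
    return (\@found, \@missing);
}

# Demo on throwaway directories (assumption: stand-ins for t1 and t2).
my $t1 = tempdir(CLEANUP => 1);
my $t2 = tempdir(CLEANUP => 1);
for my $spec ([$t1, '$ABC.TXT'], [$t1, 'def.txt'], [$t1, 'only_in_t1.txt'],
              [$t2, '$ABC.TXT'], [$t2, 'def.txt']) {
    my ($dir, $name) = @$spec;
    open(my $fh, '>', "$dir/$name") or die "Cannot create $dir/$name: $!";
    close($fh);
}

my ($found, $missing) = split_by_presence([list_dir($t1)], [list_dir($t2)]);
print "Found $_\n"     for sort @$found;
print "Not found $_\n" for sort @$missing;
```

Because the comparison is a hash key lookup rather than a regex match, no escaping of "$" is needed at all.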

Have you tested or benchmarked this claim?
The "grep" method essentially does a nested loop as well. It's just more concise and a more "Perlish" way of doing things.

With a setup consisting of two directories "t1" and "t2" having 10,005 identical files, my benchmark shows this -

$
$
$ cat cmpdir.pl
#!perl
use Benchmark qw (cmpthese);

my @src  = `ls -1 ./t1/`;
my @dest = `ls -1 ./t2/`;

sub using_grep {
  my $found = 0;
  my $notfound = 0;
  foreach my $file (@src){
     @match = grep {/$file/} @dest;
     if ($#match == -1){
       $notfound++;
     } else {
       $found++;
     }
  }
  return "$found, $notfound";
}

sub using_loop {
  my $found = 0;
  my $notfound = 0;
  SRC: foreach my $src (@src) {
    DEST: foreach my $dest (@dest) {
      if ( $src eq $dest ) {
        $found++;
        next SRC;
      }
    }
    $notfound++;
  }
  return "$found, $notfound";
}

print "From using_grep, (found, notfound) = ", using_grep(), "\n";
print "From using_loop, (found, notfound) = ", using_loop(), "\n\n";

cmpthese (2, {
    using_grep => sub {using_grep()},
    using_loop => sub {using_loop()},
  }
);

$
$
$ perl cmpdir.pl
From using_grep, (found, notfound) = 10005, 0
From using_loop, (found, notfound) = 10005, 0
 
        s/iter using_grep using_loop
using_grep    148         --       -95%
using_loop   7.27      1943%         --
$
$

This shows that -

(A) Perl's "grep" operator takes 148 seconds per iteration on average to compare the 10,005 files in directories "t1" and "t2".
(B) Perl's nested loop method takes 7.27 seconds per iteration on average to compare the 10,005 files in directories "t1" and "t2".
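
Both methods are still quadratic: each source name is checked against the destination list. grep also evaluates its block for every element of @dest and runs a regex match each time, while the loop stops at the first "eq" hit, which probably accounts for much of the gap. A hash lookup avoids the inner scan altogether. The sketch below adds a hypothetical using_hash to the comparison; it uses synthetic names instead of real directory listings, and no timings are claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic stand-ins for the two directory listings
# (assumption: both sides hold the same 500 names).
my @src  = map { sprintf('$FILE%04d.TXT', $_) } 1 .. 500;
my @dest = @src;

# Nested loop with early exit, as in the benchmark above.
sub using_loop {
    my ($found, $notfound) = (0, 0);
    SRC: foreach my $src (@src) {
        foreach my $dest (@dest) {
            if ($src eq $dest) { $found++; next SRC; }
        }
        $notfound++;
    }
    return "$found, $notfound";
}

# Build a lookup table once, then test each name with a single hash access.
sub using_hash {
    my %in_dest = map { $_ => 1 } @dest;
    my ($found, $notfound) = (0, 0);
    foreach my $src (@src) {
        if ($in_dest{$src}) { $found++ } else { $notfound++ }
    }
    return "$found, $notfound";
}

print "From using_loop, (found, notfound) = ", using_loop(), "\n";
print "From using_hash, (found, notfound) = ", using_hash(), "\n\n";

# A negative count asks Benchmark to run each sub for at least that many CPU seconds.
cmpthese(-1, {
    using_loop => sub { using_loop() },
    using_hash => sub { using_hash() },
});
```

To try it on the real directories, replace the two synthetic lists with the chomped output of the ls calls used earlier.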

tyler_durden

---------- Post updated 02-10-11 at 01:44 PM ---------- Previous update was 02-09-11 at 02:15 PM ----------

For academic interest, if you were looking for a solution to the problem in your original post: you have to escape the string you are searching for, so that regex metacharacters in it (such as "$") are matched literally -

$
$ # Elements $C, $$D, $$$E of array @x exist in array @y, but
$ # that is not reported because, once $i is interpolated, the
$ # regex engine treats "$" as the end-of-line anchor.
$
$ perl -le '@x = qw(    B   $C   $$D  $$$E    F );
            @y = qw( A  B   $C   $$D  $$$E      );
            foreach $i (@x) {
              $exists = grep {/$i/} @y;
              printf("%-10s %-20s in \@y\n", $i, $exists==1 ? "exists" : "does not exist");
            }'
B          exists               in @y
$C         does not exist       in @y
$$D        does not exist       in @y
$$$E       does not exist       in @y
F          does not exist       in @y
$
$
$ # Works fine with quotemeta function
$
$ perl -le '@x = qw(    B   $C   $$D  $$$E    F );
            @y = qw( A  B   $C   $$D  $$$E      );
            foreach $i (@x) {
              $exists = grep {/\Q$i\E/} @y;
              printf("%-10s %-20s in \@y\n", $i, $exists==1 ? "exists" : "does not exist");
            }'
B          exists               in @y
$C         exists               in @y
$$D        exists               in @y
$$$E       exists               in @y
F          does not exist       in @y
$
$
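
One caveat when applying /\Q$i\E/ to the file list from the original post: the pattern is unanchored, so it still matches substrings, and $ABC.TXT would be reported as present when only $$ABC.TXT exists. Here is a minimal sketch of the difference, using the made-up helper names count_substring and count_exact. Anchoring with \A and \z (or a plain "eq" test) restricts the match to the whole name.

```perl
use strict;
use warnings;

# Unanchored: counts every element that merely contains $name.
sub count_substring {
    my ($name, @list) = @_;
    return scalar(grep { /\Q$name\E/ } @list);
}

# Anchored: counts only elements that equal $name as a whole.
sub count_exact {
    my ($name, @list) = @_;
    return scalar(grep { /\A\Q$name\E\z/ } @list);
}

# Hypothetical destination listing that lacks a plain $ABC.TXT.
my @dest = ('$$ABC.TXT', '$$$ABC.TXT', 'def.txt');

for my $name ('$ABC.TXT', 'def.txt') {
    printf "%-10s substring=%d exact=%d\n",
        $name, count_substring($name, @dest), count_exact($name, @dest);
}
# prints:
# $ABC.TXT   substring=2 exact=0
# def.txt    substring=1 exact=1
```

Names read from ls output also carry a trailing newline, so chomp them before using the anchored form.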

Have a look at the quotemeta function -

quotemeta - perldoc.perl.org

and also the gory details of parsing quoted constructs -

perlop - perldoc.perl.org

in the Perl documentation.

HTH,
tyler_durden

Thanks a lot tyler_durden. This was what I was looking for. Learnt quite a few things in this post.

Guru.