PERL: Searching for a string in a text file problem

meevagh · April 28, 2008, 11:34am

Looking for a bit of help. I need to search for a string of words, but unfortunately these words are located on separate lines.

for example the text output is:

United
Chanmpions
Ronaldo
Liverpool
Losers
Torres

and my script code is

print("DEBUG - checking file message");
while (<FILE>){
    $line = $_;

    if($line =~ /United/ ){
        print("\nAbout to send email\n");
        sendEmail($contacts,
         "",
         "Monitoring",
         "\nPlease be aware that there is a problem.",
         "",
         "");
    }
}

The above script will send out an e-mail when it locates United, but I need to send out an e-mail only when it finds United, Champions and Ronaldo.

I thought something like
if($line =~ /United/n Champions/n Ronaldo/)

But no luck.

Any suggestions as to how to go about this?

photon · April 28, 2008, 11:52am

$line =~ /(United|Chanmpions|Ronaldo|Liverpool|Losers|Torres)/i

meevagh · April 28, 2008, 12:13pm

Hi Photon, Thanks for the reply.
But that's not quite what I am trying to do; I probably could have explained it a bit better.

My problem is that I only want to send out an e-mail if the lines
United
Champions
Ronaldo
occur directly after each other as in.

sample text file

United
Chanmpions
Ronaldo
Liverpool
Losers
Torres

I don't want to send an e-mail just because the words are located somewhere in the file. For example, I don't want to send an e-mail if the text file is

United
Champions
Torres
Liverpool
Losers
Ronaldo

as the lines I'm interested in don't occur in the correct order.

photon · April 28, 2008, 12:24pm

$line =~ /[United]?\s?[Chanmpions]?\s?[Ronaldo]?\s?/

quine · April 28, 2008, 12:29pm

Two approaches....

Search for any of the words on each line and every time you find one, add it to a hash... e.g. $somehash{"Ronaldo"} = 1;

When you've finished scanning the whole file, check the resulting hash for the existence of all the words....

if (exists $somehash{"Ronaldo"} && exists $somehash{"Losers"} && exists ... ) { send email ... }

Something like that....

OR....

You could try a pattern like

$FILEBUFFER =~ /(A|B|C|D).+(A|B|C|D).+(A|B|C|D).... /is

You simply repeat the alternatives over and over again separated by one or more of any character, and that way you catch all of them if present no matter what the order.... You have to test the resulting captures to see if all words are present... Note the "is" at the end of the pattern... "i" causes case to be ignored, and "s" says to count a newline as one of the "any characters" which lets you match across lines... Note that in this case $FILEBUFFER contains the WHOLE file (see read()), not a line....
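
The first approach above (recording each word in a hash) could be sketched roughly as follows. The helper name all_words_seen and the sample data are made up for illustration, and the code assumes a perl recent enough for "use warnings" (5.6 or later). Note that this only checks that every word occurs somewhere; it says nothing about order or adjacency.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns true if every wanted word appears somewhere
# in the given lines (order and adjacency are NOT checked).
sub all_words_seen {
    my ($lines, @wanted) = @_;
    my %seen;
    for my $line (@$lines) {
        for my $word (@wanted) {
            # \Q...\E quotes the word so it is matched literally.
            $seen{$word} = 1 if $line =~ /\Q$word\E/i;
        }
    }
    # grep in scalar context counts the wanted words that were seen.
    return @wanted == grep { exists $seen{$_} } @wanted;
}

# Made-up sample data: same words, adjacent in one case and scattered in the other.
my @in_order  = ("United\n", "Champions\n", "Ronaldo\n", "Liverpool\n");
my @scattered = ("United\n", "Champions\n", "Torres\n", "Ronaldo\n");

for my $sample (\@in_order, \@scattered) {
    print all_words_seen($sample, qw(United Champions Ronaldo))
        ? "all present\n"
        : "something missing\n";
}
```

Both samples report "all present", which is why this approach on its own cannot tell the "directly after each other" case from the scattered one.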

drl · April 28, 2008, 12:32pm

Hi.

Here is a quickly-written possibility:

#!/usr/bin/perl

# @(#) p2       Demonstrate matching across line boundaries.

use warnings;
use strict;

my ($debug);
$debug = 0;
$debug = 1;

my $file;

for $file (@ARGV) {
  print "\n -----\n";
  my $lines = slurp($file);
  print " File contains:\n$lines";

  print "\n";
  if ( $lines =~ /United.*Champions.*Ronaldo/xms ) {
    print " Hit!\n";
  }
  else {
    print " Oh, a miss!\n";
  }
}

sub slurp {

  # Best practices, p213 for a file.
  my ($file) = shift;
  my ($f);
  open( $f, "<", $file ) || die " Can't open file $file, quitting.\n";
  my $scalar = do { local $/; <$f> };
  return $scalar;
}

exit(0);

Producing output for a bad dataset and a good dataset:

% ./p2 data1 data2

 -----
 File contains:
United
Champions
Liverpool
Losers
Torres

 Oh, a miss!

 -----
 File contains:
United
Champions
Ronaldo
Liverpool
Losers
Torres

 Hit!

See perl documentation for details ... cheers, drl

drl · April 28, 2008, 12:40pm

Hi.

Changing 4 characters in the regular expression allows:

% ./p3 data1 data2 data3

 -----
 File contains:
United
Champions
Liverpool
Losers
Torres

 Oh, a miss!

 -----
 File contains:
United
Champions
Ronaldo
Liverpool
Losers
Torres

 Hit!

 -----
 File contains:
United
Champions
Liverpool
Losers
Torres
Ronaldo

 Oh, a miss!

Best wishes ... cheers, drl
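
The changed expression itself is not shown in this post. Assuming the four characters are the two .* replaced by \n (an assumption, although it agrees with the pattern in the p6 script further down), the difference between the two patterns on the third dataset can be sketched like this. The sub names loose_match and strict_match are invented for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Loose pattern from p2: with /s, .* also matches newlines,
# so any number of other lines may sit between the three words.
sub loose_match  { return $_[0] =~ /United.*Champions.*Ronaldo/xms ? 1 : 0 }

# Strict pattern (assumed p3 change): each .* replaced by \n,
# so the three words must be on directly consecutive lines.
sub strict_match { return $_[0] =~ /United\nChampions\nRonaldo/xms ? 1 : 0 }

# Contents of the third dataset shown above.
my $data3 = "United\nChampions\nLiverpool\nLosers\nTorres\nRonaldo\n";

print "loose:  ", ( loose_match($data3)  ? "Hit" : "miss" ), "\n";
print "strict: ", ( strict_match($data3) ? "Hit" : "miss" ), "\n";
```

The loose pattern reports a hit on this data because Ronaldo does appear somewhere after Champions; the strict one reports a miss, matching the output above.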

KevinADC · April 28, 2008, 1:30pm

photon's pattern above is wrong because it misuses character classes: a character class matches any single one of the characters inside the square brackets, whatever order they are listed in. You can't use them to match whole words, at least not easily or efficiently, and not in the way they are used above. Plus it would still check just one line of the file at a time.
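
To illustrate the point about character classes, here is a small sketch comparing the bracketed pattern from photon's second post with a plain literal word. The sub names and the test strings are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# [United] is a set: it matches ONE character out of U, n, i, t, e, d.
# Because every piece is followed by ?, the whole pattern can also match
# the empty string, so it succeeds on any input at all.
sub class_pattern {
    return $_[0] =~ /[United]?\s?[Chanmpions]?\s?[Ronaldo]?\s?/ ? 1 : 0;
}

# A literal word has to appear in full.
sub word_pattern {
    return $_[0] =~ /United/ ? 1 : 0;
}

for my $line ( "Torres", "xyz", "United" ) {
    printf "%-7s class=%d word=%d\n", $line, class_pattern($line), word_pattern($line);
}
```

The class-based pattern reports a match for all three strings, including "xyz", while the literal pattern matches only "United".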

photon · April 28, 2008, 2:50pm

I changed it; the point was to use a one-liner regular expression instead of using hashes and loops and getting all complex. I wish I had the time to test all my code, but I am on work time.

era · April 28, 2008, 3:02pm

perl -0000 -ne 'exit 0 if m/United\nChampions\nRonaldo\n/; exit 1;' file

This doesn't print anything, just sets its exit code to tell whether or not a match was found; suitable for inclusion in a shell script to decide whether or not to send mail.

Watch the direction of the backslashes in \n and the very significant -0000 option.

The Perl FAQ has a section devoted to roughly this topic; perhaps you want to read it.

I'm sure you noticed you misspelled "Champions" in your sample test file, so it doesn't actually work /-:

KevinADC · April 28, 2008, 3:38pm

Understandable. I am also using a computer right now with no way to test code. But I would suggest that if the point is to make a point about something you could mention that in your post. I did not get the impression your point was to suggest using a one-liner, but maybe other people reading the thread did.

era · April 28, 2008, 3:50pm

meevagh: I meant to mention, but forgot; your script examines only one line at a time. If you build up an array of the last n lines then you can compare those (joined into a string) against a pattern of n lines.

If you aren't examining a very large file then maybe just slurp it all in and compare against your pattern. If you take that route, take care to think about whether it matters whether United matches at beginning of line, and consider adding the /m option to the regex if you decide it does. Actually you might also want to allow Ronaldo at end of file without a trailing newline, too?
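
The "last n lines" idea could be sketched as below. This is only an illustration under stated assumptions: the helper name has_sequence is invented, the window size is fixed at three, and the in-memory file used for the demonstration needs perl 5.8 or later (with a real file you would pass in an ordinary filehandle instead).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reads lines from a filehandle, keeps the last three,
# and returns true as soon as they match the wanted sequence.
sub has_sequence {
    my ($fh) = @_;
    my @window;
    while ( my $line = <$fh> ) {
        push @window, $line;
        shift @window if @window > 3;    # keep only the last 3 lines
        # \n? lets Ronaldo be the last line of a file with no trailing newline.
        return 1 if join( '', @window ) =~ /\AUnited\nChampions\nRonaldo\n?\z/;
    }
    return 0;
}

# Demonstration on an in-memory file (open on a scalar ref needs perl 5.8+).
my $text = "Liverpool\nUnited\nChampions\nRonaldo\nTorres\n";
open( my $fh, '<', \$text ) or die "Can't open in-memory file: $!";
print has_sequence($fh) ? "Hit!\n" : "Oh, a miss!\n";
close $fh;
```

Anchoring with \A and \z means each line must consist of exactly that word, so a line such as "Manchester United" would not count; drop the anchors if partial-line matches are acceptable. Only three lines are held in memory at a time, which matters if the file is large.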

meevagh · April 29, 2008, 5:15am

I'm trying to work off the code sample that drl posted. His solution looks as though it would be perfect for what I need, although when trying to get the script to run, I'm getting a

"Too many arguments for open at"

I think that it's occurring in these lines

sub slurp {

# Best practices, p213 for a file.
my ($x25hostcheck) = shift;
my ($f);
open( $f, "<", $x25hostcheck ) || die " Can't open file $x25hostcheck, quitting.\n";
my $scalar = do { local $/; <$f> };
return $scalar;
}

I have tried to figure it out but have made no progress.
Any suggestions?

Oh... am I also meant to set a path for the my ($f); like

my $f = "/export/home/test.txt"

drl · April 29, 2008, 10:05am

Hi.

As discussed privately, the version of perl that meevagh is using is quite old. Here is a version of the script that tries to go back in time using the old-style of open (i.e. not the indirect filehandle), and other changes I could think of:

#!/usr/bin/perl -w

# @(#) p6       Demonstrate matching across line boundaries.

# Use old-style open.

use strict;

my ($debug);
$debug = 0;
$debug = 1;

my $file;

for $file (@ARGV) {
  print "\n -----\n";
  my $lines = slurp($file);
  print " File contains:\n$lines";

  print "\n";
  if ( $lines =~ /United\nChampions\nRonaldo/xms ) {
    print " Hit!\n";
  }
  else {
    print " Oh, a miss!\n";
  }
}

sub slurp {

  # Best practices, p213 for a file.
  my ($file) = shift;
  open( F, "<$file" ) || die " Can't open file $file, quitting.\n";
  my $scalar = do { local $/; <F> };
  close F;
  return $scalar;
}

exit(0);

Producing:

% ./p6 data1 data2 data3

 -----
 File contains:
United
Champions
Liverpool
Losers
Torres

 Oh, a miss!

 -----
 File contains:
United
Champions
Ronaldo
Liverpool
Losers
Torres

 Hit!

 -----
 File contains:
United
Champions
Liverpool
Losers
Torres
Ronaldo

 Oh, a miss!

cheers, drl

drl · April 29, 2008, 10:11am

Hi.

Here is era's perl one-liner in a shell script:

#!/bin/bash -

# @(#) user2    Demonstrate one-liner perl solution in shell script.

echo
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version =o $(_eat $0 $1) perl nl

for FILE in data[1-3]
do
echo
echo "-----"
nl $FILE
if perl -0000 -ne 'exit 0 if m/United\nChampions\nRonaldo\n/; exit 1;' $FILE
then
  echo " Hit!"
else
  echo " Oh, too bad, a miss :("
fi
done

exit 0

Producing:

% ./user2

(Versions displayed with local utility "version")
Linux 2.6.11-x1
GNU bash 2.05b.0
perl 5.8.4
nl (coreutils) 5.2.1

-----
     1  United
     2  Champions
     3  Liverpool
     4  Losers
     5  Torres
 Oh, too bad, a miss :(

-----
     1  United
     2  Champions
     3  Ronaldo
     4  Liverpool
     5  Losers
     6  Torres
 Hit!

-----
     1  United
     2  Champions
     3  Liverpool
     4  Losers
     5  Torres
     6  Ronaldo
 Oh, too bad, a miss :(

cheers, drl

ghostdog74 · May 1, 2008, 10:57am

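open(F,"<file") or die "cannot open file:$!";
@data=<F>;
close(F);
chomp(@data);   # strip newlines so eq can compare whole lines
# stop two lines early so $i+2 stays inside the array
for($i=0;$i< @data - 2;$i++){
  # eq compares strings; == would compare them as numbers
  if ($data[$i] eq "United" and $data[$i+1] eq "Champions" and $data[$i+2] eq "Ronaldo" ) {
    #do email
    print "do...email ..";
    last;   # Perl leaves a loop with last, not break
  }
}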