How to match all array contents and display the best-matching sentences first in Perl?

Hi,

I have an array with 3 words in it, and I have to match all of them against some sentences and display the exactly matching sentence, i.e. all 3 words should occur in the sentence.

Here are the sentences.


$arr1="Our data suggests that epithelial shape and growth control are unequally affected depending on how wt p53 function is impaired and whether partial or full tumor suppressor activity is lost";

$arr2="The growth of epithelial tissue is downregulation";

I want all three words to be matched, and that sentence has to be printed first.

Here are the words. I tried like this:


$array="epithelial,growth,downregulation";
@split=split(",",$array);
foreach $word(@split)
{
    if($arr1=~/\b$word\b/i || $arr2=~/\b$word\b/i)
    {
         print "<br> Matched <br>";
    }

}

I'm stuck here!

The output should be like this:

The growth of epithelial tissue is downregulation 

Our data suggests that epithelial shape and growth control are unequally affected depending on how wt p53 function is impaired and whether partial or full tumor suppressor activity is lost

How can I match all these words and print the sentence with the highest match (all the words occurring in the sentence) first in Perl?

Any ideas?

Regards
Vanitha

Maybe something like this:

$
$ cat -n data.txt
     1  Our data suggests that epithelial shape and growth control are unequally affected depending on how wt p53 function is impaired and whether partial or full tumor suppressor activity is lost
     2  The growth of epithelial tissue is downregulation
     3  The quick brown fox jumps over the lazy dog.
$
$ perl -ne 'BEGIN{@x=split /,/, "epithelial,growth,downregulation"} { chomp;
>   print $_,"\n" if $_ =~ /\b$x[0]\b/ && $_ =~ /\b$x[1]\b/ && $_ =~ /\b$x[2]\b/
> }' data.txt
The growth of epithelial tissue is downregulation
$
$

I don't know why you want the line "Our data suggests..." to be displayed as well. It does not have the word "downregulation" in it and as per your search criteria, it should not be displayed.
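
If the number of words is not fixed at three, the same "every word must match" test can be written as a loop. This is only a sketch: matches_all is a made-up helper name, and unlike the one-liner above it matches case-insensitively (as the original attempt did) and escapes each word with \Q...\E.

```perl
use strict;
use warnings;

# Hypothetical helper: true when every word in @$words occurs in
# $sentence as a whole word (case-insensitive).
sub matches_all {
    my ($sentence, $words) = @_;
    for my $word (@$words) {
        # \Q...\E escapes regex metacharacters in the word
        return 0 unless $sentence =~ /\b\Q$word\E\b/i;
    }
    return 1;
}

my @words = split /,/, "epithelial,growth,downregulation";
my @sentences = (
    "Our data suggests that epithelial shape and growth control are unequally affected",
    "The growth of epithelial tissue is downregulation",
    "The quick brown fox jumps over the lazy dog.",
);

# Print only the sentences that contain all of the words
for my $sentence (@sentences) {
    print "$sentence\n" if matches_all($sentence, \@words);
}
```

With these three sentences only "The growth of epithelial tissue is downregulation" is printed.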

tyler_durden

Hi,

Thanks for the reply.

I have the words and the sentences in arrays, i.e.

@arr1=("epithelial","downregulation","growth");

@arr2=("Our data suggests that epithelial shape and growth control are unequally affected depending on how wt p53 function is impaired and whether partial or full tumor suppressor activity is lost","The growth of epithelial tissue is downregulation");

foreach $word(@arr1)
{
     foreach $arr(@arr2)
    {
       if($arr=~/\b$word\b/i)
       {
           print "<br>matched<br>";
       }
   }
}

In that case, how can I get the sentence with the most matches?

The first sentence in the output matches all the words, so it has the highest priority and is printed first.

The next sentence matches only 2 words, so it has the next priority.

Output:


The growth of epithelial tissue is downregulation

Our data suggests that epithelial shape and growth control are unequally affected depending on how wt p53 function is impaired and whether partial or full tumor suppressor activity is lost

How to get the desired highest maximum sentence ??

Any solutions ???

Regards
Vanitha

"How to get the desired highest maximum sentence ??" - funny. Not sure if "maximum sentence" is desirable. ;)

You may want to do something like this:

$
$ cat matches.pl
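#!perl -w
use strict;
my @arr1=("epithelial","downregulation","growth");
my @arr2=("Our data suggests that epithelial shape and growth control are unequally affected",
       "The growth of epithelial tissue is downregulation",
       "The quick brown fox jumps over the lazy dog",
       "The observed downregulation is due to elevated levels of insulin in the blood");
# Alternation of the search words, each wrapped in word boundaries
my $x = '\\b'.join('\\b|\\b',@arr1).'\\b';
# Count the matches per sentence: the empty list assignment forces list
# context, and assigning that to a scalar gives the number of matches
my %matches;
foreach (@arr2) {
  $matches{$_} = () = /$x/g;
}
# Print the sentences from most to fewest matches
foreach my $i (sort {$matches{$b} <=> $matches{$a}} @arr2) {
  print $i,"\n";
}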
$
$ perl matches.pl
The growth of epithelial tissue is downregulation
Our data suggests that epithelial shape and growth control are unequally affected
The observed downregulation is due to elevated levels of insulin in the blood
The quick brown fox jumps over the lazy dog
$
$
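
One caveat: the script above counts every occurrence, so a sentence that repeats a single word several times can score as high as one that contains all the words. If the ranking should go by the number of different words found, something like the following might do. It is a rough sketch, the helper names count_distinct and rank_sentences are made up, and it matches case-insensitively.

```perl
use strict;
use warnings;

# Hypothetical helper: number of search words that occur at least once
# in the sentence (whole words, case-insensitive).
sub count_distinct {
    my ($sentence, $words) = @_;
    my $count = grep { $sentence =~ /\b\Q$_\E\b/i } @$words;
    return $count;
}

# Hypothetical helper: sentences ordered from most to fewest distinct words.
sub rank_sentences {
    my ($sentences, $words) = @_;
    my %score;
    $score{$_} = count_distinct($_, $words) for @$sentences;
    my @ranked = sort { $score{$b} <=> $score{$a} } @$sentences;
    return @ranked;
}

my @arr1 = ("epithelial", "downregulation", "growth");
my @arr2 = (
    "Our data suggests that epithelial shape and growth control are unequally affected",
    "The growth of epithelial tissue is downregulation",
    "Growth factors restore growth after growth arrest",
    "The quick brown fox jumps over the lazy dog",
);

# Print each sentence with its score, best match first
for my $sentence (rank_sentences(\@arr2, \@arr1)) {
    printf "%d  %s\n", count_distinct($sentence, \@arr1), $sentence;
}
```

Here the "Growth factors..." sentence scores 1 even though it contains "growth" three times, so the sentences come out with scores 3, 2, 1 and 0.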

HTH,
tyler_durden

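You may try the Perl code below:

use strict;
use warnings;
my $array="epithelial,growth,downregulation";
my @split=split(",",$array);
# Build one lookahead per word, e.g. (?=.*\bgrowth\b), so that every word
# has to occur somewhere in the line, in any order
my $str='';
$str.=sprintf('(?=.*\b%s\b)',quotemeta($_)) for @split;
# /i because the original attempt matched case-insensitively
my $reg=qr/^$str/i;
while(<DATA>){
	print if /$reg/;
}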
__DATA__
The growth of epithelial tissue is downregulation
The downregulation growth of
The of epithelial tissue growth is downregulation
The downregulation growth of epithelial tissue is 
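
To see what the lookahead approach above actually builds, it can help to print the pattern before compiling it. A small sketch, assuming a made-up helper name all_words_pattern and adding word boundaries plus quotemeta to each word:

```perl
use strict;
use warnings;

# Hypothetical helper: pattern source with one lookahead per word,
# anchored at the start of the line.
sub all_words_pattern {
    my @words = @_;
    my $pattern = '^';
    $pattern .= '(?=.*\b' . quotemeta($_) . '\b)' for @words;
    return $pattern;
}

my $source = all_words_pattern(qw(epithelial growth downregulation));
print "$source\n";

# Each lookahead starts scanning from the beginning of the line,
# so the order of the words in the line does not matter.
my $re = qr/$source/i;
for my $line ("The downregulation growth of epithelial tissue is",
              "The downregulation growth of") {
    printf "%-3s %s\n", ($line =~ $re ? 'yes' : 'no'), $line;
}
```

The first line printed is the pattern itself, ^(?=.*\bepithelial\b)(?=.*\bgrowth\b)(?=.*\bdownregulation\b), followed by "yes" for the first test line and "no" for the second, which lacks "epithelial".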

Hi,

Thanks a lot!!

Regards
Vanitha