Problem saving return value of subroutine in perl

Hi all, I have this code

#This program read the triplets from file named "data" into
#an array of array.
use strict;
use warnings;
use Data::Dumper;
use Graph; 
use Graph::Subgraph;

my @S;
 while (<>) {
        push @S, [ split ];
    }
print "-----TRIPLETS-------\n";
print Dumper \@S;
#Make a copy of @S 
my @trip = map { [@$_] } @S;

# Find the number of vertices
my @L;
for my $i ( 0 .. $#S ) {
for my $j ( 0 .. $#{ $S[$i] } ) {
push (@L,$S[$i][$j]);
     }
 }
my %seen;
@L = grep { ! $seen{ $_ }++ } @L;
print " ----VERTICES------\n";
print Dumper \@L;


# Now lets generate the G(L)
# In order to generate the G(L) we'll extract first two columns of S into another matrix
my @GL=@S;
splice(@$_, 2, 1)
   foreach @GL;
print "----EDGE LIST TO BUILD G(L)-----\n";
print Dumper \@GL;

#my %h = map { $_->[0] => $_->[1] } @S;
#print Dumper(\%h);


##### CONNECTED COMPONENTS ##########
my $g = Graph->new( undirected => 1 );

my @a;
my @b;
for (my $p = 0; $p <= 2; $p++) {
$a[$p]=$S[$p][0];
  }

for (my $q = 0; $q <= 2; $q++) {
$b[$q]=$S[$q][1];
  }

for (my $r = 0; $r <= 2; $r++) {
     $g->add_edge($a[$r], $b[$r]);
 }

my @subgraphs = $g->connected_components;
my @allgraphs;
my $V = $g->vertices;
print "Number of taxa=$V\n";


my $q=scalar @subgraphs;
print "Number of connected components ", $q , "\n";
print "First connected component: ", @{ $subgraphs[0] }, "\n";
print "First connected component element: ", $subgraphs[0][1], "\n\n";


sub induced {
my (@z)=@_;
for my $QT (\@z ){
    #print Dumper $QT;
    for my $triplet ( @trip ){
        my %Pie;
        undef @Pie{@$QT};
        delete @Pie{ @$triplet };
        print "@$triplet\n" if keys(%Pie) <= ( @$QT - @$triplet ) ;
        return (@$triplet);
    }
}}
my @C;
my $d;
my $p=$#subgraphs+1;
for ($d=$p; $d >=1; $d--)
  { 
    print "component $d = @{ $subgraphs[$d-1] }\n";
    my ($qw)=join(induced(@{ $subgraphs[$d-1] }));
    print "induced=$qw";

}

It takes in the data from data file ,the content of which is

----------DATA-----------
b c a
a c d
d e b

---OUTPUT----

-----TRIPLETS-------
$VAR1 = [
          [
            'b',
            'c',
            'a'
          ],
          [
            'a',
            'c',
            'd'
          ],
          [
            'd',
            'e',
            'b'
          ]
        ];
 ----VERTICES------
$VAR1 = [
          'b',
          'c',
          'a',
          'd',
          'e'
        ];
----EDGE LIST TO BUILD G(L)-----
$VAR1 = [
          [
            'b',
            'c'
          ],
          [
            'a',
            'c'
          ],
          [
            'd',
            'e'
          ]
        ];
Number of taxa=5
Number of connected components 2
First connected component: cab
First connected component element: a



component 2 = e d
induced=component 1 = c a b
b c a
induced=

Problem is with the last few lines of the output ,it should have been this

component 2 = e d 
component 1 = c b a
 induced=b c a

The problem is there in the way the subroutine return value is saved. Please suggest me why is this happening and how to fix it.

move line:

print "induced=$qw";

to last line in script;

Im sorry but that hardly serves the purpose as I have to check for each of the component,so will have to call the subroutine inside the loop.

---------- Post updated at 10:48 AM ---------- Previous update was at 07:14 AM ----------

Let me explain you what is wanted.Consider for example component 2. It has two vertices e,d. Now I want to see if any of the rows from "DATA" is a subset of these points.Now since each of the row of "DATA" has 3 vertices/points,therefore clearly for component two there isnt any induced line from DATA.Therefore there should be anything to print for the subroutine "induced" for this case.
But for component 1,which has vertices as = c,a,b; the first line of "DATA" which is "b c a" gets induced.
Similarly if component=a,b,c,d ; then first two rows of DATA get induced ,as they are both the subset of the component.

I don't quite understand the last paragraph of your 2nd post, but I'll explain the working of the final loop of your code.

...
sub induced {
my (@z)=@_;
for my $QT (\@z ){
    #print Dumper $QT;
    for my $triplet ( @trip ){
        my %Pie;
        undef @Pie{@$QT};
        delete @Pie{ @$triplet };
        print "@$triplet\n" if keys(%Pie) <= ( @$QT - @$triplet ) ;
        return (@$triplet);
    }
}}
my @C;
my $d;
my $p=$#subgraphs+1;
for ($d=$p; $d >=1; $d--)
  { 
    print "component $d = @{ $subgraphs[$d-1] }\n";
    my ($qw)=join(induced(@{ $subgraphs[$d-1] }));
    print "induced=$qw";

}

Notice the following about the subroutine "induced":
(1) it has a "print" statement in it based on an "if" condition being true.
(2) it returns an array. $triplet is an array ref, and @$triplet is the array it references.

Also notice the following about the final "for" loop:

(3) The "join" function has not been provided a "join-string" i.e. the first argument. So what happens if you use join without the join-string?

$
$ perl -le '@x = qw(a b c); $y = join(@x); print $y'

$

It doesn't print the array elements. Once you do supply the join-string, then:

$
$ perl -le '@x = qw(a b c); $y = join("~", @x); print $y'
a~b~c
$
$

it works fine.

(4) the final print function that prints the value of $qw does not have a newline.

Now, before the "for" loop, the value of $p is 2. In the first iteration:

(a) $d is 2 and the first print function prints the expected values.
(b) "join" returns nothing and hence $qw is a zero-length string.
(c) the second print function prints "induced=" and does **NOT** go to the next line due to the absence of a newline.

So after iteration 1, you have this printed:

component 2 = e d
induced=

In iteration # 2, the following happens:

(a) $d is 1 and the first print function prints the expected values (c, a, b)
(b) the "induced" subroutine, when invoked, prints the value "@$triplet\n" because apparently the "if" condition evaluates to true. Hence you see "b c a" printed. The "join" function, as usual, returns nothing and hence $qw is a zero-length string.
(c) the final print statement prints the constant string "induced=" followed by nothing.

So the output of **only** iteration 2 is as follows:

component 1 = c a b
b c a
induced=

And the overall output of both iterations is as follows:

component 2 = e d
induced=component 1 = c a b
b c a
induced=

Based on this understanding of the code, you may decide what to do next. You probably want to -
(a) fix the "join" function
(b) ensure that the final print function has a newline, and maybe
(c) suppress the "print" function inside the subroutine "induced"

Thus, if the final portion of your program looks like this:

...sub induced {
my (@z)=@_;
for my $QT (\@z ){
    #print Dumper $QT;
    for my $triplet ( @trip ){
        my %Pie;
        undef @Pie{@$QT};
        delete @Pie{ @$triplet };
        #print "@$triplet\n" if keys(%Pie) <= ( @$QT - @$triplet ) ;
        return (@$triplet);
    }
}}
my @C;
my $d;
my $p=$#subgraphs+1;
for ($d=$p; $d >=1; $d--)
{
  print "component $d = @{ $subgraphs[$d-1] }\n";
  my $qw = join " ", induced(@{ $subgraphs[$d-1] });
  print "induced = $qw\n";
}

then the final part of the output would be like this -

...
component 2 = e d
induced = b c a
component 1 = c b a
induced = b c a

I am not sure if you are expecting the return value of "induced" subroutine to be the same each time, or if you simply want to display the distinct return values. If that's the case, then set the variable $qw as a hash key, and print all keys of the hash after the final loop.

tyler_durden

Tyler,
After the modifications suggested by you the output is

component 2 = e d
induced= bca
component 1 = c b a
induced= bca

But the problem is that for component-2 there shouldnt be anything induced(because "e d" is not a subset of DATA).

The program needs finds all the rows of DATA which are a subset of the points in any given component.

----------DATA----------- 
b c a
a c d 
d e b

So suppose there is a component 3,which has these points= "a c d e". Then row 2nd and 3rd are a subset of these points.So for component3 I'm expecting this output
component 3 = a c d e
induced= a c d
d e b
I want to save these induced values in a suitable data structure as I need to further process them.

---------- Post updated at 07:32 AM ---------- Previous update was at 01:12 AM ----------

The code fails on this data

---------DATA-------------- 
b c a 
a c d 
d e b 
e f g 
g d f 
h i g

And the output is

component 2 = e d g f 
component 1 = c a b 
b c a

Which is wrong, because it should have been this

component 2 = e d g f 
e f g 
g d f 
component 1 = c a b 
b c a

Because with the vertices in component 2, we can have 4th & 5th row of DATA.
Please help on this

---------- Post updated at 03:30 PM ---------- Previous update was at 07:32 AM ----------

Tyler please help me fix my problem. Im still havent been able to solve it.

<rant>
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in Usenet and e-mail?
</rant>

Now that it's out of the way, back to your problem.

It appears that you want to check if an array is contained inside another array. You could use "grep" for that. Here's some sample code:

$
$ perl -le '@x = qw(a b c d e f);
            @y = qw(a d f);
            $n = grep { $e = $_; not grep { $e =~ /\Q$_/i } @x } @y;
            print "Count of elements in (@y) that are NOT present in (@x) = $n"
           '
Count of elements in (a d f) that are NOT present in (a b c d e f) = 0
$
$

So if the count is 0, you know that @y is a subset of @x and hence you want to return it from the "induced" subroutine.

tyler_durden

1 Like

Hi all ,
I had this problem of creating a supertree by recursive Alfred Aho's algorithm.So I divided the problem into its the key concepts and dealt
with them one by one. I'll first put up my code I built with help of people on this forum and then later on I'll explain my problem.So here's
what I've build till now

#This program read the triplets from file named "data" and returns the
#supertree. 
# ___DATA(triplets)____
#b c a
#a c d
#d e b
#### NOTE ::: SuperTree part hasnt been incorporated yet.
use strict;
use warnings;
use Data::Dumper;
use Graph; 
use Data::Dump qw/ pp /;
####READ IN THE INPUT DATA ########
my @triplets; # Get all the triplets
 while (<>) {
        push @triplets, [ split ];
    }

#Make a deep copy of @triplets 
my @triplet_deep_copy = map { [@$_] } @triplets;


#####AUXILIARY GRAPH   G(L) #######
# In order to generate the G(L)  first of all extract first two columns of  @triplets #into another matrix
my @auxiliary_edges=@triplets;
splice(@$_, 2, 1)
foreach @auxiliary_edges;
print "----EDGE LIST TO BUILD AUXILIARY GRAPH-----\n";
print Dumper \@auxiliary_edges;


##### CONNECTED COMPONENTS ##########
my $auxiliary_graph = Graph->new( undirected => 1 );

my @from;
my @to;
for (my $p = 0; $p <= 2; $p++) {
        $from[$p]=$triplets[$p][0];
  }

for (my $q = 0; $q <= 2; $q++) {
        $to[$q]=$triplets[$q][1];
  }

for (my $r = 0; $r <= 2; $r++) {
     $auxiliary_graph->add_edge($from[$r], $to[$r]);
 }

my @subgraphs = $auxiliary_graph->connected_components; # Subgraphs
my $V = $auxiliary_graph->vertices; # Number of taxa
my $connected_components=scalar @subgraphs; #Get the number of #connected components

###### FINDING THE TRIPLETS WHICH ARE SUBSET(OR INDUCED BY) OF #EACH OF THE CONNECTED COMPONENTS######
Main(@auxiliary_edges);
exit(0);
sub induced {
  my $trip = shift;
  my @matches;
  for my $QT ( @_ ) {
         for my $triplet ( @$trip ) {
                my %seen;        # my %Pie;
                undef @seen{@$QT};
                delete @seen{@$triplet};
                if ( keys( %seen ) <= ( @$QT - @$triplet ) ) {
                     push @matches, $triplet;
      }
    } ## end for my $triplet ( @$trip )
  } ## end for my $QT ( @_ )
  return @matches;
}## end sub induced


sub Main {
  my $tree = Graph->new( undirected => 1 );
  my $dad='u';
  $tree->add_vertex($dad);
  for my $one (@subgraphs) {
      my @matches = induced( \@triplet_deep_copy, $one );
      print "\nTriplet induced by subgraph ", pp( $one => { MATCHES =>\@matches } ), "\n\n";
  }
} 

So this is what I have written till now. Now let me explain my problem.

___INPUT(set of triplets)____
b c a
a c d
d e b

[p]Set of species/vertices=a,b,c,d,e[/p]
Now build the auxiliary graph by selecting first two vertices of each of the triplets,i.e.

b c
a c
d e 

The auxiliary graph thus generated will be

 a-c-b  d-e

The number of connected components in this auxiliary graph (q)=2 (viz. a-c-b and d-e)
The algorithm I need to implement is this:-

TreeConstruct(S)
1. Let L be the set of species appear in S. Build G(L)
2. Let C1 , C2 , . . . , Cq be the set of connected components in G(L)
3. If q >1, then
� For i = 1, 2, . . . , q, let Si be the set of triplets in S labeled by the set
of leaves in Ci .
� Let Ti = TreeConstruct(Si )
� Let T be a tree formed by connecting all Ti with the same parent node.
Return T.
4. If q=1 and C1 contains exactly one leaf, return the leaf; else return fail.

The progression will be like this:-

1. Initially we have q=2 (a-c-b & d-e). So introduce an internal vertex (u) and make these connected components child of u. 
u=> a-c-b;
    d-e; 
2. Select component 1 = a-c-b. Check all lines from INPUT which are a subset of this component1.First line of INPUT i.e. "b c a" is a subset of component1.
3."b c a" now becomes the INPUT for the program and it is recursed again with this INPUT(Now for input "b c a" the auxiliary graph will be "b-c" & "a",i.e. 
two connected components,thus q=2 ...)

Final output (SUPRTREE)for the given input should be like this

u  => u => d
        => e
   => u => a
        => u => b
             => c

TRIPLETS(input) and SUPERTREE(output) look like these
http://ars.sciencedirect.com/content/image/1-s2.0-S0166218X10000983-gr1.jpg The picture link above has the exact triplets for my problem and the exact supertree(output) expected.
The following link is a small chapter on the problem im dealing with.
http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=0772DCA9649E3596FE5319A41B0F3193?doi=10.1.1.135.7740&rep=rep1&type=pdf
You just need to read the first 4 pages.Its a very quick read (not a lengthy research paper) and the most relevant and elaborate explanation on the algorithm and the terms. Hope it helps