OK, now I can do that. I understand now and will give you an example in a minute...
---------- Post updated at 01:05 PM ---------- Previous update was at 11:34 AM ----------
OK, given the following files:
FILE1 (other columns omitted)
Column 1   Column 6     Column 88
(index)    (mutation)   (genotype)
1          Intronic     TT
2          Frameshift   GT
3          Exonic       AT
4          Exonic       AA
5          Intronic     GC
FILE2
Column 1   Column 2
(index)    (reference letter)
1          A
2          C
3          C
4          A
5          G
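Only columns 1, 6 and 88 of FILE1 are shown above, so it helps to check where column 88 ends up once a line has been split into the index and the rest. The following is a hypothetical illustration, not part of the script below: it builds a made-up 88-column line (values `c2` through `c88`) and uses an illustrative helper, `rest_index`, to show that original column N lands at array index N - 2 of the remainder. So column 88 is index 86, and in a 3-column sample file the genotype in column 3 is index 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map a 1-based column number from the original file
# to the 0-based index of that column in the array you get by splitting
# what is left of the line after the index column (column 1) is removed.
sub rest_index {
    my ($column) = @_;
    return $column - 2;    # -1 for the removed index, -1 for zero-based arrays
}

# Made-up 88-column line: "1 c2 c3 ... c88"
my $line = join ' ', 1, map("c$_", 2 .. 88);

# Same two-step split as the script: index first, then the remaining columns
my ($idx, $rest) = split /\s+/, $line, 2;
my @cols = split /\s+/, $rest;

my ($col6, $col88) = @cols[rest_index(6), rest_index(88)];
print "index: $idx\n";
print "column 6: $col6\n";      # prints c6
print "column 88: $col88\n";    # prints c88
```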
I have written this:
#!/usr/bin/perl
use strict;
use warnings;

my @data;

# file1.txt is the actual data file.
# We want to match data lines to reference lines,
# so first we place the data lines (not columns) into an array.
open(my $fh1, "<", "file1.txt") or die "Cannot open file1.txt: $!";
while (<$fh1>) {
    chomp;
    # If the line does not start with a number, skip it
    next unless /^\d/;
    # Split the line into the index ($idx); everything else goes into $tmp
    my ($idx, $tmp) = split(/\s+/, $_, 2);
    # Populate the data array with the line minus the index column
    $data[$idx] = $tmp;
}
close($fh1);

# file2.txt is the reference file with only 2 columns.
# We parse the file and split it into index and value pairs.
# Then we can use the index to find the matching data line and
# break that line down into its column components to compare as needed.
open(my $fh2, "<", "file2.txt") or die "Cannot open file2.txt: $!";
while (<$fh2>) {
    chomp;
    # Capture the index and the reference letter;
    # lines that do not match (such as headers) are skipped
    my ($idx, $ref) = /^(\d+)\s+([a-z]+)/i or next;
    # Skip reference entries that have no matching data line
    next unless defined $data[$idx];
    # Split the data line columns on whitespace into a temp array
    my @tmparr = split(/\s+/, $data[$idx]);
    # Now we can look directly at one column for testing:
    # column 1 in my case, but column 86 in yours.
    # Column 88 becomes column 86 because we subtract 1 for the index
    # removed in the first loop and 1 more for the zero-based array.
    # My file1.txt had 3 columns; removing the index leaves 2 columns,
    # and a zero-based array means we have columns 0 and 1.
    # Skip the line if the genotype contains the reference letter
    next if $tmparr[1] =~ /\Q$ref\E/;
    print "$idx\t$data[$idx]\n";
}
close($fh2);
and the result is:
> ./test.pl
1 Intronic TT
2 Frameshift GT
3 Exonic AT
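The filter rule can be traced by hand on the sample data: a row is printed only when its genotype does not contain the reference letter. Rows 4 (AA vs A) and 5 (GC vs G) do contain it, so they are skipped. The sketch below replays that rule on the sample rows held in memory; the helper name `keep_row` is hypothetical, and it swaps the regex match for Perl's built-in `index()`, a plain substring search that needs no regex quoting.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper with the same rule as the script: keep a row only
# when its genotype does NOT contain the reference letter.
# index() returns -1 when the substring is not found.
sub keep_row {
    my ($genotype, $ref) = @_;
    return index($genotype, $ref) == -1;
}

# The sample rows from FILE1 and FILE2: index, mutation, genotype, reference
my @rows = (
    [1, 'Intronic',   'TT', 'A'],
    [2, 'Frameshift', 'GT', 'C'],
    [3, 'Exonic',     'AT', 'C'],
    [4, 'Exonic',     'AA', 'A'],
    [5, 'Intronic',   'GC', 'G'],
);

for my $r (@rows) {
    my ($idx, $mutation, $genotype, $ref) = @$r;
    my $verdict = keep_row($genotype, $ref) ? 'keep' : 'skip';
    print "$idx\t$mutation\t$genotype vs $ref\t$verdict\n";
}
```

It reports `keep` for rows 1 to 3 and `skip` for rows 4 and 5, matching the output above.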
---------- Post updated at 01:16 PM ---------- Previous update was at 01:05 PM ----------
Just a quick note: without all my comments, this is not a large script either:
#!/usr/bin/perl
use strict;
use warnings;
my @data;
open(my $fh1, "<", "file1.txt") or die "Cannot open file1.txt: $!";
while (<$fh1>) {
    chomp;
    next unless /^\d/;
    my ($idx, $tmp) = split(/\s+/, $_, 2);
    $data[$idx] = $tmp;
}
close($fh1);
open(my $fh2, "<", "file2.txt") or die "Cannot open file2.txt: $!";
while (<$fh2>) {
    chomp;
    my ($idx, $ref) = /^(\d+)\s+([a-z]+)/i or next;
    next unless defined $data[$idx];
    my @tmparr = split(/\s+/, $data[$idx]);
    next if $tmparr[1] =~ /\Q$ref\E/;
    print "$idx\t$data[$idx]\n";
}
close($fh2);
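Since the real FILE1 has 88 columns and may be large, one possible variation (a sketch, not the script above) is to load the small reference file into a hash first and then stream the data file, so only the matching lines are held in memory rather than every data line. It also sidesteps the column arithmetic by splitting the whole line and using the original 1-based column number minus 1. The subroutine names `load_refs` and `filter_data`, the script name `filter.pl` and its command-line arguments are all hypothetical, and it assumes whitespace-separated columns with no empty fields, like the samples.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read "index letter" pairs from a filehandle into a hash: index => letter
sub load_refs {
    my ($fh) = @_;
    my %ref;
    while (<$fh>) {
        $ref{$1} = $2 if /^(\d+)\s+([a-z]+)/i;
    }
    return \%ref;
}

# Read data lines from a filehandle and return those whose genotype
# (1-based column $col of the full line) does not contain the
# reference letter recorded for the same index
sub filter_data {
    my ($fh, $col, $ref) = @_;
    my @kept;
    while (my $line = <$fh>) {
        chomp $line;
        my @f = split /\s+/, $line;
        # skip header lines and lines too short to have the genotype column
        next unless @f >= $col && $f[0] =~ /^\d+$/;
        my $letter = $ref->{ $f[0] };
        next unless defined $letter;    # no reference entry for this index
        push @kept, $line if index($f[$col - 1], $letter) == -1;
    }
    return @kept;
}

# Usage: ./filter.pl REF_FILE DATA_FILE [GENOTYPE_COLUMN]
if (@ARGV >= 2) {
    my ($ref_file, $data_file, $col) = @ARGV;
    $col //= 3;    # 3 for the sample file1.txt; pass 88 for the real file
    open(my $rfh, '<', $ref_file) or die "Cannot open $ref_file: $!";
    my $ref = load_refs($rfh);
    close($rfh);
    open(my $dfh, '<', $data_file) or die "Cannot open $data_file: $!";
    print "$_\n" for filter_data($dfh, $col, $ref);
    close($dfh);
}
```

With the sample files, `./filter.pl file2.txt file1.txt` would print rows 1, 2 and 3 (keeping each line's original spacing instead of inserting a tab), and passing `88` as a third argument would test column 88 of the real file.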