loop through two files based on a variable

nogu0001 · February 23, 2009, 3:30am

Hi guys. I have two files. For each key (chr1, chr2, ...) in file1, the script has to loop through the entries with the same key in file2 and pick the one that gives the minimum difference after subtraction. Sorry if I made my question complicated.

file1
chr2 1989
chr2 2500
chr1 1500

file2
chr1 1339
chr2 2000
chr2 3000
chr2 1200
chr1 1600
chr2 4000

output
chr2...1989.....2000
chr2...2500.....3000
chr1...1500.....1600

nogu0001 · February 23, 2009, 3:00pm

Hi,

Does anyone know how to find the smallest distance between values in different columns?
For example, I have column1 and column2:
-73.924598,40.879010
-73.924506,40.878978
-73.924506,40.878978
-73.921406,40.878178
-73.921406,40.878178
-73.920806,40.878578
-73.920206,40.878978
-73.920206,40.878978
-73.918706,40.876578
-73.918706,40.876578

I want to see which of the points in the second column is closest to the first point in the first column, and so on for each point in the first column.
How should I do it?

nogu0001 · February 23, 2009, 7:50pm

Hi, I have been trying to print the closest numbers between two arrays, @set and @vals, but the script I'm using only gives output for the first number in the first array (i.e. 15), compared against all the values in the second array. I'm getting 56 as the only output. I also need the values closest to 150, 200 and 250. Is there anything wrong with the script?
Sorry if my question is perplexing.

The script I'm using:

#!/usr/bin/perl
use strict;
use warnings;

my @set = (15, 150, 200, 250);
my @vals = (208, 258, 56, 123);

print closest(@set, @vals), "\n";

sub closest {
my $val = shift;
my @list = sort { abs($a - $val) <=> abs($b - $val) } @_;
$list[0];
}

KevinADC · February 23, 2009, 10:22pm

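All your code is doing is sorting a flattened list: both arrays are merged into one argument list, the first element (15) is shifted off as $val, and the rest is sorted by distance from 15. Since 56 comes first in that sorted list, that's what gets returned. But after reading your post I can't figure out what it is you actually want to do. Can you clarify?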

KevinADC · February 23, 2009, 10:24pm

Sorting the columns should be the way to go.
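
One way the sorting idea could be used, as a minimal sketch only: sort the second column once, then binary-search it for each value of the first column. The helper name nearest_sorted and the sample numbers are made up for illustration and do not come from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the element of a numerically sorted array
# (passed by reference) that is closest to $target.
sub nearest_sorted {
    my ($target, $sorted) = @_;
    return unless @$sorted;
    my ($lo, $hi) = (0, $#$sorted);
    # Binary search for the first index whose value is >= $target
    # (or the last index if every value is smaller).
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($sorted->[$mid] < $target) { $lo = $mid + 1 }
        else                           { $hi = $mid }
    }
    # The answer is either that element or its left neighbour.
    if ($lo > 0
        && abs($sorted->[$lo - 1] - $target) <= abs($sorted->[$lo] - $target)) {
        return $sorted->[$lo - 1];
    }
    return $sorted->[$lo];
}

# Made-up sample values standing in for the two columns.
my @column1 = (3.1, 8);
my @column2 = sort { $a <=> $b } (10.5, 2.0, 7.25, 4.0);

print "$_ -> ", nearest_sorted($_, \@column2), "\n" for @column1;
```

For a handful of values a plain scan is just as good; the sorted lookup only pays off when the second column is large.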

nogu0001 · February 23, 2009, 10:47pm

Yeah, you're right, but I need a way to calculate it.

nogu0001 · February 23, 2009, 10:56pm

No.
But I do agree my question needs more explanation.
If I change @set to hold only one number, like 200, the output is 208, the closest number to 200.
So the script works properly when I try it with only one value.
I want to extract, for each value in column1, the closest number in column2. The value 3 in column1 has to be compared against all the values in column2 (4, 6, 8, 1) and give the closest number, i.e. 4, and vice versa.
The crux is that each value of column1 has to be checked against all the values in column2.
I have wasted 2 days on this; I would be really grateful if you could answer.

3....4
4....6
......8
......1

output
3.....4
4.....6

KevinADC · February 23, 2009, 11:19pm

No to what?

I am still pretty confused, but is this what you are trying to do?

#!/usr/bin/perl
use strict;
use warnings;

my @set = (15, 150, 200, 250);
my @vals = (208, 258, 56, 123);

my @vals_sorted = sort {$a <=> $b} @vals;
foreach my $n (@set) {
   print "Closest to $n = ", shift @vals_sorted, "\n";
}
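
If the aim is instead the nearest value in @vals for every element of @set, a minimal sketch could look like the following. This is an assumption about the intent; it reuses the closest sub from the original post and simply calls it once per element, so the two arrays are never flattened into a single argument list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @set  = (15, 150, 200, 250);
my @vals = (208, 258, 56, 123);

# Return the element of the remaining arguments that is numerically
# closest to the first argument.
sub closest {
    my $val  = shift;
    my @list = sort { abs($a - $val) <=> abs($b - $val) } @_;
    return $list[0];
}

# One call per element of @set: only @vals is passed as the candidate list.
foreach my $n (@set) {
    print "Closest to $n = ", closest($n, @vals), "\n";
}
```

With these arrays it pairs 15 with 56, 150 with 123, 200 with 208 and 250 with 258.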

KevinADC · February 23, 2009, 11:23pm

I don't understand what you mean. Are you prohibited from using sort?

KevinADC · February 23, 2009, 11:27pm

Also the columns already appear to be sorted so the next line is the closest one, no?

nogu0001 · February 24, 2009, 12:17am

Hi Kevin,
the logic is perfect,
but instead of writing the values into the arrays by hand, I would like to fill them from column1 and column2 of a file, i.e. define @set as the values from column 1 and @vals as the values from column 2.
Input file:
chr1 100 112
chr1 150 300
chr1 80 400
.............286
.............100

The script below needs correction:

#!/usr/bin/perl -w
use strict;
use warnings;
my $infile1 = 're5.txt';
open IN10, "< $infile1" or die "Can't open $infile1 : $!";
my %values;
while (<F1>) {
chomp;
my ($chrom,$value1,$value2) = split /\t/;
my $rec = {value1 => $value1, value2 => $value2
};
push @{$vlaues{$chrom}}, $rec;
}
my @set = $value1;
my @vals = $value2;
my @vals_sorted = sort {$a <=> $b} @vals;
foreach my $n (@set) {
print "Closest to $n = ", shift @vals_sorted, "\n";
}

Thanks for the ideas
Funny laughs @other mails

nogu0001 · February 24, 2009, 12:32am

If you don't mind, I will post it to you.

nogu0001 · February 24, 2009, 12:41am

#!/usr/bin/perl
$infile1 = 'file.txt';
$infile2 = 'cpg2.txt';
$outfile7 = 'out10.txt';
open IN10, "< $infile1" or die "Can't open $infile1 : $!";
open IN11, "< $infile2" or die "Can't open $infile2 : $!";
open OUT7, "> $outfile7" or die "Can't open $outfile7 : $!";

my %chromes;
my %chromes1;
while (<IN10>) {
chomp;
my ($arrayid,$ncrnaid1,$ncrnaid2,$ncrnaid3,$ncrnaid4,$ncrnaid5,$ncrnaid6,$ncrnaid7,$ncrnaid8,$ncrnaid9,$ncrnaid10,$ncrnaid11,$ncrnaid12, $chrom,$start,$end,$cstrand,$en,$esi,$est) = split /\t/;
my $rec = {arrayid => $arrayid,
ncrnaid1 => $ncrnaid1,
ncrnaid2 => $ncrnaid2,
ncrnaid3 => $ncrnaid3,
ncrnaid4 => $ncrnaid4,
ncrnaid5 => $ncrnaid5,
ncrnaid6 => $ncrnaid6,
ncrnaid7 => $ncrnaid7,
ncrnaid8 => $ncrnaid8,
ncrnaid9 => $ncrnaid9,
ncrnaid10 => $ncrnaid10,
ncrnaid11 => $ncrnaid11,
ncrnaid12 => $ncrnaid12,
start => $start,
end => $end,
cstrand => $cstrand,
en => $en,
esi => $esi,
est => $est};
push @{$chromes{$chrom}}, $rec;
}
my @arrayids;
sub input {
my @attrs =qw(chrom start);
while (<IN10>) {
chomp;
my %rec;
@rec{@attrs} = split /\t/;
push @arrayids,\%rec;
}
}
foreach my $chrom (sort keys %chromes){
my $count = scalar @{$chromes{$chrom}};
print OUT7 "$chrom\t$count\t\n\n";
print OUT7 map {"\t\t$_->{start}\t\n"} @{$chromes{$chrom}};
}

#########################################

while (<IN11>) {
chomp;
my ($cchrom,$middle) = split /\t/;
my $cpg = {middle => $middle};
push @{$chromes1{$cchrom}}, $cpg;
}
my @cpgids;
sub input {
my @cpgs =qw(cchrom middle);
while (<IN11>) {
chomp;
my %cpg;
@cpg{@cpgs} = split /\t/;
push @cpgids,\%cpg;
}
}
foreach my $cchrom (sort keys %chromes1){
my $count = scalar @{$chromes1{$cchrom}};
print OUT7 "$cchrom\t$count\t\n\n";
print OUT7 map {"\t\t$_->{middle}\t\n"} @{$chromes1{$cchrom}};
}
#my @set = $start; #calling $start from $start (file1)
#my @vals = $middle; #calling $middle from $middle (file2)
#my @vals_sorted = sort {$a <=> $b} @vals;
#foreach my $n (@set) {
# print "Closest to $n = ", shift @vals_sorted, "\n";
#}
close IN10;
close IN11;
close OUT7;

nogu0001 · February 24, 2009, 12:47am

I got caught up in a group meeting.
Well, here I'm trying to pull column1 ($start) from file1 (IN10) and column2 ($middle) from file2 (IN11). I think I screwed up somewhere.

Note: I'm posting sample outputs of the script so far, which does everything except calculate the closest point.

nogu0001 · February 24, 2009, 1:12am

Output for file1 (IN10): chrom, count, start

chr1 10

chr10 7

Output for file2 (IN11): chrom, count, middle

chr1 2463

nogu0001 · February 24, 2009, 2:08am

The output step must follow some algorithm like this, but it has a bug: the following only covers file1. I don't know how to create a foreach loop over both file1 and file2.

foreach my $chrom (sort keys %chromes) {
    if ($chrom =~ /^chr1$/) {
        # print the closest start and middle for chr1 across both files
        # This is the point where I have to insert the script you replied with before
    }
    elsif ($chrom =~ /^chr2$/) {
        # print the same for chr2, and so on
    }
    else {
        # not recognized
    }
}

summer_cherry · February 24, 2009, 4:37am

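#!/usr/bin/perl
use strict;
use warnings;

# Read comma-separated pairs: column 1 goes into @a1, column 2 into @a2.
open my $fh, '<', 'a.txt' or die "Can not open a.txt: $!";
my (@a1, @a2);
while (<$fh>) {
	chomp;
	my @tmp = split /,/;
	push @a1, $tmp[0];
	push @a2, $tmp[1];
}
close $fh;

# For each value in column 1, scan all of column 2 for the nearest value.
for my $i (0 .. $#a1) {
	my ($gap, $key);
	for my $j (0 .. $#a2) {
		my $tmp = abs($a2[$j] - $a1[$i]);
		# An undefined $gap marks "nothing seen yet", so an exact match
		# (gap 0) is not overwritten by a later, more distant value.
		if (!defined $gap || $tmp <= $gap) {
			$gap = $tmp;
			$key = $a2[$j];
		}
	}
	print $a1[$i], ",", $key, "\n";
}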

nogu0001 · February 24, 2009, 5:51am

Awesome, dude!
I have no words for it. You used exactly the logic I was thinking of.

What if column1 has only a few values and column2 has more, and the lines still contain the comma?
73.924598,40.879010
73.924506,40.878978
73.924506,40.878978
................,40.878178
................,50.878178
................,60.878578

otheus · February 24, 2009, 5:53am

Here is my answer for you, but as you subverted your Read Only status, which was a result of your persistently breaking the forum rules, you are banned.

# For each key in %file1,
#   1. split the key into name/start parts
#   2. search for the record in file2 that BOTH:
#     (a) the corresponding records have the same "name" field
#     (b) has the smallest difference between $start and $middleno
#         of any of the records
#   3. Print out both records in one line
#   4. Delete these records from the structures (so they cannot be matched again)
#
foreach my $key (sort {$a <=> $b} keys %file1) {
  my ($name, $start) = split(":",$key);
  my $min = 2 ** 31;  # start as maximum integer
  my $smallest_key = undef;
  for (my $i=0; $i <= $#file2_middle_keys; ++$i) {
      my $current_key = $name .":". $file2_middle_keys[$i];

      # skip entries that do not match the $name in file2 (see (2a), above)
      next unless (exists $file2{ $current_key });

      # calculate the difference and compare it with the best so far (see (2b), above)
      my $diff = abs($file2_middle_keys[$i] - $start);
      if ($i > 0 && $diff > $min) {
        # stop -- difference is getting bigger. No need to proceed.
        # (this assumes @file2_middle_keys is sorted numerically)
        last;
      }
      if ($min > $diff) {
        $min = $diff;  # remember the smallest difference so far
        $smallest_key = $current_key;
      }
  }
  # answer of (2b) is in $smallest_key
  if (defined $smallest_key) {
    # (3)
    print join(" ",
      @{ $file1{$key} }[1,2,3],
      @{ $file2{$smallest_key } }[3,1,2],
      @{ $file1{$key} }[0],
    )."\n";

    # (4)
    delete $file1{$key};
    delete $file2{$smallest_key };
  }
}
# - Licenced under AGPL (http://www.gnu.org/licenses/agpl.txt)
# - Author: Otheus [http://www.unix.com/members/302022384.html]
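
For the two-file question at the top of the thread, a minimal self-contained sketch might look like this. It is not the code above: it keeps the sample rows from the first post in memory instead of reading file1 and file2, the helper name nearest_for_key is made up, and it does not delete matched records, so one file2 value can be matched more than once.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample rows copied from the first post; a real script would read the files.
my @file1 = (['chr2', 1989], ['chr2', 2500], ['chr1', 1500]);
my @file2 = (['chr1', 1339], ['chr2', 2000], ['chr2', 3000],
             ['chr2', 1200], ['chr1', 1600], ['chr2', 4000]);

# Group the file2 values by key so each file1 row only scans its own key.
my %by_key;
push @{ $by_key{ $_->[0] } }, $_->[1] for @file2;

# Hypothetical helper: nearest file2 value to $target among rows with $key.
sub nearest_for_key {
    my ($key, $target) = @_;
    my ($best, $best_gap);
    for my $candidate (@{ $by_key{$key} || [] }) {
        my $gap = abs($candidate - $target);
        # "<=" keeps the later of two equally close values (an assumption,
        # chosen so that 2500 pairs with 3000 as in the sample output).
        if (!defined $best_gap || $gap <= $best_gap) {
            ($best, $best_gap) = ($candidate, $gap);
        }
    }
    return $best;
}

for my $row (@file1) {
    my ($key, $value) = @$row;
    my $match = nearest_for_key($key, $value);
    print join(' ', $key, $value, defined $match ? $match : 'NA'), "\n";
}
```

On the sample rows this pairs 1989 with 2000, 2500 with 3000 and 1500 with 1600, the same pairs as the requested output.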

KevinADC · February 24, 2009, 1:10pm

oops..... so long nogu0001