Hello,
This problem has bugged me for some time. I need to merge several files, each with hundreds of rows, into a union that uses the ID as the key column (similar to a join), with 0 filled in wherever an ID is absent from a file.
ID File1
A 1
C 3
D 4
M 6
ID File2
A 5
B 10
C 15
Z 26
ID File3
A 2
B 6
O 20
X 9
I want the output to be:
ID File1 File2 File3
A 1 5 2
B 0 10 6
C 3 15 0
D 4 0 0
M 6 0 0
O 0 0 20
X 0 0 9
Z 0 26 0
I searched the site and found some posts about merging two files by a common column, but my case is different. My code below runs, but the output loses some of the information:
#!/usr/bin/perl -w
use strict;
my $Fname1="./path/file1.txt"; #tab delimited format
my $Fname2="./path/file2.txt";
my $Fname3="./path/file3.txt";
my %combinedfile; # ID => counts joined by "\n", one entry per file
my $key;
open(F1, "<$Fname1") or die "Can't open the input file $Fname1 because of $!";
while (my $line1 = <F1>) {
    chomp ($line1);
    my ($ID1, $count)=split("\t", $line1);
    $key=$ID1;
    $combinedfile{$key}=$count;
}
close (F1);
open(F2, "<$Fname2") or die "Can't open the input file $Fname2 because of $!";
while (my $line2 = <F2>) {
    chomp ($line2);
    my ($ID2, $count2)=split("\t", $line2);
    $key=$ID2;
    if (exists($combinedfile{$key}))
    { $combinedfile{$key}.="\n$count2";}
    else {
        # ID is new in File2: pad the File1 column with 0
        $combinedfile{$key}="0\n$count2";
    }
}
close (F2);
open(F3, "<$Fname3") or die "Can't open the input file $Fname3 because of $!";
while (my $line3 = <F3>) {
    chomp ($line3);
    my ($ID3, $count3)=split("\t", $line3);
    $key=$ID3;
    if (exists($combinedfile{$key}))
    { $combinedfile{$key}.="\n$count3";}
    else {
        # ID is new in File3: pad the File1 and File2 columns with 0
        $combinedfile{$key}="0\n0\n$count3";
    }
}
close (F3);
# Print one row per ID, turning the "\n"-joined counts into tab-separated columns
foreach my $member (keys %combinedfile) {
    print $member, "\t", join("\t", split("\n", $combinedfile{$member})), "\n";
}
The output is:
ID File File2 File3
A 1 5 2
B 0 10 6
C 3 15
D 4
M 6
O 0 0 20
X 0 0 9
Z 0 26
I know there is a bug in the algorithm. For example, D appears only in File1, so after reading File2 it is supposed to be saved as:
D 4\n0
and when reading File3, it should be saved as:
D 4\n0\n0
But D is skipped because it is not in File2 or File3. It seems that only keys that are new to the hash are padded properly, while existing keys that are not listed in a later file (File2 or File3) are never updated.
How can I fix this bug? I run into this task occasionally at work, and it seems like a common job, similar to a join but not quite the same. I was hoping there is a command like "union" for this (ideally one that could write NA instead of 0, though that is just a wish!).
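To make the padding I described above concrete, here is a minimal sketch of one possible direction: keep one slot per file for every ID, and fill the slots that were never set once all files have been read. The sample data is hard-coded instead of read from the files, and merge_counts and its $fill argument are names made up for this illustration, not an existing command.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name and arguments are illustrative only).
# $tables: array ref of hash refs, one hash (ID => count) per file, in file order.
# $fill:   value to use where an ID is absent from a file.
# Returns a hash ref of ID => array ref with exactly one slot per file.
# Uses the //= operator, so it assumes Perl 5.10 or later.
sub merge_counts {
    my ($tables, $fill) = @_;
    my %merged;
    for my $i (0 .. $#{$tables}) {
        for my $id (keys %{ $tables->[$i] }) {
            # Store each count in the slot that belongs to its file.
            $merged{$id}[$i] = $tables->[$i]{$id};
        }
    }
    # Pad every ID to the full width; slots that were never set become $fill.
    for my $row (values %merged) {
        $row->[$_] //= $fill for 0 .. $#{$tables};
    }
    return \%merged;
}

# Sample data from the three files above.
my @tables = (
    { A => 1, C => 3,  D => 4,  M => 6  },   # File1
    { A => 5, B => 10, C => 15, Z => 26 },   # File2
    { A => 2, B => 6,  O => 20, X => 9  },   # File3
);
my $merged = merge_counts(\@tables, 0);

print join("\t", 'ID', 'File1', 'File2', 'File3'), "\n";
print join("\t", $_, @{ $merged->{$_} }), "\n" for sort keys %$merged;
```

Because the position of each count comes from the file index rather than from how many values were appended so far, an ID such as D that only appears in File1 still ends up with three columns. Passing 'NA' as the second argument instead of 0 would give the NA placeholder.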
Thanks a lot in advance!
Yifang