Hi,
I'm not a regular coder, but sometimes I write basic Perl scripts, so Perl is a bit difficult for me :).
I'm merging two files a.txt and b.txt into c.txt:
a.txt
------
x001;frtb70;xyz;109
x001;frvt65;sec;239
x003;wqax34;jul;659
x004;yhud43;yhn;760
b.txt
------
x001;abcd80;xyz;193
x001;crrp28;xse;456
x002;lmno10;xyz;784
x002;jfds65;jfd;739
x002;juop88;jup;879
x003;yulo90;rem;542
x003;kihl98;dnt;312
x004;urel25;ewb;342
c.txt [output]
------
x001;frtb70;xyz;109
x001;frvt65;sec;239
x002;lmno10;xyz;784
x002;jfds65;jfd;739
x003;wqax34;jul;659
x004;yhud43;yhn;760
The only condition is: I need all the lines from a.txt in c.txt.
But when selecting lines from b.txt for c.txt, I first need to look in a.txt. If a line's key is already present in a.txt, then I shouldn't write that b.txt line to c.txt [output]. In all the files, the first column can be treated as the key, but it may contain duplicates. That is the challenge for me.
Below is the script I've written. The problem is that, because I'm using a hash for both input files, it doesn't keep the lines that share the same key value. But I should use all the lines of a.txt even when their keys are the same. The same is true for b.txt, except that it should skip lines whose key is already present in a.txt.
#!/usr/bin/env perl
use strict;
use warnings;

# Read a delimited file into a hash keyed on the given column.
sub prepareHash {
    my ($in_file, $key, $delimiter) = @_;
    my @line_tokens;
    my %FILE_Hash;
    open( IN_FILE, "< $in_file" ) or die "Can't open $in_file : $!";
    while (<IN_FILE>) {
        my $in_line = $_;
        chomp($in_line);
        @line_tokens = split(/$delimiter/, $in_line);
        # A later line with the same key overwrites the earlier one here.
        $FILE_Hash{$line_tokens[$key]} = $in_line;
    }
    close IN_FILE;
    return %FILE_Hash;
}

my $input1 = "/export/home/a.txt";
my $input2 = "/export/home/b.txt";
my $output = "/export/home/c.txt";
my %A_Hash = prepareHash($input1, 0 , ";" );
my %B_Hash = prepareHash($input2, 0 , ";" );
open( OUT_FILE, "> $output" ) or die "Can't open $output : $!";

# Write the a.txt lines first.
for my $a_key ( sort keys %A_Hash ) {
    $a_key =~ s/\s+$//;
    my $a_line = $A_Hash{$a_key};
    print OUT_FILE $a_line . "\n";
}

# Compare a.txt and b.txt. Only write the b.txt lines whose key is not in a.txt into c.txt.
for my $b_key ( sort keys %B_Hash ) {
    $b_key =~ s/\s+$//;
    if ( ! exists $A_Hash{$b_key} ) {
        my $b_line = $B_Hash{$b_key};
        print OUT_FILE $b_line . "\n";
    } else {
        print "$B_Hash{$b_key}: key is already written into c.txt from a.txt, hence skipping\n";
    }
}
close OUT_FILE;
Can anyone help me, please?
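
One possible approach, as a minimal sketch rather than a definitive fix: store each file as a list of [key, line] pairs instead of a hash, so that duplicate keys survive, and use a hash only to remember which keys occur in a.txt. The helper names read_records and merge_records are made up for this illustration, and the script is assumed to take the three file paths on the command line. Note that under the rule as stated, all three x002 lines from b.txt get written, whereas the sample c.txt above lists only two of them.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: read a file into a list of [key, line] pairs.
# Unlike a hash keyed on the first column, a list keeps duplicate keys.
sub read_records {
    my ($path, $key_index, $delimiter) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        my @fields = split /\Q$delimiter\E/, $line;
        push @records, [ $fields[$key_index], $line ];
    }
    close $fh;
    return @records;
}

# Hypothetical helper: every record from A, plus the records from B whose
# key does not occur anywhere in A, ordered by key.
sub merge_records {
    my ($a_records, $b_records) = @_;
    my %seen_in_a = map { $_->[0] => 1 } @$a_records;
    my @kept = (@$a_records, grep { !$seen_in_a{ $_->[0] } } @$b_records);
    # Sort by key; ties are broken by position so input order is kept.
    my @order = sort { $kept[$a][0] cmp $kept[$b][0] or $a <=> $b } 0 .. $#kept;
    return map { $kept[$_][1] } @order;
}

# Runs only when three paths are given: perl merge.pl a.txt b.txt c.txt
if (@ARGV == 3) {
    my ($a_path, $b_path, $out_path) = @ARGV;
    my @a_records = read_records($a_path, 0, ';');
    my @b_records = read_records($b_path, 0, ';');
    open(my $out, '>', $out_path) or die "Can't open $out_path: $!";
    print {$out} "$_\n" for merge_records(\@a_records, \@b_records);
    close $out;
}
```

The hash %seen_in_a is used only for the membership test, so it does not matter that several a.txt lines share one key. If the output does not need to be ordered by key, the sort step can be dropped and the lines written in the order they were read.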