Merge two input files by search

Hi

I am running into an issue with the way I handle open files in Perl.

I have the following input file <file1>:

D33963|BNS Default Swap|-261564.923909249|
D24484|BNS Default Swap|-53356.6868058492|
D24485|BNS Default Swap|-21180.9904679111|
D33965|BNS Default Swap|154181.478745804|
D24486|BNS Default Swap|-47413.0013193052|
D33966|BNS Default Swap|-154181.478745804|
D24487|BNS Default Swap|-63253.9807711966|
D33968|BNS Default Swap|-160521.81007754|
D24489|BNS Default Swap|-10584.4665849774|
S85801|BNS Swap|451309.300774646|
D33969|BNS Default Swap|118166.419991555|

I would like to read the full file line by line and extract the first field (for example, D33963 in the first line),

then search for that value in another file <file2> that looks roughly like this
(it is a big file with no specific order):

:E00277,48089,,,Trading,FALSE,,CAISSE,19189,AA,CAD
:D24485,48085,,,Trading,FALSE,,CASSE,19139,AA,CAD
:D2448,48083,,,Trading,FALSE,,CAIE,19029,AA,CAD
:D33963,48082,,,Trading,FALSE,,CAISSE,19149,AA,CAD,
:E00286CAP,48082,,,Trading,FALSE,,CAISSE,19149,AA,CAD

Then, when I find the value in <file2>, I would like the full line of <file1> and the full line of <file2> to be merged and written to an output file.

This is the code I came up with, but for some reason it looks like my <file2> is read only once: the second search starts from the point where the previous match was found (so the beginning of the file is forgotten).

#!/bin/perl

open(F1,"<@ARGV[0]");
open(F2,"@ARGV[1]");
open (OUTFILE,">$ARGV[2]");

sub run () {
        while(<F1>)
        {
                chomp;
                $line = $_;
                @ARA_NAME = split(/\|/,$line);
                find_ara_name()

        }
close(F1);
close(F2);
}


sub find_ara_name() {
while (<F2>){
                chomp;
                $line1 = $_;
                        if ($line1 =~ m/:@ARA_NAME[0],/) {
                        print "found @ARA_NAME[0]\n";
                        $line1 =~ s/,/|/g;
                        print OUTFILE "$line$line1\n";
                        return;
                        }
                }
}



#main
run()

This is the output I would like for every record of <file1>. If there is no match, I will later write another output file to track those records.

D33963|BNS Default Swap|-261564.923909249|:D33963|48082|||Trading|FALSE||CAISSE|19149|AA|CAD|

Thanks in advance for your help.
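
A note on the symptom described above: a Perl filehandle keeps its read position between calls, so a second `while (<F2>)` loop continues where the previous one stopped, and once end-of-file is reached every later search finds nothing. The following is only a minimal sketch of that effect, not the poster's script. It assumes an in-memory string with shortened sample lines in place of <file2>, and `find_id` is a hypothetical helper introduced for illustration. It shows that `seek` back to offset 0 makes each search start from the top again.

```perl
use strict;
use warnings;

# Assumption: an in-memory stand-in for <file2>, with shortened sample lines.
my $data = ":E00277,48089\n:D24485,48085\n:D33963,48082\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!\n";

# Hypothetical helper: return the first line starting with ":$id,",
# or undef when none is found before end-of-file.
# With $rewind true, the handle is moved back to the start first.
sub find_id {
    my ( $id, $rewind ) = @_;
    seek $fh, 0, 0 if $rewind;
    while ( my $line = <$fh> ) {
        chomp $line;
        return $line if index( $line, ":$id," ) == 0;
    }
    return;
}

# Without rewinding: the search for D33963 consumes the handle up to the
# last line, so the later search for D24485 (an earlier line) finds nothing.
my $first  = find_id('D33963');
my $second = find_id('D24485');
print defined $second ? "found: $second\n" : "D24485 not found without rewind\n";

# With rewinding, the search starts at the beginning of the data again.
my $third = find_id( 'D24485', 1 );
print defined $third ? "found with rewind: $third\n" : "still not found\n";
```

Rewinding fixes the symptom but still rescans <file2> once per record of <file1>, which can be slow for a big file.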

Hi kykyboss,

One way:

$ cat file1
D33963|BNS Default Swap|-261564.923909249|
D24484|BNS Default Swap|-53356.6868058492|
D24485|BNS Default Swap|-21180.9904679111|
D33965|BNS Default Swap|154181.478745804|
D24486|BNS Default Swap|-47413.0013193052|
D33966|BNS Default Swap|-154181.478745804|
D24487|BNS Default Swap|-63253.9807711966|
D33968|BNS Default Swap|-160521.81007754|
D24489|BNS Default Swap|-10584.4665849774|
S85801|BNS Swap|451309.300774646|
D33969|BNS Default Swap|118166.419991555|
$ cat file2
:E00277,48089,,,Trading,FALSE,,CAISSE,19189,AA,CAD
:D24485,48085,,,Trading,FALSE,,CASSE,19139,AA,CAD
:D2448,48083,,,Trading,FALSE,,CAIE,19029,AA,CAD
:D33963,48082,,,Trading,FALSE,,CAISSE,19149,AA,CAD,
:E00286CAP,48082,,,Trading,FALSE,,CAISSE,19149,AA,CAD
$ cat script.pl
use warnings;
use strict;

die qq[Usage: perl $0 <file1> <file2>\n] unless @ARGV == 2;

open my $fh1, "<", shift @ARGV or die qq[Error: Cannot open input file\n];
open my $fh2, "<", shift @ARGV or die qq[Error: Cannot open input file\n];

my (%file1);

while ( my $line = <$fh1> ) {
        chomp $line;
        my ($field1, $rest) = split /\|/, $line, 2;
        $file1{ $field1 } = $line;
}

while ( my $line = <$fh2> ) {
        chomp $line;
        for ( keys %file1 ) {
                if ( index( $line, $_ ) > -1 ) {
                        printf qq[%s%s\n], $file1{ $_ }, $line =~ s/,/|/gr;
                        last;
                }
        }

}
$ perl script.pl file1 file2
D24485|BNS Default Swap|-21180.9904679111|:D24485|48085|||Trading|FALSE||CASSE|19139|AA|CAD
D33963|BNS Default Swap|-261564.923909249|:D33963|48082|||Trading|FALSE||CAISSE|19149|AA|CAD|

Regards,
Birei


Hi Birei,

I found the following code that works, but it takes a long time because I have to open and close <file2> every time. Is there a more optimized way of doing it, maybe using more arrays?

#!/bin/perl

open(F1,"<@ARGV[0]");
#open(F2,"<@ARGV[1]");
open (OUTFILE,">$ARGV[2]");

sub run () {
        while(<F1>)
        {
                chomp;
                $line = $_;
                @ARA_NAME = split(/\|/,$line);
                find_ara_name()

        }
close(F1);
}


sub find_ara_name() {
open(F2,"<@ARGV[1]");
   foreach (<F2>){
                chomp;
                $line1 = $_;
                        if ($line1 =~ m/:@ARA_NAME[0],/) {
                        $line1 =~ s/,/|/g;
                        print OUTFILE "$line$line1\n";
                        return;
                        }
                close(F2);
                }
}

#main
run()

---------- Post updated at 07:03 PM ---------- Previous update was at 06:55 PM ----------

Hi,
thanks a lot. I get a compilation error with your script:

Bareword found where operator expected at ./retest.pl line 22, near "s/,/|/gr"
syntax error at ./retest.pl line 22, near "s/,/|/gr"
Execution of ./retest.pl aborted due to compilation errors

OK. It seems your 'perl' version doesn't support the 'r' modifier of the substitution operator. Try this workaround, which gives the same result:

$ cat script.pl
use warnings;
use strict;

die qq[Usage: perl $0 <file1> <file2>\n] unless @ARGV == 2;

open my $fh1, "<", shift @ARGV or die qq[Error: Cannot open input file\n];
open my $fh2, "<", shift @ARGV or die qq[Error: Cannot open input file\n];

my (%file1);

while ( my $line = <$fh1> ) {
        chomp $line;
        my ($field1, $rest) = split /\|/, $line, 2;
        $file1{ $field1 } = $line;
}

while ( my $line = <$fh2> ) {
        chomp $line;
        for ( keys %file1 ) {
                if ( index( $line, $_ ) > -1 ) {
                        $line =~ tr/,/|/;
                        printf qq[%s%s\n], $file1{ $_ }, $line;
                        last;
                }
        }

}

Regards,
Birei
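
For context, the `/r` (non-destructive) modifier of `s///` and `tr///` was added in Perl 5.14, which is why older interpreters reject `s/,/|/gr` at compile time. The workaround above changes `$line` in place with `tr`. If the original line is still needed afterwards, a common idiom on older versions is to copy first and substitute on the copy. A minimal sketch, where the variable name `$piped` and the shortened sample line are assumptions for illustration:

```perl
use strict;
use warnings;

# Shortened sample line from file2 (assumption for illustration).
my $line = ':D33963,48082,,,Trading';

# Copy first, then substitute on the copy; $line itself is left untouched.
# This idiom does not need the /r modifier.
( my $piped = $line ) =~ s/,/|/g;

print "$line\n";     # original, still comma-separated
print "$piped\n";    # copy, pipe-separated
```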


Hi,

thanks again, this time it works better, but the output is duplicated because I need to search for

:D33963,

and not only D33963 (the bare ID can also appear inside a longer ID, as with E00284 and E00284CAP below). How can I modify this?

---------- Post updated at 07:22 PM ---------- Previous update was at 07:20 PM ----------

E00284|BNS Default Swap|154181.478745804|:E00284|48082|||Trading|FALSE||CAISSE|19149|AA|CAD||||||||2919966|FALSE||||BNS|||E|CA|CA|OTH_CBR|Call|Asset Asian|8000000.00|408|OTC-EQUITY OPT-SLD-TRA-EX||Options|Derivatives|||48082||False||||Sell|||||||||||||||||||||||||||||
E00284|BNS Default Swap|154181.478745804|:E00284CAP|48082|||Trading|FALSE||CAISSE|19149|AA|CAD||||||||2919966|FALSE||||BNS|||E|CA|CA|OTH_CBR|Call|Asset Asian|11040000.00|408|OTC-EQUITY OPT-SLD-TRA-EX||Options|Derivatives|||48082||False||||Buy|||||||||||||||||||||||||||||
E00286|BNS Default Swap|-21180.9904679111|:E00286|48082|||Trading|FALSE||CAISSE|19149|AA|CAD||||||||2919966|FALSE||||BNS|||E|CA|CA|OTH_CBR|Call|Asset Asian|10000000.00|653|OTC-EQUITY OPT-SLD-TRA-EX||Options|Derivatives|||48082||False||||Sell|||||||||||||||||||||||||||||
E00286|BNS Default Swap|-21180.9904679111|:E00286CAP|48082|||Trading|FALSE||CAISSE|19149|AA|CAD||||||||2919966|FALSE||||BNS|||E|CA|CA|OTH_CBR|Call|Asset Asian|14500000.00|653|OTC-EQUITY OPT-SLD-TRA-EX||Options|Derivatives|||48082||False||||Buy|||||||||||||||||||||||||||||
E00287|BNS Default Swap|-261564.923909249|:E00287|48082|||Trading|FALSE||92767||Internal|CAD||||||||2920257|FALSE||||BNS|||E|CA|CA|OTH_CBR|Call|Asset Asian|1992000.00|647|||Options|Derivatives|||48082||False||||Sell|||||||||||||||||||||||||||||

---------- Post updated at 07:37 PM ---------- Previous update was at 07:22 PM ----------

OK cool, I found it, thanks a lot:

$file1{ ":$field1," } = $line;
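
An alternative to substring search, offered only as a sketch and not as the script used in this thread: since every line of <file2> starts with ':' followed by the ID and a comma, the ID can be extracted once per line and looked up directly in the hash. This avoids looping over all keys for each line and cannot match E00286 inside E00286CAP. The in-memory arrays `@file1` and `@file2` and the names `@merged` and `$piped` are assumptions made so the example is self-contained.

```perl
use strict;
use warnings;

# Assumption: in-memory samples standing in for <file1> and <file2>.
my @file1 = (
    'D33963|BNS Default Swap|-261564.923909249|',
    'E00286|BNS Default Swap|-21180.9904679111|',
);
my @file2 = (
    ':D33963,48082,,,Trading,FALSE,,CAISSE,19149,AA,CAD,',
    ':E00286CAP,48082,,,Trading,FALSE,,CAISSE,19149,AA,CAD',
);

# Index file1 by its first field.
my %file1;
for my $line (@file1) {
    my ($id) = split /\|/, $line, 2;
    $file1{$id} = $line;
}

# For each file2 line, take the text between the leading ':' and the first
# ',' and look it up directly; no loop over the hash keys is needed.
my @merged;
for my $line (@file2) {
    my ($id) = $line =~ /^:([^,]+),/ or next;    # skip lines without an ID
    next unless exists $file1{$id};              # exact match only
    ( my $piped = $line ) =~ tr/,/|/;            # convert a copy of the line
    push @merged, $file1{$id} . $piped;
}

print "$_\n" for @merged;
```

With these samples only the D33963 record is merged, because E00286CAP is not an exact match for the key E00286.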

---------- Post updated at 08:06 PM ---------- Previous update was at 07:37 PM ----------

One more question and then I will stop: how can I create a second output that copies the records from <file1> that have no match in <file2>, using your previous code?

1.- The output is not duplicated; the lines are different.

D24485|BNS Default Swap|-21180.9904679111|:D24485|48085|||Trading|FALSE||CASSE|19139|AA|CAD
D33963|BNS Default Swap|-261564.923909249|:D33963|48082|||Trading|FALSE||CAISSE|19149|AA|CAD|

2.- Do you want to search file1 only for D33963?

3.- What would that second output be? Can you provide an example?

Regards,
Birei

No, the output is all good. I would just like to see whether it is possible to create a second output that copies the records from <file1> that have no match in <file2>, using your previous code.
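
One possible way to get that second output, sketched under assumptions rather than taken from the thread: remember which <file1> keys were matched while reading <file2>, then write every key that was never marked. The `%matched` hash, the `@unmatched` array and the in-memory sample arrays are hypothetical names used only for this illustration.

```perl
use strict;
use warnings;

# Assumption: in-memory samples; the real script reads the two input files.
my @file1 = (
    'D33963|BNS Default Swap|-261564.923909249|',
    'D24484|BNS Default Swap|-53356.6868058492|',
);
my @file2 = ( ':D33963,48082,,,Trading,FALSE,,CAISSE,19149,AA,CAD,' );

# Index file1 by its first field.
my ( %file1, %matched );
for my $line (@file1) {
    my ($id) = split /\|/, $line, 2;
    $file1{$id} = $line;
}

# Mark every file1 key whose ID appears as the leading field of a file2 line.
for my $line (@file2) {
    my ($id) = $line =~ /^:([^,]+),/ or next;
    $matched{$id} = 1 if exists $file1{$id};
}

# Every file1 record that was never marked goes to the second output.
my @unmatched = map { $file1{$_} } grep { !$matched{$_} } sort keys %file1;
print "$_\n" for @unmatched;
```

In a real script the final `print` would go to a second filehandle opened for writing instead of standard output.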