Compare 2 files and print matches and non-matches in separate files

AshwaniSharma09 · March 9, 2013, 4:32pm

Hi all,

I have two files, chap.txt and complex.txt.

chap.txt looks like this:

a
d
l
m
r
k

complex.txt looks like this:

a c d e l m n j
a d l p q r
c p r m
.........
.........

What I need is to look up every line of chap.txt (a single column) in each line of complex.txt, and to generate separate files for the matching and non-matching strings, like this:

C1match.txt       C1non_match.txt
a                           c
d                           e
l                           n
m                           j

Similarly, 2 files for the next line:
C2match.txt       C2non_match.txt
a                           p
d                           q
l
r

If no line of chap.txt matches anything in a given line of complex.txt, no output files should be created for that line. So, if complex.txt has N lines and every line has a match (which won't actually be the case), the total number of output files would be at most N x 2.

I hope I have explained my problem clearly.

Thanks and Regards

Yoda · March 9, 2013, 5:20pm

awk ' BEGIN {
        c = 1
} FNR == NR {
        A[$0] = $0
        next
} {
        M_F = "C" c "match.txt"
        N_F = "C" c "non_match.txt"

        for ( j = 1; j <= NF; j++ )
        {
                if ( $j in A)
                        print $j > M_F
                else
                        print $j > N_F
        }
        c += 1
        close(M_F)
        close(N_F)
} ' chap.txt complex.txt

AshwaniSharma09 · March 9, 2013, 5:59pm

Thanks bipinajith. It's just that it is also printing non_match files for those lines of complex.txt where there is no match.

Also, could you please help me make files containing every pairwise combination of a match file with a non_match file? For example:

C1match.txt
a
d
l
m

C2non_match.txt
p
q

C1matchC2non_match.txt
a,p
a,q
d,p
d,q
l,p
l,q
m,p
m,q

I really appreciate your time and help.

Thanks

Yoda · March 9, 2013, 6:26pm

I didn't quite understand this statement.

Also, I don't really understand why you want to merge C1 with C2.

Anyway, here is how it can be done:

awk ' FNR == NR {
        A[++j] = $0
        next
} {
        B[$0] = $0
} END {
        for ( i = 1; i <= j; i++)
        {
                for ( v in B )
                {
                        print A[i], B[v];
                }
        }
} ' OFS=, C1match.txt C2non_match.txt
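
For anyone who wants to try the pairing on the sample data from the post above, here is a minimal sketch. It is a variation, not the script as posted: both files are stored in numerically indexed arrays (the names na, nb and the temporary directory are only illustrative), because awk does not guarantee any particular order for "for (v in B)".

```shell
# Demo data copied from the example in the question; the temp dir is just for this demo.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' a d l m > C1match.txt
printf '%s\n' p q     > C2non_match.txt

# First file goes into A[1..na], second into B[1..nb]; indexing both by
# number keeps the input order in the output.
awk -v OFS=, '
FNR == NR { A[++na] = $0; next }
          { B[++nb] = $0 }
END {
        for (i = 1; i <= na; i++)
                for (k = 1; k <= nb; k++)
                        print A[i], B[k]
}' C1match.txt C2non_match.txt > C1matchC2non_match.txt

cat C1matchC2non_match.txt
```

With the sample data this prints the eight pairs a,p through m,q in the order shown in the question.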

RudiC · March 10, 2013, 5:19am

For your first request, you might want to give this a shot:

awk     'NR==1          {Pat = "["}
         NR==FNR        {Pat = Pat $1;next}
         FNR==1         {Pat=Pat"]"}
         {for (i=1; i<=NF; i++)
            if ($i~Pat) print $i > "C"FNR"match.txt"
              else      print $i > "C"FNR"nonmatch.txt" }
        ' file1 file2
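
One thing to keep in mind with the bracket-expression approach: ~ succeeds if the pattern matches anywhere in the field, so it only behaves like an exact lookup when every field is a single character, as in the sample data. The sketch below illustrates this with a hypothetical helper, check, which is not part of the script above; anchoring the pattern with ^ and $ is one possible way to require that the whole field be one listed character.

```shell
# Hypothetical helper: report whether field $2 matches the awk regex $1.
check() {
        printf '%s\n' "$2" | awk -v Pat="$1" '{ print (($1 ~ Pat) ? "match" : "no match") }'
}

check '[adlmrk]'   ad   # unanchored: "ad" contains a listed character
check '^[adlmrk]$' ad   # anchored: "ad" is not a single listed character
check '^[adlmrk]$' d    # anchored: "d" is exactly one listed character
```

The three calls print match, no match and match respectively.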

pravin27 · March 11, 2013, 4:14am

How about this?

 
#!/usr/bin/perl
use strict;
my $file1=shift;
my $file2=shift;
my %seen;
open (FH,"$file1") or die "can not open file $file1 - $! \n";
while (<FH>) {
        chomp;
        $seen{$_}++;
}
close(FH);
my @flds;
open (FH2,"$file2") or die "can not open file $file2  - $! \n";
while (<FH2>) {
        my $match="C" . $. . "match.txt";
        my $unmatch="C" . $. . "unmatch.txt";
        open (FM,">$match") or die "can not open file $match - $! \n";
        open (FU,">$unmatch") or die "can not open file $unmatch - $! \n";
        @flds=split;
        foreach (@flds) {
                if (exists $seen{$_} ) {
                        print FM "$_\n";
                } else {
                        print FU "$_\n";
                }
        }
        close(FM);
        close(FU);
}
close(FH2);

Invocation:

 
perl colmatch.pl chap.txt complex.txt

AshwaniSharma09 · March 12, 2013, 9:37am

Thanks, guys. pravin27's Perl script runs fine too. I could see some problem with RudiC's script, though. Thanks once again.

pamu · March 12, 2013, 9:50am

Try:

awk 'NR==FNR{A[$1]++;next}
    {for(i=1;i<=NF;i++){
    if(A[$i]){    print $i > "C"FNR"match.txt"}
    else{        print $i > "C"FNR"non_match.txt"}}}
    ' chap.txt complex.txt
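
The original question asks that no files be created for a line of complex.txt that has no match at all. A minimal sketch of one way to get that, assuming it is acceptable to buffer each line's output in memory before deciding whether to write: the third sample line ("x y z"), the buffer names m and n, and the temporary directory are made up for the demo.

```shell
# Demo data: chap.txt and the first two lines come from the question;
# the third line is invented to show the "no match" case.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' a d l m r k > chap.txt
printf '%s\n' 'a c d e l m n j' 'a d l p q r' 'x y z' > complex.txt

awk '
NR == FNR { A[$1]; next }               # first file: remember every chap.txt entry
{
        m = n = ""                      # buffers for matching / non-matching fields
        for (i = 1; i <= NF; i++) {
                if ($i in A) { m = m $i "\n" }
                else         { n = n $i "\n" }
        }
        if (m == "") next               # no match on this line: write nothing
        mf = "C" FNR "match.txt"
        nf = "C" FNR "non_match.txt"
        printf "%s", m > mf; close(mf)
        if (n != "") { printf "%s", n > nf; close(nf) }
}' chap.txt complex.txt

ls
```

Here the third line produces neither C3match.txt nor C3non_match.txt. Skipping an empty non_match file when every field matches is a design choice of this sketch, not something the question specifies. Closing each file right after writing also keeps the number of open files small when complex.txt is long.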