Compare two files

my_Perl · July 31, 2009, 5:18am

Hi

I want to compare two files that look like this:

File1: -

STRING LABEL

aabbaa A
basda B
ccdaa C
xxdas D
..
..
hgfhgfd P
asfasdf Z

File2: -

STRING LABEL

xxdas D
aabbaa A
dfsfafdf Z
fasdfas X
asfasdf Z
....
....

For each line in file 1 (both string and label), I want to check whether it also appears in file 2, and count the lines of file 1 that are not found in file 2. How do I write this as a Perl script? Note that the strings are in UTF-8.

I tried the following code:

    #!/usr/bin/perl

    use warnings;
    use strict;

    # Sort strings properly and match all letters
    use locale;

    # Read the source file as UTF-8 and set STDOUT and STDIN to UTF-8
    use encoding 'utf8';

    open my $FILE1, "<:encoding(utf8)", "$ARGV[0]" or die "Can't open file $ARGV[0]: $!";

    open my $FILE2, "<:encoding(utf8)", "$ARGV[1]" or die "Can't write to $ARGV[1]: $!";

    my %line1;
    my %line2;

    my $string1;
    my $string2;
    my $label1;
    my $label2;

    my $count;

    while (my $line1 = <$FILE1>)
    {
        while (my $line2 = <$FILE2>)
        {
            while (($string1, $label1) = each (%line1))
            {
                while (($string2, $label2) = each (%line2))
                {
                    if (($string1 ne $string2) && ($label1 ne $label2))
                    {
                        $count++;
                    }
                }
            }
        }
        print $count;
    }

But it doesn't work. Any help is appreciated.

ranjithpr · July 31, 2009, 5:24am

If the files are not too big, you can use:

grep -vcf File2 File1
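One caveat with this command: `-f` treats every line of File2 as a regular expression and matches it anywhere within a line of File1. Adding `-F` (fixed strings) and `-x` (whole-line match) turns it into an exact line-by-line comparison. A minimal sketch, assuming made-up sample data modelled on the question (the temporary directory and file contents are only for illustration):

```shell
# Hypothetical sample data, modelled on the files in the question.
tmp=$(mktemp -d)
printf '%s\n' 'aabbaa A' 'basda B' 'ccdaa C' 'xxdas D' 'asfasdf Z' > "$tmp/File1"
printf '%s\n' 'xxdas D' 'aabbaa A' 'dfsfafdf Z' 'fasdfas X' 'asfasdf Z' > "$tmp/File2"

# -F: patterns are fixed strings, not regular expressions
# -x: a pattern must match the whole line
# -v: select lines that do NOT match; -c: print only the count
# -f: read the patterns from File2
missing=$(grep -vcFxf "$tmp/File2" "$tmp/File1")
echo "$missing"

rm -rf "$tmp"
```

With this sample data, `basda B` and `ccdaa C` are the only lines of File1 that do not occur in File2.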

my_Perl · July 31, 2009, 6:04am

The files contain a lot of data. What can be done in that case?

ranjithpr · July 31, 2009, 6:37am

Try this...

awk 'FNR==NR{_[$1$2]="T";next} FNR!=NR && _[$1$2]!="T"{count++} END {print count}' File2 File1
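A possible refinement of this one-liner, offered as a sketch rather than a definitive fix: keying the array on `$1$2` glues the two fields together, so `ab c` and `a bc` produce the same key, and `print count` prints an empty line when nothing is missing. Using the two-field subscript `$1,$2` (joined with awk's `SUBSEP`) and `count+0` avoids both. The array name `seen` and the sample data are assumptions made for illustration:

```shell
# Hypothetical sample data; 'ab c' and 'a bc' would collide under $1$2.
tmp=$(mktemp -d)
printf '%s\n' 'aabbaa A' 'basda B' 'ab c' > "$tmp/File1"
printf '%s\n' 'aabbaa A' 'a bc' > "$tmp/File2"

# First file (FNR==NR): remember every (string, label) pair of File2.
# Second file: count the File1 pairs that were never seen.
# count+0 prints 0 rather than an empty line when nothing is missing.
missing=$(awk 'FNR==NR { seen[$1,$2]; next }
               !(($1,$2) in seen) { count++ }
               END { print count+0 }' "$tmp/File2" "$tmp/File1")
echo "$missing"

rm -rf "$tmp"
```

Like the original, this compares only the first two whitespace-separated fields, so it assumes that neither the string nor the label contains spaces.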

RohitKJ · July 31, 2009, 6:41am

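Reading one file at a time will work: load file 2 into a hash, then count the lines of file 1 that are missing from it.

use strict;
use warnings;

# Load every line of file 2 into a hash for fast lookups.
my %file2;
open my $FILE2, "<:encoding(utf8)", $ARGV[1] or die "Can't open file $ARGV[1]: $!";
while (my $line2 = <$FILE2>) {
    chomp $line2;
    $file2{$line2} = 1;
}
close $FILE2;

# Count the lines of file 1 that never appear in file 2.
my $count = 0;
open my $FILE1, "<:encoding(utf8)", $ARGV[0] or die "Can't open file $ARGV[0]: $!";
while (my $line1 = <$FILE1>) {
    chomp $line1;
    $count++ if !exists $file2{$line1};
}
close $FILE1;

print "$count\n";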

my_Perl · July 31, 2009, 6:49am

I tried

grep -vcf File2 File1

and got a result that looks reliable. I also tried the second one,

awk 'FNR==NR{_[$1$2]="T";next} FNR!=NR && _[$1$2]!="T"{count++} END {print count}' File2 File1

but it gives a different result, which I doubt.

A small modification of the problem: instead of using all the labels, I want to select only some of them and check whether the corresponding strings from file 1 are in file 2.

Thanks for the code.
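For the variation with selected labels, one possible approach is to pass the wanted labels to awk and skip every line of file 1 whose label is not among them. This is only a sketch: the variable name `labels`, the arrays `want` and `seen`, and the sample data are all assumptions made for illustration.

```shell
# Hypothetical sample data, modelled on the files in the question.
tmp=$(mktemp -d)
printf '%s\n' 'aabbaa A' 'basda B' 'ccdaa C' 'xxdas D' 'asfasdf Z' > "$tmp/File1"
printf '%s\n' 'xxdas D' 'aabbaa A' 'dfsfafdf Z' 'fasdfas X' 'asfasdf Z' > "$tmp/File2"

# labels: space-separated list of the labels to check (hypothetical).
missing=$(awk -v labels='A B Z' '
    # Turn the label list into a lookup table.
    BEGIN { n = split(labels, l, " "); for (i = 1; i <= n; i++) want[l[i]] }
    # First file (File2): remember every (string, label) pair.
    FNR==NR { seen[$1,$2]; next }
    # Second file (File1): count wanted labels whose pair is not in File2.
    ($2 in want) && !(($1,$2) in seen) { count++ }
    END { print count+0 }' "$tmp/File2" "$tmp/File1")
echo "$missing"

rm -rf "$tmp"
```

With this sample data, only `basda B` is counted: `ccdaa C` is also missing from File2, but its label is not in the selected list.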