Compare files

Hi Masters! I know this problem is quite difficult.
I have two files that look like this:

File1
mary a b d
anne d e
jane g h
sam a

File2
role1 a d c d
role2 e f g h
role3 a b d
role4 a d e
role5 g h

For each line in file1, the script should compare its values against every entry in file2, regardless of the order in which the values appear.

The output should be:
mary role1 role3
anne role4
jane role2 role5
sam role1 role3 role4

I've tried a few other ways to do this, without success. Can it be done in UNIX? Please help. Thanks!

This looks a lot like homework, so I will not give an answer.

Post what you have tried already and maybe someone will point out where you are going wrong.

It's not homework. Actually, I tried using Excel to manipulate the files, but I'm not familiar with macros, so I hope someone can help me do it in UNIX.

This will be a repeated process for me, and having the right code will really help me a lot. Unfortunately, my limited knowledge of UNIX won't help me either.

What are the criteria? This script looks for all lines in File2 that contain any of the letters after the name, but the output doesn't match yours:

while read name a b c d e
do
  [ -n "$a$b$c$d$e" ] &&
  result=$( grep ${a:+-e " $a"} ${b:+-e " $b"} ${c:+-e " $c"} \
             ${d:+-e " $d"} ${e:+-e " $e"} File2 | cut -d ' ' -f1)
  set -- $result
  echo "$name" $result
done < File1

Both file1 and file2 can have many rows and columns. For example, mary has a, b and d. The script should then search file2 for entries that contain a, b and d (all of them, not just any), which in this case are role1 and role3. Then it moves on to the second entry in file1, which is anne, and so on.
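
The "all, not any" rule described above is a subset test: a role qualifies when every value listed for the name also appears on the role's line. Here is a minimal sketch in Python, assuming whitespace-separated fields with the key in the first column. The helper names `parse` and `roles_containing_all` are made up for this illustration, and the sample data is inlined rather than read from the two files.

```python
def parse(text):
    """Map the first field of each non-empty line to the set of remaining fields."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if fields:
            table[fields[0]] = set(fields[1:])
    return table


def roles_containing_all(names, roles):
    """For each name, list the roles whose values include ALL of the name's values."""
    result = {}
    for name, wanted in names.items():
        # `wanted <= have` is the subset test: every wanted value is in `have`.
        result[name] = [role for role, have in roles.items() if wanted <= have]
    return result


# Sample data copied from the first post.
file1 = "mary a b d\nanne d e\njane g h\nsam a\n"
file2 = "role1 a d c d\nrole2 e f g h\nrole3 a b d\nrole4 a d e\nrole5 g h\n"

matches = roles_containing_all(parse(file1), parse(file2))
for name, found in matches.items():
    print(name, *found)
```

Read strictly, role1 (a d c d) contains no b, so this sketch reports only role3 for mary.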

#! /opt/third-party/bin/perl

open(FILE, "<", b) || die ("Unable to open file. <$!>\n");

while( <FILE> ) {
  chomp;
  @split_arr = split(/ /, $_);
  my $dump;
  for( my $i = 1; $i < $#split_arr + 1; $i++ ) {
    $dump .= $split_arr[$i];
  }
  $fileHash{$split_arr[0]} = $dump;
}

close(FILE);

open(FILE, "<", a) || die ("Unable to open file. <$!>\n");

while( <FILE> ) {
  chomp;
  @split_arr = split(/ /, $_);
  my $dump;
  for( my $i = 1; $i < $#split_arr + 1; $i++ ) {
    $dump .= $split_arr[$i];
  }
  print "$split_arr[0] ";
  foreach my $key ( keys %fileHash ) {
    if( $fileHash{$key} =~ m/$dump/ ) {
      print "$key ";
    }
  }
  print "\n";
}

close(FILE);

exit 0

I have one thing to clarify.

From the examples provided, only role3 would match for mary, not both role3 and role1.

Could you please check and confirm that? :slight_smile:

Yes, you're right :slight_smile:
Can you tell me where in the code is the first file and the second file?

Oops!

I should have made it clear!

#! /opt/third-party/bin/perl

open(FILE, "<", secondfile) || die ("Unable to open secondfile. <$!>\n");

while( <FILE> ) {
  chomp;
  @split_arr = split(/ /, $_);
  my $dump;
  for( my $i = 1; $i < $#split_arr + 1; $i++ ) {
    $dump .= $split_arr[$i];
  }
  $fileHash{$split_arr[0]} = $dump;
}

close(FILE);

open(FILE, "<", firstfile) || die ("Unable to open firstfile. <$!>\n");

while( <FILE> ) {
  chomp;
  @split_arr = split(/ /, $_);
  my $dump;
  for( my $i = 1; $i < $#split_arr + 1; $i++ ) {
    $dump .= $split_arr[$i];
  }
  print "$split_arr[0] ";
  foreach my $key ( keys %fileHash ) {
    if( $fileHash{$key} =~ m/$dump/ ) {
      print "$key ";
    }
  }
  print "\n";
}

close(FILE);

exit 0

Wow! You're a genius! It worked! It will save me lots of time in doing my work. :slight_smile:
Thanks a lot!

matrixmadhan -
I found a minor problem with the code. It should match only the exact word, not part of a longer one.

In my example below:

record1
mary MI_AP
anne MI_RC

record2
role1 MI_AP_REC
role2 MI_AP MI_RC

output of the current code:
mary role1 role2
anne role2

Is it possible to match only the exact word, so that the output is as below?

mary role2
anne role2
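
The extra role1 in the output above comes from substring matching: MI_AP is a prefix of MI_AP_REC. Here is a small sketch in Python of the difference between a substring test and a whole-word test, using the sample values from this post. The function names are hypothetical.

```python
def substring_match(wanted, line):
    """True if `wanted` occurs anywhere in the line, even inside a longer word."""
    return wanted in line


def exact_word_match(wanted, line):
    """True only if `wanted` equals one of the whitespace-separated words."""
    return wanted in line.split()


# Values taken from record2 in this post.
role1 = "MI_AP_REC"
role2 = "MI_AP MI_RC"

print(substring_match("MI_AP", role1))   # True: MI_AP is a prefix of MI_AP_REC
print(exact_word_match("MI_AP", role1))  # False: no word equals MI_AP
print(exact_word_match("MI_AP", role2))  # True
```

Splitting each line into words before comparing is what keeps MI_AP from matching MI_AP_REC.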

If it has to match the exact string, change the above to

 if( $fileHash{$key} =~ $dump ) {

Try that! :slight_smile:

It's not working :frowning:

Run the code below as is and let us know the results.

I have modified the code.

#! /opt/third-party/bin/perl

open(FILE, "<", secondfile) || die ("Unable to open secondfile. <$!>\n");

while( <FILE> ) {
  chomp;
  @split_arr = split(/ /, $_);
  my $dump;
  for( my $i = 1; $i <= $#split_arr; $i++ ) {
    $dump .= ( $split_arr[$i] . ":");
  }
  $dump =~ s/:$//;
  $fileHash{$split_arr[0]} = $dump;
}

close(FILE);

open(FILE, "<", firstfile) || die ("Unable to open firstfile. <$!>\n");

while( <FILE> ) {
  chomp;
  @split_arr = split(/ /, $_);
  my $dump;
  for( my $i = 1; $i < $#split_arr + 1; $i++ ) {
    $dump .= $split_arr[$i];
  }
  print "$split_arr[0] ";
  foreach my $key ( keys %fileHash ) {
    @diff_arr = split(/:/, $fileHash{$key});
    for( my $i = 0; $i <= $#diff_arr; $i++ ) {
      if( $dump =~ $diff_arr[$i] ) {
        print "$key ";
      }
    }
  }
  print "\n";
}

close(FILE);

exit 0

Yes it works!!! Thank you so much. :slight_smile:

Here's a Python alternative:

# Requires Python 3.
for line1 in open("file1"):
    name, rest = line1.strip().split(" ", 1)
    f1col = rest.split()
    print()                  # start a new output line for each name
    print(name, end=" ")
    for line2 in open("file2"):
        role, values = line2.strip().split(" ", 1)
        count = 0
        for item1 in f1col:
            for item2 in values.split():
                if item1 == item2:
                    count += 1
        # Note: a value repeated on a file2 line (e.g. "a d c d") is counted twice.
        if count == len(f1col):
            print(role, end=" ")

output:

# ./test.py

mary role1 role3
anne role4
jane role2 role5
sam role1 role3 role4

and

# ./test.py

mary role2
anne role2

What if I want to compare the files the other way around? I can swap the two files, but with my current data that produces too many output fields, and I would still have to rearrange the results to get the output I want (file1 has almost 23,000 rows). This time, all values of a file2 entry must be present in the file1 entry. (The previous request was for all values of a file1 entry to be present in file2.) I need both to get the desired output for my records.

File1
mary a b c d
anne e f g h
jane a d e
sam g h

File2
role1 a b
role2 a b c
role3 g h
role4 a e
role5 e f g

Output
mary role1 role2
anne role3 role5
jane role4
sam role3

Would appreciate it if the code is in korn or perl. Thanks in advance...
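
The request above is for Korn shell or Perl. Purely to illustrate the reversed rule, here is a minimal sketch in Python. It assumes whitespace-separated fields with the key first, inlines the sample data instead of reading the files, and uses a hypothetical helper name. A role is reported when all of its values appear among the name's values.

```python
def roles_contained_in(names_text, roles_text):
    """For each name, list roles whose values are ALL present in the name's values."""
    # Parse the roles once: role name -> set of its values.
    roles = {}
    for line in roles_text.splitlines():
        fields = line.split()
        if fields:
            roles[fields[0]] = set(fields[1:])

    result = {}
    for line in names_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        have = set(fields[1:])
        # Whole-word set comparison: each role is listed at most once per name.
        result[fields[0]] = [r for r, needed in roles.items() if needed <= have]
    return result


# Sample data copied from this post.
file1 = "mary a b c d\nanne e f g h\njane a d e\nsam g h\n"
file2 = "role1 a b\nrole2 a b c\nrole3 g h\nrole4 a e\nrole5 e f g\n"

reverse_matches = roles_contained_in(file1, file2)
for name, found in reverse_matches.items():
    print(name, *found)
```

Because the roles are parsed into sets only once, each file1 line costs a single pass over the roles, which should stay manageable for a file1 of roughly 23,000 rows.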

tested and it works fine! :slight_smile:

Try this!

#! /opt/third-party/bin/perl

open(FILE, "<", secondfile) || die ("Unable to open secondfile. <$!>\n");

while( <FILE> ) {
  chomp;
  @split_arr = split(/ /, $_);
  my $dump;
  for( my $i = 1; $i <= $#split_arr; $i++ ) {
    $dump .= ( $split_arr[$i] . ":");
  }
  $dump =~ s/:$//;
  $fileHash{$split_arr[0]} = $dump;
}

close(FILE);

open(FILE, "<", firstfile) || die ("Unable to open firstfile. <$!>\n");

while( <FILE> ) {
  chomp;
  @first_arr = split(/ /, $_);
  print "$first_arr[0] ";
  foreach my $key ( keys %fileHash ) {
    @second_arr = split(/:/, $fileHash{$key});
    for($i = 0; $i <= $#second_arr; $i++ ) {
      $set = 0;
      for( $j = 1; $j <= $#first_arr; $j++ ) {
        # Compare whole words; a regex match would also accept substrings.
        if( $first_arr[$j] eq $second_arr[$i] ) {
          $set = 1;
          last;
        }
      }
      last if( $set == 0 )
    }
    print "$key " if( $set == 1 )
  }
  print "\n";
}

close(FILE);

exit 0

This one works. Tested :slight_smile:

I found a problem with this one (the reverse of the other): the output repeats.
File1
mary MI_AP MI_RC
anne MI_RC

File2
role1 MI_AP_REC
role2 MI_AP MI_RC

Output of this code:
mary role2 role2
anne role2

Needed output:
mary role2
anne role2

Am I giving you too much trouble? :frowning: Perl is really new to me.

Code:
#! /opt/third-party/bin/perl

open(FILE, "<", secondfile) || die ("Unable to open secondfile. <$!>\n");

while( <FILE> ) {
  chomp;
  @split_arr = split(/ /, $_);
  my $dump;
  for( my $i = 1; $i <= $#split_arr; $i++ ) {
    $dump .= ( $split_arr[$i] . ":");
  }
  $dump =~ s/:$//;
  $fileHash{$split_arr[0]} = $dump;
}

close(FILE);

open(FILE, "<", firstfile) || die ("Unable to open firstfile. <$!>\n");

while( <FILE> ) {
  chomp;
  @split_arr = split(/ /, $_);
  my $dump;
  for( my $i = 1; $i < $#split_arr + 1; $i++ ) {
    $dump .= $split_arr[$i];
  }
  print "$split_arr[0] ";
  foreach my $key ( keys %fileHash ) {
    @diff_arr = split(/:/, $fileHash{$key});
    for( my $i = 0; $i <= $#diff_arr; $i++ ) {
      if( $dump =~ $diff_arr[$i] ) {
        print "$key ";
      }
    }
  }
  print "\n";
}

close(FILE);

Sorry about the confusing username above. I forgot my friend was logged in on my PC, and I forgot to change user before replying.