Compare files

kharen11 · March 10, 2007, 10:04am

Hi Masters! I know this problem is quite difficult.
I have two files that look like this:

File1
mary a b d
anne d e
jane g h
sam a

File2
role1 a d c d
role2 e f g h
role3 a b d
role4 a d e
role5 g h

For each line in File1, the script should compare its values against every entry in File2, regardless of the order in which the values appear.

The output should be:
mary role1 role3
anne role4
jane role2 role5
sam role1 role3 role4

I've tried other ways to do this, without success. Can it be done in UNIX? Please help. Thanks!

reborg · March 10, 2007, 10:45am

This looks a lot like homework, so I will not give an answer.

Post what you have tried already and maybe someone will point out where you are going wrong.

kharen11 · March 10, 2007, 11:04am

It's not homework. I actually tried using Excel to manipulate the files, but I'm not familiar with macros, so I hope someone can help me do it in UNIX.

This will be a repeated process for me, and having the right code will really help me a lot. Unfortunately, my limited knowledge of UNIX won't get me there on my own.

cfajohnson · March 10, 2007, 3:08pm

What are the criteria? This script looks for all lines in File2 that contain any of the letters after the name, but the output doesn't match yours:

while read name a b c d e
do
  [ -n "$a$b$c$d$e" ] &&
  result=$( grep ${a:+-e " $a"} ${b:+-e " $b"} ${c:+-e " $c"} \
             ${d:+-e " $d"} ${e:+-e " $e"} File2 | cut -d ' ' -f1)
  set -- $result
  echo "$name" $result
done < File1

kharen11 · March 10, 2007, 10:00pm

Both file1 and file2 can have many rows and many columns. For example, mary has a, b and d. The script should search file2 for the roles that contain a, b and d (all of them, not any), which in this case is role1 and role3. Then it moves on to the second entry in file1, anne, and so on.
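
The "all, not any" rule described here amounts to a subset test: a role matches a name when every value on the name's line also appears on the role's line. A minimal Python sketch under that reading follows. The helper names `parse` and `match_all` are hypothetical, and the data is copied from the first post.

```python
def parse(lines):
    """Map the first field of each line to the set of remaining fields."""
    table = {}
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            table[fields[0]] = set(fields[1:])
    return table


def match_all(people, roles):
    """For each name, list the roles whose values include ALL of that name's values."""
    return {
        name: [role for role, have in roles.items() if wanted <= have]
        for name, wanted in people.items()
    }


file1 = ["mary a b d", "anne d e", "jane g h", "sam a"]
file2 = ["role1 a d c d", "role2 e f g h", "role3 a b d", "role4 a d e", "role5 g h"]

for name, matched in match_all(parse(file1), parse(file2)).items():
    print(name, *matched)
```

With the sample data this prints `mary role3`, `anne role4`, `jane role2 role5` and `sam role1 role3 role4`. Under the strict rule, role1 drops out for mary because its line has no `b`.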

matrixmadhan · March 10, 2007, 11:49pm

#! /opt/third-party/bin/perl

open(FILE, "<", b) || die ("Unable to open file. <$!>\n");

while( <FILE> ) {
  chomp;
  @split_arr = split(/ /, $_);
  my $dump;
  for( my $i = 1; $i < $#split_arr + 1; $i++ ) {
    $dump .= $split_arr[$i];
  }
  $fileHash{$split_arr[0]} = $dump;
}

close(FILE);

open(FILE, "<", a) || die ("Unable to open file. <$!>\n");

while( <FILE> ) {
  chomp;
  @split_arr = split(/ /, $_);
  my $dump;
  for( my $i = 1; $i < $#split_arr + 1; $i++ ) {
    $dump .= $split_arr[$i];
  }
  print "$split_arr[0] ";
  foreach my $key ( keys %fileHash ) {
    if( $fileHash{$key} =~ m/$dump/ ) {
      print "$key ";
    }
  }
  print "\n";
}

close(FILE);

exit 0

I need one thing clarified.

From the examples provided, only role3 would match for mary, not both role3 and role1.

Could you please check and confirm that?

kharen11 · March 11, 2007, 12:02am

matrixmadhan:

I need one thing clarified.

From the examples provided, only role3 would match for mary, not both role3 and role1.

Could you please check and confirm that?

Yes, you're right.
Can you tell me where in the code the first file and the second file go?

matrixmadhan · March 11, 2007, 12:11am

Oops!

I should have made it clear!

#! /opt/third-party/bin/perl

open(FILE, "<", secondfile) || die ("Unable to open secondfile. <$!>\n");

while( <FILE> ) {
  chomp;
  @split_arr = split(/ /, $_);
  my $dump;
  for( my $i = 1; $i < $#split_arr + 1; $i++ ) {
    $dump .= $split_arr[$i];
  }
  $fileHash{$split_arr[0]} = $dump;
}

close(FILE);

open(FILE, "<", firstfile) || die ("Unable to open firstfile. <$!>\n");

while( <FILE> ) {
  chomp;
  @split_arr = split(/ /, $_);
  my $dump;
  for( my $i = 1; $i < $#split_arr + 1; $i++ ) {
    $dump .= $split_arr[$i];
  }
  print "$split_arr[0] ";
  foreach my $key ( keys %fileHash ) {
    if( $fileHash{$key} =~ m/$dump/ ) {
      print "$key ";
    }
  }
  print "\n";
}

close(FILE);

exit 0

kharen11 · March 11, 2007, 12:31am

Wow! You're a genius! It worked! It will save me lots of time in doing my work.
Thanks a lot!

kharen11 · March 13, 2007, 12:45am

matrixmadhan -
I found a minor problem with the code. It should match each value as an exact, whole word.

In my example below:

record1
mary MI_AP
anne MI_RC

record2
role1 MI_AP_REC
role2 MI_AP MI_RC

output of the current code:
mary role1 role2
anne role2

Is it possible to search only for the exact word, so that the output looks like this?

mary role2
anne role2

matrixmadhan · March 13, 2007, 12:48am

If it has to match the exact string, change the condition above to

 if( $fileHash{$key} =~ $dump ) {

Try that!

kharen11 · March 13, 2007, 1:24am

It's not working.

matrixmadhan · March 13, 2007, 1:51am

Run the script below as it is and let us know the results.

I have modified the code.

#! /opt/third-party/bin/perl

open(FILE, "<", secondfile) || die ("Unable to open secondfile. <$!>\n");

while( <FILE> ) {
  chomp;
  @split_arr = split(/ /, $_);
  my $dump;
  for( my $i = 1; $i <= $#split_arr; $i++ ) {
    $dump .= ( $split_arr[$i] . ":");
  }
  $dump =~ s/:$//;
  $fileHash{$split_arr[0]} = $dump;
}

close(FILE);

open(FILE, "<", firstfile) || die ("Unable to open firstfile. <$!>\n");

while( <FILE> ) {
  chomp;
  @split_arr = split(/ /, $_);
  my $dump;
  for( my $i = 1; $i < $#split_arr + 1; $i++ ) {
    $dump .= $split_arr[$i];
  }
  print "$split_arr[0] ";
  foreach my $key ( keys %fileHash ) {
    @diff_arr = split(/:/, $fileHash{$key});
    for( my $i = 0; $i <= $#diff_arr; $i++ ) {
      if( $dump =~ $diff_arr[$i] ) {
        print "$key ";
      }
    }
  }
  print "\n";
}

close(FILE);

exit 0

kharen11 · March 13, 2007, 2:02am

Yes it works!!! Thank you so much.

ghostdog74 · March 13, 2007, 3:20am

Here's a Python alternative:

for line1 in open("file1"):
        line1 = line1.strip().split(" ",1)
        f1col = line1[1:][0].split()
        print
        print line1[0],
        for line2 in open("file2"):
                count =0
                line2 = line2.strip().split(" " ,1)
                for item1 in f1col :
                        for item2 in line2[1:][0].split():
                                if item1 == item2 : count+=1
                if count == len(f1col): print line2[0],

output:

# ./test.py

mary role1 role3
anne role4
jane role2 role5
sam role1 role3 role4

and, with the MI_AP/MI_RC records:

# ./test.py

mary role2
anne role2
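
One thing to keep in mind with a pair-counting comparison like the one above: a value that is repeated on a File2 line (such as `d` in `role1 a d c d`) is counted once per repeat, so the count can reach `len(f1col)` even when another value is missing. The sketch below, in Python 3, contrasts that with a set-based test. Both function names are hypothetical.

```python
def pair_count_match(wanted, have):
    """Count every equal (wanted, have) pair, as two nested loops would."""
    count = sum(1 for w in wanted for h in have if w == h)
    return count == len(wanted)


def subset_match(wanted, have):
    """Require each wanted value to be present, ignoring repeats."""
    return set(wanted) <= set(have)


mary = "a b d".split()
role1 = "a d c d".split()

print(pair_count_match(mary, role1))  # the repeated d is counted twice
print(subset_match(mary, role1))      # b is missing from role1
```

For mary against role1 the pair count reports a match and the set test does not, which is the difference between `mary role1 role3` and `mary role3`.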

kharen11 · March 13, 2007, 4:04am

What if I want to compare the files the other way around? I could swap the two files, but with my current data (almost 23,000 rows in file1) that gives too many output fields, and I would still have to rearrange the result to get what I need. This time a role should match when all of its values in file2 are present on the file1 line. (The previous request was the opposite: all values of the file1 line present in file2.) I need both to get the desired output for my records.

File1
mary a b c d
anne e f g h
jane a d e
sam g h

File2
role1 a b
role2 a b c
role3 g h
role4 a e
role5 e f g

Output
mary role1 role2
anne role3 role5
jane role4
sam role3

I would appreciate it if the code is in Korn shell or Perl. Thanks in advance...
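
The reverse comparison asked for here flips the subset test: a role matches when all of the role's values appear on the person's line. A minimal sketch follows, written in Python rather than Korn shell or Perl. The names `load` and `roles_contained_in` are hypothetical, and the data is the sample above.

```python
def load(lines):
    """Return (key, set of values) pairs, keeping the original line order."""
    rows = []
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            rows.append((fields[0], set(fields[1:])))
    return rows


def roles_contained_in(people, roles):
    """Yield (name, roles) where every value of each listed role is present for that name.

    Note: a role with no values at all matches every name, because the
    empty set is a subset of any set.
    """
    for name, have in people:
        yield name, [role for role, needed in roles if needed <= have]


file1 = ["mary a b c d", "anne e f g h", "jane a d e", "sam g h"]
file2 = ["role1 a b", "role2 a b c", "role3 g h", "role4 a e", "role5 e f g"]

for name, matched in roles_contained_in(load(file1), load(file2)):
    print(name, *matched)
```

On the sample data this reproduces the requested output: `mary role1 role2`, `anne role3 role5`, `jane role4`, `sam role3`. File2 is parsed once and reused for every File1 line, which should be adequate for a file1 of about 23,000 rows.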

matrixmadhan · March 13, 2007, 7:15am

Tested, and it works fine!

Try this!

#! /opt/third-party/bin/perl

open(FILE, "<", secondfile) || die ("Unable to open secondfile. <$!>\n");

while( <FILE> ) {
  chomp;
  @split_arr = split(/ /, $_);
  my $dump;
  for( my $i = 1; $i <= $#split_arr; $i++ ) {
    $dump .= ( $split_arr[$i] . ":");
  }
  $dump =~ s/:$//;
  $fileHash{$split_arr[0]} = $dump;
}

close(FILE);

open(FILE, "<", firstfile) || die ("Unable to open firstfile. <$!>\n");

while( <FILE> ) {
  chomp;
  @first_arr = split(/ /, $_);
  print "$first_arr[0] ";
  foreach my $key ( keys %fileHash ) {
    @second_arr = split(/:/, $fileHash{$key});
    for($i = 0; $i <= $#second_arr; $i++ ) {
      $set = 0;
      for( $j = 1; $j <= $#first_arr; $j++ ) {
        if( $first_arr[$j] =~ $second_arr[$i] ) {
          $set = 1;
          last;
        }
      }
      last if( $set == 0 )
    }
    print "$key " if( $set == 1 )
  }
  print "\n";
}

close(FILE);

exit 0

The_One · March 13, 2007, 7:45am

The script above works. Tested.

The_One · March 13, 2007, 7:53am

matrixmadhan:

Run the script below as it is and let us know the results.

I have modified the code.

I found a problem with this one (the modified script from 1:51am, the reverse of the other one): the output repeats.
File1
mary MI_AP MI_RC
anne MI_RC

File2
role1 MI_AP_REC
role2 MI_AP MI_RC

Output of this code:
mary role2 role2
anne role2

Needed output:
mary role2
anne role2

Am I giving you too much trouble? Perl is really new to me.
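
The repeated `role2` is what one would expect from a loop that prints the role name once per matching value instead of once per role. A minimal Python sketch of a per-role decision follows, assuming the rule from the first request (every value on the File1 line must appear as a whole word on the File2 line). `roles_for` is a hypothetical name.

```python
def roles_for(values, role_lines):
    """Return each matching role exactly once, in file order."""
    wanted = set(values)
    matched = []
    for line in role_lines:
        if not line.strip():
            continue  # skip blank lines
        role, *have = line.split()
        # Decide once per role: all wanted values must be present as whole words.
        if wanted <= set(have):
            matched.append(role)
    return matched


file2 = ["role1 MI_AP_REC", "role2 MI_AP MI_RC"]

print("mary", *roles_for(["MI_AP", "MI_RC"], file2))
print("anne", *roles_for(["MI_RC"], file2))
```

With the two records above this prints `mary role2` and `anne role2`, each role appearing once.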

kharen11 · March 13, 2007, 8:09am

Sorry about the confusing username above. My friend was logged in on my PC and I forgot to change users before replying.