Show the Difference between two files

The_One · March 11, 2007, 10:24pm

I have two files and I need to know the difference between them, line by line. This will extend to thousands of lines, so manual work is really not an option.
Sample:

First File                         Second File
allan entry1 entry2 entry3         allan entry1 entry3
bob entry1 entry2 entry3 entry4    bob entry1 entry4

I want to output the difference only. Sample output:

allan entry2
bob entry2 entry3

I know that a simple grep will not work here. UNIX is new to me and this is the only way I think this can be done.

dennis.jacob · March 11, 2007, 11:36pm

Hi,

First, I am not very clear about the way you explained the content of the two files and the sample output. Very sorry about that.

But you can do file comparison using "comm" or "diff" or "cmp" commands...

Thanks
Dennis

The_One · March 11, 2007, 11:58pm

Sorry about that. These are the files.

First File
allan entry1 entry2 entry3
bob entry1 entry2 entry3 entry4

Second File
allan entry1 entry3
bob entry1 entry4

Output:
allan entry2
bob entry2 entry3

Using diff and comm doesn't give me the result that I want. The result should only be the name and the entries that can't be found in the second file. In my example, allan will have entry2 since it is not in the second file.

Perderabo · March 12, 2007, 12:04am

Are the two files guaranteed to have the same number of lines? So if a bob line is in one file it is also in the other? Are the records in the same sequence in both files?
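
Whether the two files line up this way can be checked mechanically before picking an approach. A minimal Python sketch, assuming whitespace-separated fields with the name first; the function name same_names_in_order is made up for illustration:

```python
def same_names_in_order(lines1, lines2):
    """Return True if both inputs list the same names, line for line."""
    # Take the first field of every non-blank line.
    names1 = [line.split()[0] for line in lines1 if line.strip()]
    names2 = [line.split()[0] for line in lines2 if line.strip()]
    return names1 == names2


# Sample data from this thread, inlined instead of read from disk.
first = ["allan entry1 entry2 entry3", "bob entry1 entry2 entry3 entry4"]
second = ["allan entry1 entry3", "bob entry1 entry4"]
print(same_names_in_order(first, second))
```

If the check passes, the files can be walked in lockstep; if not, records have to be looked up by name instead.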

ghostdog74 · March 12, 2007, 12:14am

If you have Python and know the language, here's an alternative:

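#!/usr/bin/env python3
# Map each name in file2 to the set of entries listed for it there.
seen = {}
with open("file2") as f2:
    for line in f2:
        fields = line.split()
        if fields:
            seen[fields[0]] = set(fields[1:])

# For each line of file1, keep only the entries that file2 lacks.
with open("file1") as f1:
    for line in f1:
        fields = line.split()
        if fields:
            missing = [e for e in fields[1:] if e not in seen.get(fields[0], set())]
            print("Output:", fields[0], " ".join(missing))

output:

/test # ./test.py
Output: allan entry2
Output: bob entry2 entry3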

The_One · March 12, 2007, 1:00am

Yes, they have the same number of lines and are also in the same sequence.

I don't have Python, so I can't use the code given by ghostdog74.

anbu23 · March 12, 2007, 2:02am

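awk ' BEGIN { while ( (getline < "first_file") > 0 ) { arr[$1]=$0; } }  # index first_file lines by name
{ for( i = 2 ; i <= NF ; ++i )
	sub($i,"",arr[$1])        # drop each entry that second_file also has
  gsub("  +"," ",arr[$1])     # squeeze the gaps left behind
  print arr[$1]
} ' second_file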
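
One caveat worth keeping in mind: sub() treats $i as a regular expression and matches anywhere in the line, so an entry that is a prefix of another (entry1 and entry10, say) could be clipped. A small Python sketch of the difference between substring removal and whole-token comparison; both helper names and the sample line are hypothetical, not taken from the real data:

```python
def remove_by_substring(line, entries):
    """Mimic pattern substitution: delete the first occurrence of each entry."""
    for e in entries:
        line = line.replace(e, "", 1)
    return " ".join(line.split())  # squeeze leftover whitespace


def remove_by_token(line, entries):
    """Compare whole fields, so entry1 never touches entry10."""
    name, *rest = line.split()
    drop = set(entries)
    return " ".join([name] + [e for e in rest if e not in drop])


line = "allan entry10 entry1 entry2"
print(remove_by_substring(line, ["entry1"]))  # clips entry10
print(remove_by_token(line, ["entry1"]))      # removes only entry1
```

With entries as regular as the ones in the sample files, both approaches should give the same answer; the difference only shows up when one entry is contained in another.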

The_One · March 12, 2007, 2:16am

It's working! Thanks anbu23!

nani_ynm · March 12, 2007, 3:24am

Use the diff command.
Check the details in the manual by typing 'man diff'.

Usage:

diff file1 file2

This will give you the lines at which the two files differ.
If the files extend to thousands of lines, you can pipe the output through more, like this:

diff file1 file2 | more

This will stop at the end of the screen. You can go to the next page by pressing the space bar and to the next line by pressing the Enter key.
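
For the sample data in this thread, note that a line-based comparison flags every line as changed and does not isolate the missing entries on its own. A small Python sketch using the standard difflib module illustrates this; the names file1 and file2 are only labels here, and the data is inlined rather than read from disk:

```python
import difflib

first = ["allan entry1 entry2 entry3", "bob entry1 entry2 entry3 entry4"]
second = ["allan entry1 entry3", "bob entry1 entry4"]

# unified_diff compares whole lines, much as diff -u does.
report = list(difflib.unified_diff(first, second, "file1", "file2", lineterm=""))
for row in report:
    print(row)
```

Both lines of the first file come out marked with "-" and both lines of the second with "+", so the per-name entries still have to be compared separately.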

Enjoy UNIX

matrixmadhan · March 12, 2007, 4:25am

In Perl:

#! /opt/third-party/bin/perl

open(FILE, "<", "first") || die "Unable to open first. <$!>\n";

while(<FILE>) {
  chomp;
  @split_arr = split(/ /, $_);
  my $dump;
  for( my $i = 1; $i <= $#split_arr; $i++ ) {  # stop at the last index; no read past the end
    $dump .= ($split_arr[$i] . " ");
  }
  $fileHash{$split_arr[0]} = $dump;
}

close(FILE);

open(FILE, "<", "second") || die "Unable to open second. <$!>\n";

while(<FILE>) {
  chomp;
  @split_arr = split(/ /, $_);
  if ( exists $fileHash{$split_arr[0]} ) {
    @new_arr = split(/ /, $fileHash{$split_arr[0]});

    print "$split_arr[0] ";
    for( $i = 0; $i <= $#new_arr; $i++ ) {
      for( $j = 1; $j <= $#split_arr; $j++ ) {
        if( $new_arr[$i] eq $split_arr[$j] ) {  # exact match, so entry1 does not match entry10
          last;
        }
      }
      if( $j > $#split_arr ) {
        print "$new_arr[$i] ";
      }
    }
  }
  print "\n";
}

close(FILE);

exit 0

makrami · April 30, 2008, 5:27am

I'm using these lines to find the difference between two files.

To find data that exists in file1 but not in file2:
diff file1 file2 | grep '^<' | sed 's/^< //'

To find data that exists in file2 but not in file1:
diff file1 file2 | grep '^>' | sed 's/^> //'
----
Ismail
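
The two pipelines above amount to asking which lines appear in only one of the files. A rough Python sketch of that idea, assuming whole-line comparison and ignoring line positions (which diff does take into account); only_in_first is a hypothetical helper name and the sample lines are made up:

```python
def only_in_first(lines_a, lines_b):
    """Return the lines of lines_a that never occur in lines_b, in original order."""
    seen = set(lines_b)
    return [line for line in lines_a if line not in seen]


file1 = ["alpha", "beta", "gamma"]
file2 = ["beta", "delta"]
print(only_in_first(file1, file2))  # lines only in file1
print(only_in_first(file2, file1))  # lines only in file2
```

Like the diff pipelines, this works on whole lines, so it does not answer the original per-entry question in this thread.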