Restricted File Comparison

Hey guys,

I've got a scripting problem that has been bugging me, so I thought I'd ask the wise people here! Basically, I have two overlapping log files, and I want to get the lines from the new log file that aren't in the old log file, but not the lines from the old log that aren't in the new one.

Say the old log file has this content:

1: Apple
2: Banana
3: Carrot
4: Dog

And the new log has this content:

3: Carrot
4: Dog
5: Elephant
6: Fish

I want my script to output:

5: Elephant
6: Fish

Essentially, I want a diff of the two files that only includes the lines unique to the second file. Is this possible?

Edit: It seems that I can do it with "grep -v -x -f file1 file2", but this has O(n^2) complexity, so it won't be nearly fast enough for reasonably long files. Any other ideas?

Thanks for any help!
Giancarlo

  1. If the first file fits in awk's memory, you can do something like:
awk '
# First file (old): remember every whole line
NR == FNR { seen[$0]; next }
# Second file (new): print only lines not seen in old
!($0 in seen)' old new
  2. If you don't mind the output lines coming out in sorted order rather than log order, you can use sort and uniq on both files.
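
The sort and uniq suggestion could be sketched roughly like this. This is only one way to read it, not necessarily what the poster had in mind. The function name new_only and its two arguments are made up for illustration. It assumes whole lines are compared and that losing the original line order is acceptable.

```shell
#!/bin/sh
# new_only OLD NEW  (hypothetical helper name and argument order)
# Prints lines that occur in NEW but not in OLD, in sorted order.
new_only() {
    # De-duplicate NEW first, so a new line that repeats is still
    # counted once and not thrown away by "uniq -u" later.
    # OLD is fed to sort twice, so every OLD line occurs at least twice
    # and can never be unique. "-" makes sort read the piped-in NEW lines.
    # "uniq -u" then keeps only lines that occur exactly once,
    # which are the lines found in NEW alone.
    # LC_ALL=C forces plain byte-wise comparison in every step.
    LC_ALL=C sort -u "$2" | LC_ALL=C sort "$1" "$1" - | LC_ALL=C uniq -u
}
```

You would call it as "new_only old new". The work is dominated by the sorts, so it should scale as roughly O(n log n) instead of the quadratic behaviour described for grep.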