Deleting lines of a file if they exist in another file

I have a reference file that needs to remain static, and another file that may or may not have rows that duplicate rows in the reference file. I need help with a command that will delete those duplicate rows from the second file while leaving the reference file intact.

For example, the reference file would have a list of colors:
red
green
blue
yellow
purple

And the second file has the following list:
red
orange
purple

I would like the second file to end up with only its unique rows:
orange

Any help with a command would be greatly appreciated. Thanks!

Is this what you are looking for?

sort -u Second_File
awk 'FILENAME == "reference"   { arr[$0]++ }
     FILENAME == "second_file" { if (!($0 in arr)) print $0 }
    ' reference second_file | sort -u
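For what it's worth, the same idea is often written with the NR==FNR idiom, which doesn't depend on the literal file names passed on the command line. A minimal sketch using the sample data from the post (file names are just placeholders):

```shell
# Build the sample files from the post (hypothetical names).
printf 'red\ngreen\nblue\nyellow\npurple\n' > reference
printf 'red\norange\npurple\n' > second_file

# NR == FNR is true only while reading the first file (the reference),
# so its rows are stored in arr; a row from the second file is printed
# only if it never appeared in the reference.
awk 'NR == FNR { arr[$0]++; next } !($0 in arr)' reference second_file
# prints: orange
```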

You want a unique list of values in the second file that are NOT in the reference file, correct?

If you want unique records from the second file that are not in the reference file:

egrep -v -f Reference_File Second_File | sort -u
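One caveat worth noting: egrep treats each line of Reference_File as a regular expression and matches substrings, so a reference line like red would also delete a row like darkred, and metacharacters such as . or * can match unintended rows. Fixed-string, whole-line matching with grep's -F and -x flags avoids both issues. A sketch, assuming the same sample files as the post:

```shell
# Build the sample files from the post (hypothetical names).
printf 'red\ngreen\nblue\nyellow\npurple\n' > Reference_File
printf 'red\norange\npurple\n' > Second_File

# -F: treat each pattern as a fixed string, not a regex
# -x: a pattern must match the whole line
# -v: print lines of Second_File that do NOT match any pattern
# -f: read the patterns from Reference_File
grep -F -x -v -f Reference_File Second_File | sort -u
# prints: orange
```

To overwrite the second file in place, redirect to a temporary file and move it back, e.g. `... > tmp && mv tmp Second_File`.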

Thanks all for your very timely responses! The last post was the first I tried and it worked beautifully!