Compare lists of files

If I had a list of numbers in two different files, what would be the fastest and easiest way to find out which numbers in list B are not in list A without reading each number in list B one at a time and using grep thousands of times against list A?

I have two very long lists of numbers and the grep routine is too slow and uses too many resources.

If anyone has any good ideas, I'd sure appreciate it.

Check out the cmp or diff commands.

If your files are supposed to be the same but differ slightly, these commands will show you where.

Example:
file1   file2
1       1
2       2
3       4
4       5
5       6

% diff file1 file2
3d2
< 3
5a5
> 6
% cmp file1 file2
file1 file2 differ: char 5, line 3

If I'm not mistaken, diff and cmp do line-by-line comparisons. My problem is that one file has 10,000 lines and the other about 1,000. Both diff and cmp will report many lines as different, but that doesn't tell me whether a given number is missing from the other file. The number may in fact exist there, just on a different line.

I don't think this would work in this particular case.

Have a read about 'comm'

It takes two files and, regardless of their sizes, produces output that tells you which lines are unique to file1, which are unique to file2, and which are common to both. You can choose which of these three columns to show: the -1, -2 and -3 options suppress columns 1, 2 and 3 respectively.
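For illustration, here is a minimal sketch of comm's default three-column output. The file names listA.txt and listB.txt are hypothetical, and both files are already sorted, which comm expects.

```shell
#!/bin/sh
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample lists, already in sorted order.
printf '1\n2\n3\n4\n5\n' > listA.txt
printf '1\n2\n4\n5\n6\n' > listB.txt

# Default output is three tab-indented columns:
#   no tab:   lines only in listA.txt
#   one tab:  lines only in listB.txt
#   two tabs: lines present in both files
comm listA.txt listB.txt
```

Here "3" appears in the first column (only in listA.txt), "6" in the second (only in listB.txt), and the remaining numbers in the third.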

You will need to sort both files before you use comm; check the man page for exactly how the input has to be ordered.
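Applied to the original question, a sketch might look like the following, assuming the reference list is in listA.txt and the numbers to check are in listB.txt (both names hypothetical). One thing to watch: comm compares lines as text, so sort them with plain sort rather than sort -n. Numeric order puts 9 before 10, but comm expects 10 before 9. Setting LC_ALL=C keeps sort and comm on the same collation.

```shell
#!/bin/sh
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical unsorted inputs: listA.txt is the reference list,
# listB.txt holds the numbers to check against it.
printf '10\n3\n7\n42\n' > listA.txt
printf '42\n5\n10\n99\n3\n' > listB.txt

# Sort lexically (not with -n) so the order matches what comm expects.
LC_ALL=C sort listA.txt > listA.sorted
LC_ALL=C sort listB.txt > listB.sorted

# -1 hides lines only in A and -3 hides lines in both,
# leaving only the numbers in B that are missing from A.
LC_ALL=C comm -13 listA.sorted listB.sorted
```

This prints 5 and 99. Both files are read once each, so it avoids running grep once per number. In bash, process substitution skips the temporary files: comm -13 <(sort listA.txt) <(sort listB.txt).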

Comm did it!

Thank you