Finding records NOT on another file

I have three files named ALL, MATCH, and DIFF. MATCH and DIFF contain completely different records from each other, and every record in them is also in ALL, but ALL also has records that are in neither MATCH nor DIFF.

I know I can sort all three files together, once with the unique option and once without, and run diff on the two results to show which records appear in two files, but how can I find the records that are only in the ALL file?

TIA

If ALL is small enough to fit in memory:

awk 'NR==FNR { A[$0] ; next } ; $0 in A { delete A[$0] } END { for(X in A) { print X } }' ALL MATCH DIFF

Try also

sort ALL MATCH DIFF | uniq -c | grep "^ *1 "

Sorted (untested):

comm -23 <(sort ALL) <(sort MATCH DIFF)

Unsorted (untested):

fgrep -x -f <(comm -23 <(sort ALL) <(sort MATCH DIFF)) ALL

You may wish to use the -u switch to sort to remove duplicate lines.
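
As a rough illustration of the -u suggestion, here is a small sketch on made-up data. The file contents, the temporary directory, and the variable names are assumptions for the example only. It writes the sorted lists to temporary files instead of using <(...), so it should also run in shells that lack process substitution.

```shell
# Throwaway sample data (hypothetical contents, for illustration only)
dir=$(mktemp -d)
printf '%s\n' c a b e d a > "$dir/ALL"      # note the duplicate "a"
printf '%s\n' a > "$dir/MATCH"
printf '%s\n' c > "$dir/DIFF"

# Sort each side once; -u drops duplicate lines
sort -u "$dir/ALL" > "$dir/all.sorted"
sort -u "$dir/MATCH" "$dir/DIFF" > "$dir/excl.sorted"

# -2 suppresses lines found only in the second file, -3 suppresses lines
# common to both, leaving the lines that appear only in ALL
only_in_all=$(comm -23 "$dir/all.sorted" "$dir/excl.sorted")
printf '%s\n' "$only_in_all"

rm -rf "$dir"
```

With this sample data the lines left over are b, d, and e, in sorted order.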

Andrew

One could also try:

awk 'FNR == 1 { fc++ } fc < 3 {d[$0]; next } !($0 in d)' DIFF MATCH ALL

which has been tested.

This requires enough space for the unique records in DIFF and MATCH to be held in memory, but doesn't require space in memory for the unique records in ALL.
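
A side effect of streaming ALL rather than storing it is that the surviving records come out in their original order, duplicates included. A minimal sketch on made-up data (the file contents and variable names are assumptions for the example):

```shell
# Throwaway sample data (hypothetical contents, for illustration only)
dir=$(mktemp -d)
printf '%s\n' zebra apple mango apple kiwi mango > "$dir/ALL"
printf '%s\n' apple > "$dir/MATCH"
printf '%s\n' zebra > "$dir/DIFF"

# Lines of the first two files become keys of d; each line of the third
# file is printed as it is read, unless it is a key of d
result=$(cd "$dir" && awk 'FNR == 1 { fc++ } fc < 3 { d[$0]; next } !($0 in d)' DIFF MATCH ALL)
printf '%s\n' "$result"

rm -rf "$dir"
```

Here the output is mango, kiwi, mango, in the same order as in ALL. One caveat: because the file counter is only incremented on the first record of each file, an empty DIFF or MATCH would throw the count off, so it may be worth checking for empty files first.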

The following variant works with any number of "exclude" files:

awk 'BEGIN {nfiles=ARGC-1} FNR == 1 { fc++ } fc < nfiles {d[$0]; next } !($0 in d)' DIFF MATCH ALL

Another idea: make the last filename special

awk 'FILENAME!="-" { d[$0]; next } !($0 in d)' MATCH DIFF - < ALL