Compare columns of multiple files and print the strings unique to File1 to an output file.

Hi,

I have multiple files that each contain one column of strings:

File1:

123abc
456def
789ghi

File2:

123abc
456def
891jkl

File3:

234mno
123abc
456def

In total I have 25 of these files.

I want to compare the strings in File1 with the 24 other files and print ONLY those strings in File1 that do not appear in ANY other file.

Can anyone help me?
Thanks!

Could this help?

awk 'NR==FNR{a[$1]++;next}{ if ($1 in a){a[$1]="Y"}} END{ for (i in a){if (a[i] != "Y"){print i}}} ' file1 file2 file3
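Here is a minimal sketch for checking the idea against the sample data from the question. It assumes the files are named file1, file2 and file3 in a scratch directory. It uses $1 and a[i] where the original post had $i and a, because awk does not allow a whole array to be compared with "Y".

```shell
#!/bin/sh
# Sketch: rebuild the question's sample files in a scratch directory
# (hypothetical names file1..file3) and run the one-liner on them.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf '%s\n' 123abc 456def 789ghi > file1
printf '%s\n' 123abc 456def 891jkl > file2
printf '%s\n' 234mno 123abc 456def > file3

# First file: remember every string.
# Later files: mark remembered strings that show up again with "Y".
# END: print the strings that were never marked.
awk 'NR==FNR { a[$1]++; next }
     ($1 in a) { a[$1] = "Y" }
     END { for (i in a) if (a[i] != "Y") print i }' file1 file2 file3 > unique.txt

cat unique.txt   # prints: 789ghi
```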

How about this:

comm -23 <(sort file1) <(sort file2 file3 file4)
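The <( ... ) form is process substitution, which works in bash, ksh and zsh but not in a plain POSIX sh. The same comm approach can be sketched with temporary files instead. The file names below are hypothetical and hold the question's sample data.

```shell
#!/bin/sh
# Sketch: comm without process substitution; sorted copies go to temp files.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '%s\n' 123abc 456def 789ghi > "$dir/file1"
printf '%s\n' 123abc 456def 891jkl > "$dir/file2"
printf '%s\n' 234mno 123abc 456def > "$dir/file3"

# comm needs both inputs sorted in the same collation order;
# -u drops duplicate lines so each string is reported once.
LC_ALL=C sort -u "$dir/file1" > "$dir/first.sorted"
LC_ALL=C sort -u "$dir/file2" "$dir/file3" > "$dir/rest.sorted"

# -2 hides lines only in the second input, -3 hides lines in both,
# leaving only the lines that appear in file1 alone.
LC_ALL=C comm -23 "$dir/first.sorted" "$dir/rest.sorted" > "$dir/unique.txt"

cat "$dir/unique.txt"   # prints: 789ghi
```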

A slight simplification of pravin27's script:

awk '
FNR == NR { l[$0]; next }
$0 in l { delete l[$0] }
END { for(i in l) print i } 
' File*
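Here is a sketch of the same script run on the sample data. It assumes files named File1 to File3 in a scratch directory and writes the result to a hypothetical unique.txt, since the question asks for an output file. The sketch makes one change: the first file is recognised with FILENAME == ARGV[1] rather than FNR == NR. If File1 were empty, FNR == NR would still be true while reading File2, and File2 would be taken as the reference list. Listing File1 first explicitly makes the required order obvious.

```shell
#!/bin/sh
# Sketch: Don Cragun's script with FILENAME == ARGV[1] marking the first file.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf '%s\n' 123abc 456def 789ghi > File1
printf '%s\n' 123abc 456def 891jkl > File2
printf '%s\n' 234mno 123abc 456def > File3

awk '
FILENAME == ARGV[1] { l[$0]; next }   # remember every line of File1
$0 in l { delete l[$0] }              # drop lines seen in any later file
END { for (i in l) print i }          # whatever is left is unique to File1
' File1 File2 File3 > unique.txt

cat unique.txt   # prints: 789ghi
```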

As always, if you want to try this on a Solaris/SunOS system, use /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk instead of just awk.


Thank you for all of the responses. Don Cragun, your simplification of Pravin's command line works great. Simple and effective. Thanks again!

Hi.

Using grep, assuming that the contents of the 24 files fit into available memory:

#!/usr/bin/env bash

# @(#) s1	Demonstrate inverse, "-v", match, grep with auxiliary file.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C grep

pl " Input data file primary data*:"
head primary data*

pl " Results:"
grep -v -x -F -f <( cat data* ) primary

exit 0

producing:

$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny, workstation) 
bash GNU bash 3.2.39
grep GNU grep 2.5.3

-----
 Input data file primary data*:
==> primary <==
123abc
456def
789ghi

==> data1 <==
123abc
456def
891jkl

==> data2 <==
234mno
123abc
456def

-----
 Results:
789ghi
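One caveat with grep -f: without -x and -F, each line of the data files is treated as a regular expression that can match anywhere in a line. A string in primary can therefore be dropped because it merely contains, or regex-matches, a string from another file. The sample results above are not affected, but a small sketch with hypothetical data shows the difference:

```shell
#!/bin/sh
# Sketch: why -x (whole line) and -F (fixed string) matter here.
# The data is hypothetical, chosen only to expose the difference.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '%s\n' 123abc 123abcd 4x6def > "$dir/primary"
printf '%s\n' 123abc 4.6def > "$dir/data1"

cat "$dir"/data* > "$dir/patterns"

# Plain -f: "123abc" also removes "123abcd", and the regex "4.6def"
# (dot = any character) also removes "4x6def".
grep -v -f "$dir/patterns" "$dir/primary" > "$dir/loose.txt"

# -x -F: a pattern removes a line only if it equals the whole line literally.
grep -v -x -F -f "$dir/patterns" "$dir/primary" > "$dir/exact.txt"

cat "$dir/loose.txt"   # prints nothing
cat "$dir/exact.txt"   # prints 123abcd, then 4x6def
```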

See man pages for details.

Best wishes ... cheers, drl