Hi,
I have multiple files that each contain one column of strings:
File1:
123abc
456def
789ghi
File2:
123abc
456def
891jkl
File3:
234mno
123abc
456def
In total I have 25 files of this type.
I want to compare the strings in File1 with the 24 other files and print in my output ONLY those strings in File1 that do not appear in ANY other file.
Can anyone help me?
Thanks!
Could this help?
awk 'NR==FNR{a[$1]++;next}{ if ($1 in a){a[$1]="Y"}} END{ for (i in a){if (a[i] != "Y"){print i}}}' file1 file2 file3
Subbeh
October 15, 2013, 7:26am
3
How about this:
comm -23 <(sort file1) <(sort file2 file3 file4)
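For anyone without process substitution (plain sh, for instance), here is a minimal sketch of the same comm approach using temporary files. The scratch directory and the names first.sorted and others.sorted are made up for the demo; the sample data is the data from the question.

```shell
#!/bin/sh
# Hypothetical scratch directory and file names, used only for this demo.
tmp=$(mktemp -d)
printf '%s\n' 123abc 456def 789ghi > "$tmp/file1"
printf '%s\n' 123abc 456def 891jkl > "$tmp/file2"
printf '%s\n' 234mno 123abc 456def > "$tmp/file3"

# comm needs both inputs sorted in the same collation order.
LC_ALL=C sort "$tmp/file1" > "$tmp/first.sorted"
LC_ALL=C sort -u "$tmp/file2" "$tmp/file3" > "$tmp/others.sorted"

# -2 drops lines found only in the second input, -3 drops lines common
# to both, so only lines unique to file1 remain.
unique=$(LC_ALL=C comm -23 "$tmp/first.sorted" "$tmp/others.sorted")
printf '%s\n' "$unique"

rm -rf "$tmp"
```

With the sample data this should print only 789ghi. Forcing LC_ALL=C for both sort and comm is a precaution so the two tools agree on ordering.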
A slight simplification of pravin27's script:
awk '
FNR == NR { l[$0]; next }
$0 in l { delete l[$0] }
END { for(i in l) print i }
' File*
As always, if you want to try this on a Solaris/SunOS system, use /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk instead of just awk.
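Note that File* only works because the shell happens to list File1 first. A minimal sketch, assuming you would rather name the first file explicitly and pass the rest after it; the scratch directory and the variable names (first, seen, only_in_first) are just for illustration:

```shell
#!/bin/sh
# Hypothetical scratch directory holding the sample data from the question.
tmp=$(mktemp -d)
printf '%s\n' 123abc 456def 789ghi > "$tmp/File1"
printf '%s\n' 123abc 456def 891jkl > "$tmp/File2"
printf '%s\n' 234mno 123abc 456def > "$tmp/File3"

# The file whose unique lines we want; every other File* is a comparison file.
first="$tmp/File1"
set --
for f in "$tmp"/File*; do
    [ "$f" = "$first" ] || set -- "$@" "$f"
done

only_in_first=$(awk '
FNR == NR { seen[$0]; next }    # first file: remember every line
$0 in seen { delete seen[$0] }  # later files: drop lines that reappear
END { for (line in seen) print line }
' "$first" "$@")
printf '%s\n' "$only_in_first"

rm -rf "$tmp"
```

With the sample data this should print 789ghi. When more than one line survives, the output order is whatever awk's for-in loop yields, so pipe through sort if order matters.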
Thank you for all of the responses. Don Cragun, your simplification of Pravin's command line works great. Simple and effective. Thanks again!
drl
October 15, 2013, 10:51pm
6
Hi.
Using grep, assuming that the contents of the 24 files will fit into available memory:
#!/usr/bin/env bash
# @(#) s1 Demonstrate inverse, "-v", match, grep with auxiliary file.
# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C grep
pl " Input data file primary data*:"
head primary data*
pl " Results:"
grep -v -x -F -f <( cat data* ) primary
exit 0
producing:
$ ./s1
Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution : Debian GNU/Linux 5.0.8 (lenny, workstation)
bash GNU bash 3.2.39
grep GNU grep 2.5.3
-----
Input data file primary data*:
==> primary <==
123abc
456def
789ghi
==> data1 <==
123abc
456def
891jkl
==> data2 <==
234mno
123abc
456def
-----
Results:
789ghi
See man pages for details.
Best wishes ... cheers, drl
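One thing to watch with grep -f: each pattern line is a regular expression matched anywhere in the line unless -F (fixed strings) and -x (whole line) are given. A small sketch of the difference, with an extra made-up line 123abcd added to the primary file; the scratch directory and the patterns file name are assumptions for the demo:

```shell
#!/bin/sh
# Hypothetical scratch directory; 123abcd is an extra line added for the demo.
tmp=$(mktemp -d)
printf '%s\n' 123abc 456def 789ghi 123abcd > "$tmp/primary"
printf '%s\n' 123abc 456def 891jkl > "$tmp/data1"
printf '%s\n' 234mno 123abc 456def > "$tmp/data2"

# Collect every line of the comparison files into one pattern file.
cat "$tmp"/data* > "$tmp/patterns"

# Substring matching: 123abcd is dropped too, because it contains 123abc.
loose=$(grep -v -f "$tmp/patterns" "$tmp/primary")

# -F treats patterns as fixed strings, -x requires the whole line to match.
strict=$(grep -v -x -F -f "$tmp/patterns" "$tmp/primary")

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"

rm -rf "$tmp"
```

Here the loose form reports only 789ghi, while the strict form reports both 789ghi and 123abcd, which is what the original question asks for.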