I am trying to create a script that removes, from one file, every line that appears in another file. I have already tested a few ideas and found that in Unix the usual tools are sort and uniq. There are many scripts for removing duplicate lines, but none that delete all occurrences of the matching lines. I need your help and guidance, thanks.
So here is the situation:
a.out is the file whose lines must be deleted from the other file:
more a.out
a
b
c
d
b.out is the file to be filtered:
more b.out
a
a
a
b
b
c
d
d
e
f
the result should look like this c.out file:
more c.out
e
f
note: the real data is around 1.5 million lines.
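On systems with a modern awk (nawk, gawk, or a POSIX awk), the standard idiom for this kind of filtering is a two-pass one-liner. A minimal sketch using the sample data above:

```shell
# Recreate the sample files from this post.
printf '%s\n' a b c d > a.out
printf '%s\n' a a a b b c d d e f > b.out

# FNR==NR is true only while reading the first file (a.out), so its
# lines are stored as array keys; lines of b.out found in the array
# are skipped, and everything else is printed.
awk 'FNR==NR {seen[$0]; next} !($0 in seen)' a.out b.out > c.out
cat c.out
```

For the sample data this prints only e and f. Note that this syntax needs a new-style awk; the old /usr/bin/awk on Solaris rejects it, as shown further down.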
---------- Post updated at 03:55 PM ---------- Previous update was at 03:53 PM ----------
I tested the suggestions and they are not working; most of them only delete duplicates, and I'm lost.
In Solaris 10 you must use the POSIX variants /usr/xpg4/bin/grep and /usr/xpg4/bin/awk. The ones in /usr/bin are from old UNIX SysV 4.0.
In particular, /usr/bin/awk is a link to /usr/bin/oawk (AT&T already declared it old and provided /usr/bin/nawk).
The previous awk code rewritten for oawk:
bash-3.00# grep -vf a.out b.out > c.out
grep: illegal option -- f
Usage: grep -hblcnsviw pattern file . . .
bash-3.00# awk 'FNR==NR{a[$0]++;next} !($0 in a) ' a.out b.out >c.out
awk: syntax error near line 1
awk: bailing out near line 1
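Both failures above come from the old SysV tools: that grep lacks -f, and oawk rejects the FNR==NR idiom. The POSIX grep does support -f; a sketch assuming the standard POSIX flags (on Solaris 10 call it by its full path, /usr/xpg4/bin/grep):

```shell
# Sample data as in the thread.
printf '%s\n' a b c d > a.out
printf '%s\n' a a a b b c d d e f > b.out

# -v invert the match, -x match whole lines only, -F treat patterns as
# fixed strings (no regex), -f read the patterns from a.out.
# On Solaris 10 this would be /usr/xpg4/bin/grep instead of plain grep.
grep -vxF -f a.out b.out > c.out
cat c.out
```

Without -x, a short pattern line would also remove longer lines that merely contain it. Be aware that with 1.5 million pattern lines, grep -f can be much slower than the awk hash approach.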
bash-3.00# more a.out
a
b
c
d
bash-3.00# more b.out
a
a
a
b
b
c
d
d
e
f
Oh, I see...
Yes, this one works perfectly:
bash-3.00# awk '(FILENAME=="-") {a[$0]++; next} (a[$0]==0)' - <a.out b.out >c.out
bash-3.00# more c.out
e
f
---------- Post updated at 05:42 PM ---------- Previous update was at 04:51 PM ----------
However, this one is still much faster on the 1.5-million-line data; the result from both codes is the same, I already double-checked it.
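For data of this size, another option worth timing is sort plus comm, which streams well and avoids holding a 1.5-million-key array in memory. A sketch assuming that every copy of a matching line should go, as in the sample:

```shell
# Sample data as in the thread.
printf '%s\n' a b c d > a.out
printf '%s\n' a a a b b c d d e f > b.out

# comm needs sorted input; sort -u also collapses b.out's duplicates.
sort -u b.out > b.sorted
sort -u a.out > a.sorted

# comm -23 suppresses column 2 (lines only in a.out) and column 3
# (lines common to both), leaving only lines unique to b.out.
comm -23 b.sorted a.sorted > c.out
cat c.out
```

Unlike the awk one-liner, this also collapses duplicates among the surviving lines (e and f each appear only once in the sample anyway), and it loses the original line order, so compare the results on your real data before switching.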