Script to remove lines that also appear in another file

Hello all gurus,

I am trying to write a script that removes from one file every line that also appears in another file. I have tested a few ideas and found that on Unix the usual approach is limited to sort and uniq. There are many scripts for removing duplicate lines, but none that delete every line matching the other file. I need your help and guidance. Thanks.

Here is the situation:

  1. a.out: the file whose lines must be deleted from the other file
more a.out
a
b
c
d
  2. b.out: the file that the lines must be removed from
more b.out
a
a
a
b
b
c
d
d
e
f
  3. The result should look like this c.out file
more c.out
e
f

Note: the real data is around 1.5 million lines.

---------- Post updated at 03:55 PM ---------- Previous update was at 03:53 PM ----------

I tested the following commands and none of them work. Most of them only remove duplicates, and I'm lost.

sed -e "s/Text_1/TextA/" -e "s/Text1/TextB/" <your_file.txt>your_file_new.txt

cat deleteme.txt deleteme.txt masterlist.txt | sort | uniq -u > newmasterlist.txt

diff file-a file-b --new-line-format="" --old-line-format="%L" --unchanged-line-format="" > file-a

Try

grep -vf a.out b.out > c.out

Using awk

awk 'FNR==NR{a[$0]++;next} !($0 in a) ' a.out b.out >c.out
cat c.out
e
f

As I understand the requirement it should be

fgrep -vxf a.out b.out > c.out

That is, no regular-expression matching (fixed strings only), and full-line matches only.
The difference is visible if b.out has a line

fa
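To make the difference concrete, here is a small sketch. It assumes a grep that supports -f (for example GNU grep or /usr/xpg4/bin/grep) and a mktemp that supports -d. It uses grep -F, the POSIX spelling of fgrep. The file names loose.out and strict.out are made up for the illustration.

```shell
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1

printf '%s\n' a b c d > a.out        # lines to remove
printf '%s\n' a b fa e f > b.out     # "fa" merely contains "a"

# Patterns match anywhere in the line: "fa" is dropped because it contains "a".
grep -vf a.out b.out > loose.out

# Fixed strings (-F) and whole-line match (-x): "fa" is kept.
grep -F -v -x -f a.out b.out > strict.out

echo "without -F -x:"; cat loose.out
echo "with -F -x:";    cat strict.out
```

With the plain -vf form the line fa disappears from the result, while the -F -x form keeps it.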

This is not working. I'm using Solaris 10 64-bit (Unix), by the way.

bash-3.00# grep -vf a.out b.out > c.out
grep: illegal option -- f
Usage: grep -hblcnsviw pattern file . . .
bash-3.00# awk 'FNR==NR{a[$0]++;next} !($0 in a) ' a.out b.out >c.out
awk: syntax error near line 1
awk: bailing out near line 1

This works perfectly! Thanks a million, mate. I also checked that all the matching entries were removed from c.out.

bash-3.00# fgrep -vxf a.out b.out > c.out
bash-3.00# more c.out|wc -l
 1481733

Did you run your test with the files you posted in post #1, or with real data?
If it works only on the test data and not on the real data, please post the real data.

In Solaris 10 you must use the POSIX variants /usr/xpg4/bin/grep and /usr/xpg4/bin/awk. The ones in /usr/bin/ are from old Unix SysV 4.0.
In particular, /usr/bin/awk is a link to /usr/bin/oawk (AT&T itself already considered it old and provided /usr/bin/nawk).
The previous awk code, rewritten for oawk:

awk '(FILENAME=="-") {a[$0]++; next} (a[$0]==0)' - <a.out b.out >c.out
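If the same script has to run on Solaris 10 and elsewhere, one possible approach is to pick the POSIX tools when they exist and fall back to whatever is on PATH. This is only a sketch: the variable names GREP and AWK and the output file names are assumptions for the illustration, and mktemp -d is assumed to be available.

```shell
# Prefer the POSIX tools under /usr/xpg4/bin (Solaris); otherwise use PATH.
if [ -x /usr/xpg4/bin/grep ]; then GREP=/usr/xpg4/bin/grep; else GREP=grep; fi
if [ -x /usr/xpg4/bin/awk ];  then AWK=/usr/xpg4/bin/awk;   else AWK=awk;   fi

# Scratch copy of the test data from post #1.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '%s\n' a b c d > a.out
printf '%s\n' a a a b b c d d e f > b.out

# Same two solutions as above, run through the selected tools.
"$GREP" -F -v -x -f a.out b.out > c_grep.out
"$AWK" 'FNR==NR { seen[$0]; next } !($0 in seen)' a.out b.out > c_awk.out

cat c_grep.out
cat c_awk.out
```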

I tested with similar test data.

bash-3.00# grep -vf a.out b.out > c.out
grep: illegal option -- f
Usage: grep -hblcnsviw pattern file . . .
bash-3.00# awk 'FNR==NR{a[$0]++;next} !($0 in a) ' a.out b.out >c.out
awk: syntax error near line 1
awk: bailing out near line 1
bash-3.00# more a.out
a
b
c
d
bash-3.00# more b.out
a
a
a
b
b
c
d
d
e
f

Oh, I see...

Yes, this one works perfectly:

bash-3.00# awk '(FILENAME=="-") {a[$0]++; next} (a[$0]==0)' - <a.out b.out >c.out
bash-3.00# more c.out
e
f

---------- Post updated at 05:42 PM ---------- Previous update was at 04:51 PM ----------

However, the following is still much faster on the 1.5 million lines of data. The result of both commands is the same; I have already double-checked it.

fgrep -vxf a.out b.out > c.out
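For anyone who wants to repeat the "same result" check, a minimal sketch follows. It runs on the small test files from post #1, not the real data, and assumes a POSIX grep and awk (on Solaris 10, the ones in /usr/xpg4/bin) plus mktemp -d. The names c_grep.out and c_awk.out are made up for the illustration.

```shell
# Scratch copy of the test data from post #1.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '%s\n' a b c d > a.out
printf '%s\n' a a a b b c d d e f > b.out

# grep -F is the POSIX spelling of fgrep.
grep -F -v -x -f a.out b.out > c_grep.out
awk 'FNR==NR { seen[$0]; next } !($0 in seen)' a.out b.out > c_awk.out

# cmp -s is silent and only sets the exit status: 0 means byte-for-byte equal.
if cmp -s c_grep.out c_awk.out; then
    echo "identical"
else
    echo "different"
fi

# Count the remaining lines without going through a pager.
wc -l < c_grep.out
```

On the sample files this prints "identical" followed by the line count.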