Hi
I have two lists of words, file1 and file2. I want to compare both lists and remove from file2 all the words that don't exist in file1.
How can I do this?
Many thanks
awk '# remember every line (word) of file1
FILENAME=="file1" { arr[$0]++ }
# print only the file2 lines that were also seen in file1
FILENAME=="file2" { if( $0 in arr ) {print $0}; next } ' file1 file2 > tmp.tmp
# be SURE you got what you wanted before doing the mv command
mv tmp.tmp file2
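One way to do that check, sketched here with made-up sample words in a scratch directory (the data and the scratch directory are assumptions for illustration, not the poster's files), is to diff the original against the filtered copy so the words about to be dropped are visible before anything is overwritten:

```shell
# Hypothetical sample data in a scratch directory (one word per line).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' apple banana cherry > file1
printf '%s\n' banana durian cherry elder > file2

# Same filter as above: keep only file2 lines that also appear in file1.
awk 'FILENAME=="file1" { arr[$0]++ }
FILENAME=="file2" { if( $0 in arr ) {print $0}; next } ' file1 file2 > tmp.tmp

# Lines prefixed with "<" are the words that would be dropped from file2.
removed=$(diff file2 tmp.tmp | grep '^<')
printf '%s\n' "$removed"

# Only after reviewing the diff, replace the original.
mv tmp.tmp file2
```

If the diff shows something unexpected, skip the mv and file2 is left untouched.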
# cat f1
a f g h i
j k l
# cat f2
o p q r
g z x
n b i
# comm -12 <(xargs -n1 <f1 | sort) <(xargs -n1 <f2 | sort)
g
i
#
... but OK, this solution may not be the most efficient one ...
You're correct about it not being optimal ;). xargs will fork/exec echo once per word in each file. Not a big deal for smaller files, but it would be an expensive solution if the dataset were large.
Regards,
Alister
OK, OK
... a little better with tr:
# time comm -12 <(xargs -n1 <f1 | sort) <(xargs -n1 <f2 | sort)
g
i
real 0m0.022s
user 0m0.000s
sys 0m0.050s
# time comm -12 <(tr ' ' '\n' <f1 | sort) <(tr ' ' '\n' <f2 | sort)
g
i
real 0m0.009s
user 0m0.000s
sys 0m0.010s
If we can assume the lists already consist of a single column (just as Jim's code does), the tr step can be removed.
And if the lists are already sorted, the sorting step can be removed as well ...
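A minimal sketch of both simplifications, assuming one word per line. The sample words are the ones from f1 and f2 above, split by hand; the scratch directory and file names are hypothetical. It sorts into files rather than using process substitution so that it also runs in a plain POSIX sh:

```shell
# Hypothetical single-column copies of the sample lists (one word per line).
tmpdir=$(mktemp -d)
printf '%s\n' a f g h i j k l > "$tmpdir/list1"
printf '%s\n' o p q r g z x n b i > "$tmpdir/list2"

# Single-column input: no tr step is needed, only the sort.
sort "$tmpdir/list1" > "$tmpdir/list1.sorted"
sort "$tmpdir/list2" > "$tmpdir/list2.sorted"

# Already-sorted input: comm can read the files directly.
# -12 suppresses columns 1 and 2, leaving only the lines common to both.
common=$(comm -12 "$tmpdir/list1.sorted" "$tmpdir/list2.sorted")
printf '%s\n' "$common"

rm -rf "$tmpdir"
```

Note that comm prints the common words in sorted order, not in the order they had in the second list, so the awk approach is the one to use if the original order of file2 matters.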