Using "whitelist" from a file to remove entries

Dear all,
What I need to do is extract a list of entries from one file, remove the entries that appear in a whitelist stored in another file, and write the result to result.txt.

Example:

source.txt:
12345 text1 text2 text3 text4
123 text1 text2 text3 text4
678 text1 text2 text3 text4
987 text1 text2 text3 text4
456 text1 text2 text3 text4

whitelist.txt:
123
987

Expected output in result.txt:
12345 text1 text2 text3 text4
678 text1 text2 text3 text4
456 text1 text2 text3 text4

What is the best and fastest way to do that?
I can replace the newlines in whitelist.txt with commas, like:
123,987
if that would simplify the code...

Many thanks!

Have you tried using grep?

grep -v -f whitelist.txt source.txt >result.txt

It doesn't work:

[root@localhost ]# grep -v -f whitelist.txt source.txt
678
456

BTW, it is slightly more complicated. I've updated the first post, since the source file contains some other data as well...

Slight modification to agama's suggestion.. Try:

grep -vxf whitelist.txt source.txt

--edit--
OK, I see the original post got changed in the meantime...

Try:

grep -vwf whitelist.txt source.txt
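
To see why the plain -f version dropped the 12345 line while -w keeps it, here is a minimal sketch. The sample files are shortened copies of the ones in the first post, written to a temporary directory; the variable names are only for illustration.

```shell
# Build small sample files in a throwaway directory (illustrative data).
tmpdir=$(mktemp -d)
printf '%s\n' '12345 text1' '123 text1' '678 text1' '987 text1' '456 text1' > "$tmpdir/source.txt"
printf '%s\n' 123 987 > "$tmpdir/whitelist.txt"

# Plain -f: every whitelist line is a pattern that may match anywhere,
# so "123" also matches inside "12345" and that line is dropped too.
plain=$(grep -v -f "$tmpdir/whitelist.txt" "$tmpdir/source.txt")

# -w: the match has to be a whole word, so "12345" is kept.
word=$(grep -vwf "$tmpdir/whitelist.txt" "$tmpdir/source.txt")

printf 'plain -f:\n%s\n\nwith -w:\n%s\n' "$plain" "$word"
```

The -x variant only helps when the whole line has to equal a whitelist entry, which was the case before the extra text columns were added to source.txt.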

But that could still go wrong if a whitelisted number also appears as a separate word in the text after field 1. So this would be safer:

awk 'NR==FNR{A[$1]; next}!($1 in A)' whitelist.txt source.txt
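
A small sketch of the difference, using made-up data in which the 678 line happens to contain 987 in a later field. The file contents and variable names are assumptions for illustration only.

```shell
# Sample data in a throwaway directory; note "987" inside the 678 line.
tmpdir=$(mktemp -d)
printf '%s\n' '12345 text1 text2' '123 text1 text2' '678 ref 987 text2' \
              '987 text1 text2' '456 text1 text2' > "$tmpdir/source.txt"
printf '%s\n' 123 987 > "$tmpdir/whitelist.txt"

# grep -w looks at the whole line, so the 678 line is removed as well.
grep_out=$(grep -vwf "$tmpdir/whitelist.txt" "$tmpdir/source.txt")

# awk compares field 1 only:
#   NR==FNR is true while the first file (the whitelist) is being read,
#   so each of its first fields becomes a key of array A;
#   for the second file, a line is printed when its $1 is not a key of A.
awk_out=$(awk 'NR==FNR{A[$1]; next}!($1 in A)' "$tmpdir/whitelist.txt" "$tmpdir/source.txt")

printf 'grep -vwf:\n%s\n\nawk:\n%s\n' "$grep_out" "$awk_out"
```

One caveat: if whitelist.txt is empty, NR==FNR also holds for the lines of source.txt, so nothing is printed.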

--
On Solaris use /usr/xpg4/bin/awk rather than awk


Scrutinizer, you're absolutely the best, it works perfectly!!

[root@localhost ]# awk 'NR==FNR{A[$1]; next}!($1 in A)' whitelist.txt source.txt
12345 text1 text2 text3 text4
678 text1 text2 text3 text4
456 text1 text2 text3 text4

Do you think it is possible to do something like this?

awk 'NR==FNR{A[$1]; next}!($1 in A)' whitelist.txt source.txt | while read line; do
echo "$line" | awk '{print $1}'
done

What I need to do is take a value (e.g. the first field) from each already filtered line and create another output file like:

blabla 12345 text 
textx 678 some
texty 456 try

OK, I can write the output of the first awk to another file and then loop over that file, but is there a way to do it directly in your awk command?

You can do this, which prints only the first field, and take it from there:

awk 'NR==FNR{A[$1]; next}!($1 in A){print $1}' whitelist.txt source.txt

It works like the previous one, obviously with only the first field.
The problem is that I cannot add more text to the output, like this:

awk 'NR==FNR{A[$1]; next}!($1 in A) blabla {print $1} text' whitelist.txt source.txt

Expected result:

blabla 12345 text
blabla 678 text
blabla 456 text

Add additional text like this:

awk 'NR==FNR{A[$1]; next}!($1 in A) { print "blabla", $1, "text"}' whitelist.txt source.txt
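
If the surrounding text lives in shell variables, one possible variant passes it in with -v and redirects the result to a file. This is only a sketch: the names prefix, suffix, pre, post and skip are hypothetical, and the sample files are shortened copies of the ones from the first post.

```shell
# Sample data in a throwaway directory (illustrative).
tmpdir=$(mktemp -d)
printf '%s\n' '12345 text1 text2' '123 text1 text2' '678 text1 text2' \
              '987 text1 text2' '456 text1 text2' > "$tmpdir/source.txt"
printf '%s\n' 123 987 > "$tmpdir/whitelist.txt"

prefix='blabla'   # hypothetical text to put before field 1
suffix='text'     # hypothetical text to put after field 1

# -v copies the shell values into awk variables before any input is read.
awk -v pre="$prefix" -v post="$suffix" '
    NR == FNR     { skip[$1]; next }      # first file: remember whitelist keys
    !($1 in skip) { print pre, $1, post } # second file: print the kept entries
' "$tmpdir/whitelist.txt" "$tmpdir/source.txt" > "$tmpdir/result.txt"

cat "$tmpdir/result.txt"
```

Keep in mind that awk interprets backslash escapes in values given with -v, so text containing backslashes needs extra care.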

Thanks, it works perfectly, and it was so simple!!!