Using "whitelist" from a file to remove entries

Lord_Spectre · June 23, 2012, 4:10pm

Dear all,
I need to take a list of entries from a file, remove some of them based on a whitelist stored in another file, and write the output to a result.txt file.

Example:

source.txt:
12345 text1 text2 text3 text4
123 text1 text2 text3 text4
678 text1 text2 text3 text4
987 text1 text2 text3 text4
456 text1 text2 text3 text4

whitelist.txt:
123
987

Expected output in result.txt:
12345 text1 text2 text3 text4
678 text1 text2 text3 text4
456 text1 text2 text3 text4

What is the best and fastest way to do that?
If it simplifies the code, I can replace the line breaks in whitelist.txt with commas, like:
123,987

Many thanks!

agama · June 23, 2012, 4:56pm

Have you tried using grep?

grep -v -f whitelist.txt source.txt >result.txt

Lord_Spectre · June 23, 2012, 5:16pm

Doesn't work:

[root@localhost ]# grep -v -f whitelist.txt source.txt
678
456
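The output above matches what plain `grep -f` does. Each whitelist line is treated as a pattern that can match anywhere in a line, so `123` also matches inside `12345` and that line is dropped as well. Below is a minimal sketch reproducing this, assuming that source.txt held only the numbers at that point (which the output suggests). `-x` (the pattern must match the whole line) and `-F` (patterns are fixed strings, not regular expressions) are standard grep options:

```shell
#!/bin/sh
# Recreate the files as they appear to have looked at this point (numbers only).
printf '%s\n' 123 987 > whitelist.txt
printf '%s\n' 12345 123 678 987 456 > source.txt

# Substring match: "123" also matches "12345", so 12345 is wrongly removed.
echo "grep -v -f:"
grep -v -f whitelist.txt source.txt

# Whole-line, fixed-string match: only the exact lines 123 and 987 are removed.
echo "grep -v -x -F -f:"
grep -v -x -F -f whitelist.txt source.txt
```

Once the lines contain more than the number, `-x` no longer fits either, because the whole line never equals a whitelist entry.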

BTW, it's slightly more complicated: I've updated the first post, since the source file also contains some other data...

Scrutinizer · June 23, 2012, 5:21pm

A slight modification to agama's suggestion. Try:

grep -vxf whitelist.txt source.txt

--edit--
OK, I see the original post got changed in the meantime...

Try:

grep -vwf whitelist.txt source.txt

But that could still go wrong if a whitelisted number also appears as a separate word in the text after field 1. So this would be safer:

awk 'NR==FNR{A[$1]; next}!($1 in A)' whitelist.txt source.txt

--
On Solaris use /usr/xpg4/bin/awk rather than awk
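For readers new to this awk idiom, here is the same command spread over several lines with comments, run against the sample files from the first post. This is a sketch: the file contents are recreated from the question, and the empty-whitelist caveat below is a general property of the `NR==FNR` test rather than something raised in the thread.

```shell
#!/bin/sh
# Sample data from the question.
printf '%s\n' 123 987 > whitelist.txt
cat > source.txt <<'EOF'
12345 text1 text2 text3 text4
123 text1 text2 text3 text4
678 text1 text2 text3 text4
987 text1 text2 text3 text4
456 text1 text2 text3 text4
EOF

awk '
  # NR (records read in total) equals FNR (records read from the current file)
  # only while reading the first file, whitelist.txt: store each first field
  # as an array key, then skip to the next line without printing.
  NR == FNR { A[$1]; next }
  # Second file: a pattern with no action prints the line, so this prints every
  # line of source.txt whose first field is not a key in A (exact string match).
  !($1 in A)
' whitelist.txt source.txt > result.txt

cat result.txt
```

Unlike `grep -w`, the test only looks at field 1, so a whitelisted number appearing later in a line does not cause that line to be removed. One caveat: if whitelist.txt is empty, awk reads no records from it, so NR equals FNR for every line of source.txt too, and nothing is printed.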

Lord_Spectre · June 23, 2012, 6:05pm

Scrutinizer, you're absolutely the best, it works perfectly!!

[root@localhost ]# awk 'NR==FNR{A[$1]; next}!($1 in A)' whitelist.txt source.txt
12345 text1 text2 text3 text4
678 text1 text2 text3 text4
456 text1 text2 text3 text4

Do you think it is possible to do something like this?

awk 'NR==FNR{A[$1]; next}!($1 in A)' whitelist.txt source.txt | while read -r line; do
echo "$line" | awk '{print $1}'
done

What I need to do is extract values from the already filtered lines (e.g. the first field), line by line, and create another output file like:

blabla 12345 text 
textx 678 some
texty 456 try

OK, I can write the output of the first awk to another file and then loop over that file, but is there a way to do it directly with your awk command??

Scrutinizer · June 23, 2012, 6:14pm

You can do this, which will output only the first fields, and take it from there:

awk 'NR==FNR{A[$1]; next}!($1 in A){print $1}' whitelist.txt source.txt
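A minimal sketch of how this fits the loop from the previous post: the `{print $1}` action replaces awk's default action of printing the whole line, and a shell loop can read the result straight from the pipe, so no intermediate file is needed. The `id:` format and the file name formatted.txt are hypothetical placeholders, not from the thread.

```shell
#!/bin/sh
# Sample data from the question.
printf '%s\n' 123 987 > whitelist.txt
cat > source.txt <<'EOF'
12345 text1 text2 text3 text4
123 text1 text2 text3 text4
678 text1 text2 text3 text4
987 text1 text2 text3 text4
456 text1 text2 text3 text4
EOF

# awk prints only field 1 of each non-whitelisted line; the loop reads those
# values from the pipe (read -r keeps backslashes literal) and formats them.
awk 'NR==FNR{A[$1]; next}!($1 in A){print $1}' whitelist.txt source.txt |
while read -r id; do
  printf 'id: %s\n' "$id"
done > formatted.txt

cat formatted.txt
```

The loop in the previous post starts one extra awk process per line; this one starts none. For simple formatting, though, doing everything inside the single awk command is usually simpler.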

Lord_Spectre · June 23, 2012, 6:40pm

It works like the previous one, obviously with only the first field.
The problem is that I cannot add more text to the output, like this:

awk 'NR==FNR{A[$1]; next}!($1 in A) blabla {print $1} text' whitelist.txt source.txt

Expected result:

blabla 12345 text
blabla 678 text
blabla 456 text

Chubler_XL · June 24, 2012, 6:00pm

Add additional text like this:

awk 'NR==FNR{A[$1]; next}!($1 in A) { print "blabla", $1, "text"}' whitelist.txt source.txt
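A sketch (not from the thread) combining this answer with the earlier wish to produce another output file. In `print`, each comma inserts the output field separator OFS (a single space by default), and awk's `> "file"` redirection truncates the file when it is first opened in the run, then keeps appending, so one pass can write both files. The name extra.txt is a hypothetical placeholder; "blabla" and "text" are the placeholder strings used above.

```shell
#!/bin/sh
# Sample data from the question.
printf '%s\n' 123 987 > whitelist.txt
cat > source.txt <<'EOF'
12345 text1 text2 text3 text4
123 text1 text2 text3 text4
678 text1 text2 text3 text4
987 text1 text2 text3 text4
456 text1 text2 text3 text4
EOF

awk '
  NR == FNR { A[$1]; next }
  !($1 in A) {
    # Filtered line, unchanged, to result.txt.
    print > "result.txt"
    # Reformatted line to a second (hypothetical) file; commas become spaces.
    print "blabla", $1, "text" > "extra.txt"
  }
' whitelist.txt source.txt
```

If you need exact control over spacing, `printf "blabla %s text\n", $1` is an alternative to `print` with commas.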

Lord_Spectre · June 25, 2012, 2:42am

Thanks, it works perfectly, and it was so simple!!! :rolleyes: