Help with script - GREP

mirkocosta · August 3, 2015, 5:25am

Hello gentlemen,

I have a problem removing lines from a text file. To keep it simple, here is an example:

TEXT1.TXT --- contents:

9.9.9.9 geek.net
1.1.1.1 geek.com
2.2.2.2 leet.net

TEXT2.TXT --- contents:

geek.com
coolbar.org

I simply do:

cat text1.txt | grep -f text2.txt > final.txt

but it returns a 0-byte file. In the sample above, all I want to do is remove every line that contains one of the strings listed in text2.txt.

Since yesterday I used fgrep successfully (which, AFAIK, is the same as grep -f). Any thoughts/help/suggestions?

Thx in advance

sea · August 3, 2015, 5:27am

Try:

grep -f text2.txt text1.txt > final.txt

And next time, please use code tags, as you agreed to in the forum rules.

hth

mirkocosta · August 3, 2015, 5:34am

Fixed (the post). Thanks for the help, I'll try.
Edit: not working. The lines aren't deleted... what's wrong?

sea · August 3, 2015, 6:09am

final.txt contains the lines of text1.txt that match an entry in text2.txt, which is only the geek.com line.

This is how it looks for me:

0 ~/tmp $ cat text1.txt
9.9.9.9 geek.net
1.1.1.1 geek.com
2.2.2.2 leet.net

0 ~/tmp $ cat text2.txt 
geek.com
coolbar.org

0 ~/tmp $ grep -f text2.txt text1.txt > final.txt

0 ~/tmp $ cat final.txt 
1.1.1.1 geek.com

Or did you talk about something different?

mirkocosta · August 3, 2015, 6:30am

No. Strangely enough, I got what I wanted by adding -v.
Don't ask me why, but with -v (which excludes the matching lines) I got the result I was after. Thanks for the support, btw.

Scrutinizer · August 3, 2015, 7:00am

Note: Even though this will work in most cases, there is the potential for false matches, since the . (dot) matches any character in grep's regular expressions.
This can be improved somewhat by using fixed-string matching with the -F option:

grep -vFf text2.txt text1.txt
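
A minimal sketch of the difference, assuming a made-up extra entry (geekxcom.org) and temporary file names that are not part of the original example:

```shell
# Hypothetical test data: the second host only matches if "." acts as a wildcard.
tmp=$(mktemp -d)
printf '%s\n' '1.1.1.1 geek.com' '3.3.3.3 geekxcom.org' '2.2.2.2 leet.net' > "$tmp/hosts.txt"
printf '%s\n' 'geek.com' > "$tmp/patterns.txt"

# Regex matching: "geek.com" also matches "geekxcom", so both lines are dropped.
regex_out=$(grep -vf "$tmp/patterns.txt" "$tmp/hosts.txt")

# Fixed-string matching: only the line with the literal "geek.com" is dropped.
fixed_out=$(grep -vFf "$tmp/patterns.txt" "$tmp/hosts.txt")

printf 'regex:\n%s\n\nfixed:\n%s\n' "$regex_out" "$fixed_out"
```

With -f alone the geekxcom.org line disappears as well; with -F it is kept.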

But then there could still be partial matches and one would have to do something like this, using bash/ksh93 process substitution:

grep -vFf <(sed 's/^/ /' text2.txt) text1.txt

Even that is still not 100% safe, because a pattern such as " geek.com" would also match a longer name like geek.com.au.
To rule that out, we would need to use something like this:

grep -vf <(sed 's/^/[[:blank:]]/; s/\./\\./g; s/$/$/' text2.txt) text1.txt
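
As a rough illustration of what the anchoring buys, here is a sketch with made-up near-miss entries (notgeek.com, geek.com.au). It writes the generated patterns to temporary files instead of using process substitution, which is only an assumption made so that it also runs in a plain POSIX shell:

```shell
# Made-up sample data: only the first line should be removed.
tmp=$(mktemp -d)
printf '%s\n' '1.1.1.1 geek.com' '4.4.4.4 notgeek.com' \
              '5.5.5.5 geek.com.au' '2.2.2.2 leet.net' > "$tmp/text1.txt"
printf '%s\n' 'geek.com' 'coolbar.org' > "$tmp/text2.txt"

# Fixed strings with a leading space: notgeek.com survives, geek.com.au is still removed.
sed 's/^/ /' "$tmp/text2.txt" > "$tmp/spaced.txt"
spaced_out=$(grep -vFf "$tmp/spaced.txt" "$tmp/text1.txt")

# Anchored regex: a blank before the name, escaped dots, end of line after it.
sed 's/^/[[:blank:]]/; s/\./\\./g; s/$/$/' "$tmp/text2.txt" > "$tmp/anchored.txt"
anchored_out=$(grep -vf "$tmp/anchored.txt" "$tmp/text1.txt")

printf 'spaced:\n%s\n\nanchored:\n%s\n' "$spaced_out" "$anchored_out"
```

Only the anchored variant keeps the geek.com.au line while still removing the exact geek.com entry.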

--
A better option would be to use awk, using exact string matching of fields:

awk 'NR==FNR{A[$1]; next} !($2 in A)' text2.txt text1.txt
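
The same one-liner spelled out with comments, as a sketch. The array name blocked and the two extra near-miss lines are made up for illustration; the rest is the sample data from the first post:

```shell
# Sample data from the thread plus two made-up near misses.
tmp=$(mktemp -d)
printf '%s\n' '9.9.9.9 geek.net' '1.1.1.1 geek.com' '2.2.2.2 leet.net' \
              '4.4.4.4 notgeek.com' '5.5.5.5 geek.com.au' > "$tmp/text1.txt"
printf '%s\n' 'geek.com' 'coolbar.org' > "$tmp/text2.txt"

awk '
  # First file (text2.txt): NR equals FNR only here, so store each name as an array key.
  NR == FNR { blocked[$1]; next }
  # Second file (text1.txt): print the line unless its second field is a stored key.
  !($2 in blocked)
' "$tmp/text2.txt" "$tmp/text1.txt" > "$tmp/final.txt"

cat "$tmp/final.txt"
```

Because the second field is compared as a whole string, neither the dot nor partial names need special handling. One caveat to be aware of: if text2.txt is empty, NR == FNR stays true while text1.txt is read, so nothing is printed.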

mirkocosta · August 3, 2015, 7:14am

Thanks for replying.
Just out of curiosity, which of those methods is the fastest?

sea · August 3, 2015, 7:47am

You can find out by putting time in front of each command.

Like:

time grep -vf "text2.txt" "text1.txt" > "final.txt"

Likewise for Scrutinizer's commands.
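
On a three-line file the timings will be too small to compare, so a sketch along these lines may help. The generated host names (hostN.example), the file sizes and the temporary paths are all made up for the test, and it assumes a shell where time is available (it is a keyword in bash and ksh):

```shell
# Generate hypothetical test data: 20000 host lines and 200 names to remove.
tmp=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 20000; i++)
               printf "10.0.%d.%d host%d.example\n", int(i / 256), i % 256, i }' > "$tmp/text1.txt"
awk 'BEGIN { for (i = 100; i <= 20000; i += 100)
               printf "host%d.example\n", i }' > "$tmp/text2.txt"

# Time the fixed-string grep variant.
time grep -vFf "$tmp/text2.txt" "$tmp/text1.txt" > "$tmp/grep.out"

# Time the awk variant on the same input.
time awk 'NR==FNR{A[$1]; next} !($2 in A)' "$tmp/text2.txt" "$tmp/text1.txt" > "$tmp/awk.out"

# Check that both variants kept the same lines before comparing their timings.
cmp "$tmp/grep.out" "$tmp/awk.out" && echo "outputs match"
```

The numbers depend on the machine and on the grep and awk implementations, so it is worth running each command a few times.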

hth