Tips/advice on alternatives to egrep -v

Hi all,

At the moment, I am doing the following to exclude some exception strings. The more I need to exclude, the longer the pattern becomes, and editing the list by hand has become error-prone.

$ cat output.txt
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.101 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.105 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.98 user=mickey
host=192.168.1.111 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.102 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.104 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.9 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.103 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.107 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.123 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.108 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.109 user=mickey
host=192.168.1.99 user=mickey

$ egrep "192.168.1.101|192.168.1.102|192.168.1.103|192.168.1.123" output.txt
host=192.168.1.101 user=mickey
host=192.168.1.102 user=mickey
host=192.168.1.103 user=mickey
host=192.168.1.123 user=mickey

$ egrep -v "192.168.1.101|192.168.1.102|192.168.1.103|192.168.1.123" output.txt
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.105 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.98 user=mickey
host=192.168.1.111 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.104 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.9 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.107 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.108 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.99 user=mickey
host=192.168.1.109 user=mickey
host=192.168.1.99 user=mickey

I presume I could assign the patterns to several variables and concatenate them; I will try this later. Something like this:

set1="192.168.1.101|192.168.1.102"
set2="192.168.1.111|192.168.1.112"
all_set="${set1}|${set2}"
egrep -v "${all_set}" output.txt

I am sure someone can suggest a better and more efficient way of doing this. Ideally I would keep an exception file and use it as an exclusion list when parsing output.txt, but I can't find an example of how to do that. Then, whenever I need to exclude more strings, I would just edit the exception file. The file could also contain things other than IP addresses, which would make "search <file> but exclude <strings>" much easier to maintain.

Please share any tips and examples that I can try.

If your exclude.txt contains content like:

192.168.1.101
192.168.1.102
192.168.1.103
192.168.1.123

And you wish to exclude lines from your output.txt based on the IP addresses in exclude.txt, this could be a start.

awk 'NR==FNR { a[$0]; next } { wo=$0; gsub("[a-z,=]","",$1); if ( !( $1 in a ) ) print wo } ' exclude.txt output.txt

What other things do you wish to exclude besides IP addresses?
The above is just a simple example.
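
To make the idea easier to try out, here is a minimal sketch of the same awk approach run against a few sample lines. The temporary directory, the sample data and the variable names (tmpdir, skip, ip, awk_kept) are assumptions made up for the illustration. It strips only a leading "host=" instead of all lowercase letters, which is a slight variation on the one-liner above.

```shell
# Hypothetical sample files, created in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 192.168.1.101 192.168.1.123 > "$tmpdir/exclude.txt"
printf '%s\n' \
  'host=192.168.1.99 user=mickey' \
  'host=192.168.1.101 user=mickey' \
  'host=192.168.1.123 user=mickey' \
  'host=192.168.1.12 user=mickey' > "$tmpdir/output.txt"

# First file: remember each excluded address, then skip to the next line.
# Second file: copy field 1, strip the leading "host=" from the copy, and
# print the untouched line only if the address is not in the exclusion list.
awk_kept=$(awk '
  NR == FNR { skip[$0]; next }
  { ip = $1; sub(/^host=/, "", ip) }
  !(ip in skip)
' "$tmpdir/exclude.txt" "$tmpdir/output.txt")

printf '%s\n' "$awk_kept"
rm -rf "$tmpdir"
```

Because the lookup is an exact array-key comparison, 192.168.1.12 is kept even though it is a prefix of 192.168.1.123.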

Hope that helps
Regards
Peasant.

I think your current concern is about overlong lines.
The | divider is an egrep (extended regular expression) thing.
In grep and fgrep you can separate the patterns with a newline instead.

fgrep -v "192.168.1.101
192.168.1.102
192.168.1.103
192.168.1.123" output.txt

But you should also be concerned about exactness.
fgrep takes a dot literally, while in grep and egrep a dot means "any character". So fgrep is more exact here.
Still, each search item can match as part of a longer string. For example,
fgrep "10.168.1.13" finds "10.168.1.13" but also "110.168.1.13" and "10.168.1.136".
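
One possible way to tighten this up is the -w (whole word) option, sketched below with grep -F, which is equivalent to fgrep. Note that -w is not part of POSIX grep, so I am assuming a grep that supports it (GNU and BSD grep do). The sample file and the variable names loose and strict are made up for the illustration.

```shell
# Hypothetical sample data in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'host=10.168.1.13 user=mickey' \
  'host=110.168.1.13 user=mickey' \
  'host=10.168.1.136 user=mickey' > "$tmpdir/sample.txt"

# Plain fixed-string search: all three lines contain the substring.
loose=$(grep -F '10.168.1.13' "$tmpdir/sample.txt")

# With -w the match may not be directly preceded or followed by a letter,
# digit or underscore, so 110.168.1.13 and 10.168.1.136 no longer match.
strict=$(grep -wF '10.168.1.13' "$tmpdir/sample.txt")

printf 'without -w:\n%s\n' "$loose"
printf 'with -w:\n%s\n' "$strict"
rm -rf "$tmpdir"
```

A dot still counts as a word boundary, so -w is a reasonable guard for complete IPv4 addresses but not a general exact-match tool.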

You should be aware of several things that can catch you out:

  • Using egrep is the same as grep -E so the string passed is an Extended Regular Expression. Along with | as an 'or' separator, it also means that the . is a wildcard for a single character. Searching for 192.168 will also match 192g168
  • You can match any one of a set of characters with a bracket expression such as [123], and group alternatives with (...), so you can consolidate your search/exclude.
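
A quick sketch of both points, using made-up input piped in with printf (the variable names unescaped, escaped and bracket are only for the illustration):

```shell
# An unescaped dot matches any single character, so both lines match.
unescaped=$(printf '%s\n' '192g168' '192.168' | grep -E '192.168')

# Escaping the dot restricts the match to a literal dot.
escaped=$(printf '%s\n' '192g168' '192.168' | grep -E '192\.168')

# A bracket expression matches exactly one character from the set,
# so 10[123] matches 101, 102 and 103 but not 104.
bracket=$(printf '%s\n' 101 102 104 | grep -E '^10[123]$')

printf 'unescaped:\n%s\n' "$unescaped"
printf 'escaped:\n%s\n' "$escaped"
printf 'bracket:\n%s\n' "$bracket"
```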

I'm not clear what the overall requirement for this is, but I think you are looking for sessions for user mickey that are/aren't from a specific set of IP addresses. Might I suggest:

egrep -v '^host=192\.168\.1\.10[123]|192\.168\.1\.123' output.txt

If this is all there is, then it may be better to blend these together, like ^host=192\.168\.1\.(10[123]|123)
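
As a rough sketch of the blended form, here it is run against a few sample lines. The trailing space after the group is my own addition, assuming the address is always followed by a space as in your output.txt, so that something like 192.168.1.1230 would not be caught. The temporary file and the variable name blended are made up for the example.

```shell
# Hypothetical sample data in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'host=192.168.1.99 user=mickey' \
  'host=192.168.1.101 user=mickey' \
  'host=192.168.1.103 user=mickey' \
  'host=192.168.1.104 user=mickey' \
  'host=192.168.1.123 user=mickey' > "$tmpdir/output.txt"

# Single quotes stop the shell from treating | as a pipe and from
# stripping the backslashes before grep sees them.
blended=$(grep -Ev '^host=192\.168\.1\.(10[123]|123) ' "$tmpdir/output.txt")

printf '%s\n' "$blended"
rm -rf "$tmpdir"
```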

I hope that this helps,
Robin

Hi

grep -vFf exclude.txt output.txt

The -f option reads the patterns from the exclude file, one per line, and -F treats each of them as a fixed string, so you do not need to escape special characters such as the dot.
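
Here is a small sketch of the exclusion-file approach on sample data, with and without -w. The -w option is an assumption about your grep (it is not in POSIX, though GNU and BSD grep have it), and the user=goofy entry is a made-up example of a non-IP exclusion. The variable names no_w and with_w are only for the illustration.

```shell
# Hypothetical exclusion list and input in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 192.168.1.101 192.168.1.9 user=goofy > "$tmpdir/exclude.txt"
printf '%s\n' \
  'host=192.168.1.99 user=mickey' \
  'host=192.168.1.101 user=mickey' \
  'host=192.168.1.9 user=mickey' \
  'host=192.168.1.105 user=goofy' \
  'host=192.168.1.105 user=mickey' > "$tmpdir/output.txt"

# Fixed strings only: the entry 192.168.1.9 is a substring of 192.168.1.99,
# so the .99 line is excluded as well.
no_w=$(grep -vFf "$tmpdir/exclude.txt" "$tmpdir/output.txt")

# Adding -w requires whole-word matches, so the .99 line survives.
with_w=$(grep -vwFf "$tmpdir/exclude.txt" "$tmpdir/output.txt")

printf 'without -w:\n%s\n' "$no_w"
printf 'with -w:\n%s\n' "$with_w"
rm -rf "$tmpdir"
```

One thing to watch: a blank line in the exclusion file is an empty pattern, which can match every line, so with -v you may end up with no output at all.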