xargs egrep

johnbach · July 23, 2009, 5:10am

Below is a sample script

jb>cat search
#!/bin/bash
set -x
PATTERN='"'`awk -F: '{printf $2 "|"} END {printf "\b"}' RULE`'"'
echo 'Pattern ['$PATTERN']'
ls -tr *$1* |xargs egrep "$PATTERN"

Sample RULE file

jb>cat RULE
P1 :pone
P2 :ptwo

Sample Output

jb>./search f
++ awk -F: '{printf $2 "|"} END {printf "\b"}' RULE
+ PATTERN='"pone|ptwo"'
+ echo 'Pattern ["pone|ptwo"]'
Pattern ["pone|ptwo"]
+ ls -tr f1 f1.txt f2 f3 f4
+ xargs egrep '"pone|ptwo"'
f2:ptwo
jb>

Directly searching

jb>ls -tr *f* |xargs egrep "pone|ptwo"
f1:000pone000
f2:ptwo

Sample file content

jb>cat f1
11
000pone000
1111
jb>cat f2
222
ptwo
22222222
jb>

Why doesn't 'search' list f1?

Franklin52 · July 23, 2009, 6:33am

Try the pattern without the double quotes. Replace this line:

PATTERN='"'`awk -F: '{printf $2 "|"} END {printf "\b"}' RULE`'"'

with:

PATTERN=`awk -F: '{printf $2 "|"} END {printf "\b"}' RULE`
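A minimal sketch of why the quoted version finds f2 but not f1, using the sample files from the question. Two things are at play: the double quotes added around the awk output are literal characters inside the variable, and `printf "\b"` does not delete the last `|`, it appends a backspace byte that the terminal merely hides. The regex alternatives therefore become `"pone`, `ptwo` and backspace-plus-quote, so only the middle one can match. The scratch directory and the variable names (`quoted`, `plain`, and so on) are assumptions made for the demonstration.

```shell
#!/bin/bash
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample files copied from the question.
printf 'P1 :pone\nP2 :ptwo\n' > RULE
printf '11\n000pone000\n1111\n' > f1
printf '222\nptwo\n22222222\n' > f2

# Original construction: literal quotes plus a trailing backspace byte.
quoted='"'$(awk -F: '{printf $2 "|"} END {printf "\b"}' RULE)'"'

# od -c shows the hidden byte as \b instead of letting the terminal swallow it.
printf '%s' "$quoted" | od -c

# Only the middle alternative (ptwo) can match, so f1 is not listed.
quoted_hits=$(grep -E "$quoted" f1 f2)
printf 'quoted pattern:\n%s\n' "$quoted_hits"

# The intended pattern, with no quotes and no trailing byte, matches both files.
plain='pone|ptwo'
plain_hits=$(grep -E "$plain" f1 f2)
printf 'plain pattern:\n%s\n' "$plain_hits"
```

The replacement line above still leaves the backspace byte as a third alternative. It is harmless for these files, but joining the fields with a separator instead of appending and "erasing" one would avoid it.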

johnbach · July 23, 2009, 8:19am

Thanks, it works.
I have one more question.

I have some 20 patterns OR'ed together: 'pat1|pat2|.....pat20'.
Each pattern is around 20 characters long,
and I search around 30 files, each about 10 MB in size.
It takes 2 minutes to complete!
Is there a faster way to do this?

Franklin52 · July 23, 2009, 8:33am

You can try something like:

awk -F: '{print $2}' RULE > tmp.file

grep -f tmp.file $(ls -tr *$1*)
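A minimal sketch of the same idea on the sample files from the first post: write one pattern per line and let grep read them with -f. Two details here are my assumptions rather than part of the suggestion above. The awk filter skips RULE lines that have nothing after the colon, because with GNU grep an empty line in a pattern file matches every input line. And -F is added on the assumption that the patterns are plain strings, not regular expressions.

```shell
#!/bin/bash
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample files copied from the first post.
printf 'P1 :pone\nP2 :ptwo\n' > RULE
printf '11\n000pone000\n1111\n' > f1
printf '222\nptwo\n22222222\n' > f2

# One pattern per line; lines with an empty second field are skipped.
awk -F: '$2 != "" {print $2}' RULE > tmp.file

# -f reads the patterns from tmp.file; -F treats each one as a fixed string.
hits=$(grep -F -f tmp.file f1 f2)
printf '%s\n' "$hits"
```

If the patterns do contain regex syntax, drop -F (or use -E for extended syntax) and keep the rest as is.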

johnbach · July 23, 2009, 9:20am

1. Actually, the RULE file names each pattern (the RULE file as such is not a pattern file). Example:

>cat RULE
SUCCESS       :Message sent successfully to
FAILURE         :Message send failed for
Acknowledged :Got ack from  
#Goes on

2. This line extracts the second column and forms an OR'ed pattern:

PATTERN=`awk -F: '{printf $2 "|"} END {printf "\b"}' RULE`

#PATTERN will be  'Message sent successfully to|Message send failed for|Got ack from'

3. Search for all patterns in all files and redirect the output to tmp.txt (takes 2 mins):

ls -tr *$1* |xargs egrep "$PATTERN" > tmp.txt

4. From tmp.txt, do some more filtering and collect statistics (count each pattern).
(I don't have a problem with this step; it takes very little time.)

Sample Output :
SUCCESS :1423
FAILURE : 432
Acknowledged : 764

But step 3 alone takes most of the time (around 2 mins).
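For reference, step 4 could be sketched roughly as below. This is only a guess at the counting stage, not the actual script: the contents of tmp.txt are made-up sample lines in the `file:line` form that grep prints for multiple files, and the loop assumes the message text itself contains no colon.

```shell
#!/bin/bash
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# RULE file in the layout shown above (label, padding, colon, message text).
printf '%s\n' \
    'SUCCESS       :Message sent successfully to' \
    'FAILURE         :Message send failed for' \
    'Acknowledged :Got ack from' > RULE

# Hypothetical step-3 output; the log names and message tails are invented.
printf '%s\n' \
    'log1:Message sent successfully to A' \
    'log1:Message send failed for B' \
    'log2:Message sent successfully to C' > tmp.txt

counts=$(
    while IFS=: read -r label text; do
        [ -n "$text" ] || continue              # skip lines without a pattern
        label=${label%"${label##*[! ]}"}        # trim trailing spaces from the label
        n=$(grep -F -c -- "$text" tmp.txt)      # count lines containing the fixed string
        printf '%s :%s\n' "$label" "$n"
    done < RULE
)
printf '%s\n' "$counts"
```

This rescans tmp.txt once per rule, which should be fine as long as tmp.txt stays small compared with the original logs.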

Franklin52 · July 23, 2009, 9:39am

Have you tried my last 2 commands for steps 2 and 3?

awk -F: '{print $2}' RULE > tmp.file

grep -f tmp.file $(ls -tr *$1*) > tmp.txt

johnbach · July 23, 2009, 10:06am

Once again, thanks. It works.
The search completed in less than 30 seconds.

My grep version doesn't support -f, but fgrep does.

awk -F: '{print $2}' RULE > tmp.txt
fgrep -f tmp.txt $(ls -tr *26_06_2009*) > tmp2.txt

(Sorry, I thought you hadn't understood my question, but actually I misunderstood your solution.)
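A side note on the file list, as a hedged sketch: `$(ls -tr ...)` is split on whitespace by the shell, so a file name containing a space would be passed to grep in pieces. Handing the glob to grep directly avoids that, at the cost of listing files in name order instead of the oldest-first order that `ls -tr` gives. On systems where grep accepts -F, `grep -F` is equivalent to fgrep. The file names below are hypothetical.

```shell
#!/bin/bash
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'P1 :pone\nP2 :ptwo\n' > RULE
printf '000pone000\n' > 'a 26_06_2009.log'    # hypothetical name containing a space
printf 'ptwo\n' > 'b_26_06_2009.log'          # hypothetical name

# One fixed-string pattern per line, skipping lines with an empty second field.
awk -F: '$2 != "" {print $2}' RULE > tmp.txt

# The shell expands the glob to one argument per file, so the space survives.
# "--" ends option parsing in case a file name starts with a dash.
hits=$(grep -F -f tmp.txt -- *26_06_2009*)
printf '%s\n' "$hits"
```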