Grep multiple patterns with 'AND' operator

sjolicoeur · December 1, 2011, 3:08am

Hello, I've been trying for days to do a grep without any success.

I have two patterns:
1 - Master en Achats
2 - complet

$ find /var/www/mysite/uploads/files/*.doc -exec egrep -l --ignore-case "Master en Achats|complet" {} \;

The problem is that egrep treats the patterns as an 'OR': it lists files containing pattern 1 or pattern 2.

I want the command to list only the files that contain both pattern 1 'AND' pattern 2.

How should I write the command?

clx · December 1, 2011, 3:12am

There is no logical AND in grep.

Use:

.... grep "pattern 1" | grep "pattern 2"
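Note that the pipe works line by line: the second grep only sees the lines that already matched the first, so the result is the lines containing both patterns, not the files. A minimal sketch of that behaviour, wrapped in a hypothetical helper `lines_with_both` (name and arguments are illustrative only):

```shell
# Hypothetical helper: print the lines of FILE that contain both PAT1 and PAT2
# (case-insensitive). The first grep filters lines; the second filters its output,
# so both patterns must appear on the SAME line.
lines_with_both() {
    pat1=$1 pat2=$2 file=$3
    grep -i -- "$pat1" "$file" | grep -i -- "$pat2"
}
```

For example, `lines_with_both "Master en Achats" "complet" cv.txt` (hypothetical file) prints nothing if the two strings sit on different lines of the same file.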

sjolicoeur · December 1, 2011, 3:21am

Hi, thanks for your reply. I'm new to Linux and I don't know exactly how to write the syntax.

I've tried:

$ find /var/www/hrmtest/cache/upload/*.doc -exec egrep -l --ignore-case "Master en Achats" {} | egrep -l --ignore-case "complet" {} \;

But it doesn't work; it gives me an error.

itkamaraj · December 1, 2011, 3:31am

 
$ find /var/www/hrmtest/cache/upload/*.doc -exec egrep -l --ignore-case "Master en Achats" {} | egrep -l --ignore-case "complet"

sjolicoeur · December 1, 2011, 3:39am

Hello, I've just tried the above command:

$ find /var/www/hrmtest/cache/upload/*.doc -exec egrep -l --ignore-case "Master en Achats" {} | egrep -l --ignore-case "complet"

It doesn't work; it says:

find: Paramètre manquant pour « -exec »
(English: "find: missing argument to `-exec'")


itkamaraj · December 1, 2011, 3:43am

Just noticed that you are trying to grep Microsoft Word documents.

Is it Unicode?

http://www.unix.com/shell-programming-scripting/88007-grep-ms-word-document.html

sjolicoeur · December 1, 2011, 3:55am

This command works:

$ find /var/www/mysite/uploads/files/*.doc -exec egrep -l --ignore-case "Master en Achats|complet" {} \;

I'm able to grep the MS Word documents for multiple patterns, but only as an 'OR' search.

I want to do an 'AND' search instead, listing only the files that contain both patterns.

But I don't know the syntax for an 'AND' search with multiple patterns.

Please help!

clx · December 1, 2011, 3:58am

Try

find /var/www/hrmtest/cache/upload/*.doc -exec sh -c 'egrep -q --ignore-case "Master en Achats" "$1" && egrep -l --ignore-case "complet" "$1"' sh {} \;

or

find /var/www/hrmtest/cache/upload/*.doc | xargs egrep -l --ignore-case "Master en Achats" | xargs egrep -l --ignore-case "complet"
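The `xargs` chain above splits file names on whitespace, so a name like `CV Master.doc` would break it. A sketch of a NUL-delimited variant, assuming GNU find, grep and xargs (`-print0`, `grep --null`, `xargs -0` and `xargs -r`); the helper name `files_with_both` and its arguments are hypothetical:

```shell
# Hypothetical helper: list *.doc files under DIR whose contents match both
# PAT1 and PAT2 (case-insensitive). Assumes GNU find/grep/xargs.
# -print0, grep --null and xargs -0 keep file names NUL-separated, so spaces are safe;
# xargs -r skips running grep when the previous stage produced no names.
# Stage 2 lists files containing PAT1; stage 3 keeps those also containing PAT2.
files_with_both() {
    dir=$1 pat1=$2 pat2=$3
    find "$dir" -type f -name '*.doc' -print0 |
        xargs -0 -r grep -li --null -- "$pat1" |
        xargs -0 -r grep -li -- "$pat2"
}
```

Usage would be `files_with_both /var/www/hrmtest/cache/upload "Master en Achats" "complet"`.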

sjolicoeur · December 1, 2011, 4:05am

Thanks anchal_khare

I won't be fired.

itkamaraj · December 1, 2011, 4:22am

If you always expect "Master en Achats" to come first and "complet" second, on the same line, then you can try:

Master en Achats.*complet
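That pattern only matches when both strings are on one line, in that order. A sketch that accepts either order on a single line, using an extended-regex alternation (`grep -E`); the helper name `files_with_both_on_one_line` is hypothetical, and the two patterns are treated as regular expressions, so any special characters would need escaping:

```shell
# Hypothetical helper: list the given FILEs in which some single line contains
# both PAT1 and PAT2, in either order (case-insensitive, extended regex).
files_with_both_on_one_line() {
    pat1=$1 pat2=$2
    shift 2
    grep -liE -- "$pat1.*$pat2|$pat2.*$pat1" "$@"
}
```

This still will not find files where the two strings sit on different lines.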

sdebasis · December 1, 2011, 6:13am

Hi sjolicoeur,

please try this code; it should work:

$ find /var/www/mysite/uploads/files/*.doc -exec egrep -liw "Master en Achats|complet" {} \;

Regards
Debasis

methyl · December 1, 2011, 7:12am

Similar idea to anchal_khare's: search each file twice if necessary.

find /var/www/hrmtest/cache/upload/ -type f -name '*.doc' | while IFS= read -r filename
do
     # -q: only set the exit status (0 = found), print nothing
     egrep -q --ignore-case "Master en Achats" "${filename}"; FOUND1=$?
     # Skip to the next file if we don't find the first string
     if [ "${FOUND1}" -ne 0 ]
     then
          continue
     fi
     #
     egrep -q --ignore-case "complet" "${filename}"; FOUND2=$?
     # Skip to the next file if we don't find the second string
     if [ "${FOUND2}" -ne 0 ]
     then
          continue
     fi
     # Report the file matching both conditions
     echo "${filename}"
done
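The same one-grep-per-pattern idea extends to any number of patterns. A minimal sketch; `has_all` and `list_files_with_all` are hypothetical names, and each pattern is matched case-insensitively as a fixed string (`grep -F`), which assumes the search strings contain no regex you rely on:

```shell
# Hypothetical helper: succeed only if FILE contains every given PATTERN
# (case-insensitive, fixed strings), stopping at the first missing pattern.
has_all() {
    file=$1
    shift
    for pat in "$@"; do
        grep -qiF -- "$pat" "$file" || return 1
    done
    return 0
}

# Hypothetical helper: print each *.doc file under DIR that contains all PATTERNs.
list_files_with_all() {
    dir=$1
    shift
    find "$dir" -type f -name '*.doc' | while IFS= read -r f; do
        if has_all "$f" "$@"; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, `list_files_with_all /var/www/hrmtest/cache/upload "Master en Achats" "complet"`. Stopping at the first missing pattern means most non-matching files are only read once.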

If these files are Microsoft Word format documents, the process is not reliable. The "egrep" program is only suitable for searching Unix-format ASCII text files where each record is of a reasonable length and terminated with a line-feed character.
A Microsoft Word document is not like this. Even a "space" character can be a non-ASCII character.

felipe.vinturin · December 1, 2011, 7:26am

How about this:

$ find /var/www/mysite/uploads/files/*.doc |
while read fname
do
     egrep -l --ignore-case "Master en Achats" "${fname}" | egrep -l --ignore-case "complet"
done

EDIT: Forget it! This one will not work!

methyl · December 1, 2011, 8:09am

@felipe.vinturin
I'm afraid that your script contains the same mistake as several earlier attempts in this thread.
A "grep -l" outputs a filename. The second "grep -l" in your script reads that filename from the pipe, so it checks whether the filename itself contains the string "complet", not the file's contents.
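A quick way to see this: when `grep -l` reads from a pipe it has no file name to report, so it prints the label `(standard input)` if the text it received (here only a file name) matches. The file name and the helper `demo_pitfall` below are hypothetical:

```shell
# The second grep sees only the text "complet_cv.doc", never the file's contents,
# so it reports "(standard input)" because the NAME contains "complet".
demo_pitfall() {
    printf '%s\n' "complet_cv.doc" | grep -l "complet"
}
```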

felipe.vinturin · December 1, 2011, 8:22am

@methyl
That's why I wrote it was not going to work! Also, your solution does what the OP wants! =)

balajesuri · December 1, 2011, 8:41am

Input:

# cat 1.dat
hello
world
this
is
a
file
# cat 2.dat
this
is
hello
world
file
# cat 3.dat
hello
this
file
doesnt
have
those
two
words

The program below prints a file name only if the file contains both words "hello" and "world".

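#!/usr/bin/perl
use strict;
use warnings;

# Check every *.dat file in the current directory
for my $file (glob '*.dat') {
    my ($flag1, $flag2) = (0, 0);
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    while (my $line = <$fh>) {
        $flag1 = 1 if $line =~ /hello/;   # file contains "hello"
        $flag2 = 1 if $line =~ /world/;   # file contains "world"
    }
    close $fh;
    print "$file has hello and world\n" if $flag1 && $flag2;
}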

Output:

# ./hello_world.pl
1.dat has hello and world
2.dat has hello and world
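For comparison, the same one-pass, per-file flag idea can be sketched in awk (an illustrative alternative, not part of the original post). It resets both flags at the first line of each file (`FNR == 1`), reports the previous file if both flags were set, and checks the last file in `END`. The helper name `awk_both` is hypothetical; matching is case-insensitive and the patterns are awk regular expressions:

```shell
# Hypothetical helper: print each FILE that contains both PAT1 and PAT2.
awk_both() {
    pat1=$1 pat2=$2
    shift 2
    awk -v a="$pat1" -v b="$pat2" '
        # First line of a new file: report the previous file, then reset flags
        FNR == 1 {
            if (NR > 1 && fa && fb) print prev
            fa = 0; fb = 0; prev = FILENAME
        }
        tolower($0) ~ tolower(a) { fa = 1 }
        tolower($0) ~ tolower(b) { fb = 1 }
        # The last file has no following FNR == 1, so check it here
        END { if (fa && fb) print prev }
    ' "$@"
}
```

With the three input files above, `awk_both hello world *.dat` should print `1.dat` and `2.dat`, matching the Perl output.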