Find common terms in two text files, xargs, grep

Hello,

I'm interested in finding all occurrences of the terms in file1 in file2, which are both CSV files. I can do this with a loop, but I'm interested in knowing whether I can also do it with the help of xargs and grep. What I have tried:

cat file1 | xargs grep file2

The problem is that grep takes file2 as the pattern and tries to open the terms from file1 as files instead of using them as search patterns.
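As an aside, xargs can be made to work here with its standard -I option, which tells xargs where to substitute each input line into the command. Note that this runs one grep per term, so it is slower than the grep -f approach discussed below:

cat file1 | xargs -I{} grep {} file2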

grep -F file1 file2
fgrep file1 file2

Thank you, that is so much better! I managed to get grep -f to do what I wanted, was not aware of this.

If you want to search for the patterns from a file (file1 in your example), you must use the "-f" option.

"grep -F" is equivalent to "fgrep" (treating patterns as fixed strings, not regexes) but differs from "grep -f" (patterns from a file). You probably want to combine the two.

grep -F -f file1 file2  

or

fgrep -f file1 file2
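A quick illustration with two made-up files (names and contents are just an example):

$ cat file1
alice@example.com
bob@example.com
$ cat file2
bob@example.com,2011-10-11
carol@example.com,2011-10-12
$ grep -F -f file1 file2
bob@example.com,2011-10-11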

Thank you, I used the "-f" option; adding "-F" as well made it significantly faster.


I have another question related to this use of grep. If I instead want to list all items in file1 that do not exist in file2 by negating, how can that be done?

I tried the -v flag, but it did not do this. Am I missing something here, or is it not possible to do this with grep alone? I have used -v with grep before, but never combined with -f; that is new to me.

grep -v -F -f file1 file2

EDIT:
I managed to get this working with the following script; any comments or tips on easier ways to accomplish the same thing would be appreciated.

#!/bin/bash

# Print every line of the second file that does not appear in the first.
if [ "$#" -ne 2 ]
then
    echo "Missing arguments"
    exit 1
fi

while read line
do
    # Keep the line only if grep finds no match for it in the first file.
    RESULT=$(grep "$line" "$1")
    if [ -z "$RESULT" ]
    then
        echo "$line"
    fi
done < "$2"

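If sorting the files is acceptable, comm may be a simpler alternative to the loop. With -2 and -3 it suppresses lines unique to the second file and lines common to both, leaving only lines unique to the first (bash process substitution assumed):

comm -23 <(sort file1) <(sort file2)
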
grep -vFf file2 file1

That still does not work I'm afraid.

file1 contains 99 lines which all appear in file2.

file2 contains 109 lines.

What I want is to get the 10 lines that do not appear in file1.

Doing this:

grep -vFf file1 file2 | wc -l

gives me: 109

but switching the order gives me: 0

Can you paste file1 and file2? (need not be the entire file)

--ahamed

I'd rather not; they are emails. :smiley:

Edit: I made two mock-up files with just random words in them, and it now works as intended with them. Very strange. Is there any special meaning of @ to bash that could mess this up?

There is nothing more than one email address per line at this stage.

$ wc -l fff1
99 fff1
$ wc -l fff2
109 fff2
$ fgrep -vxf fff1 fff2
100
101
102
103
104
105
106
107
108
109
$ 

@anchal_khare I also got it working with other files, but not with emails.

It is working for me even with email IDs!
Something fishy in your emails then :wink:

Check the output of those mails with cat -E fishy_email to see if there are any special characters, or take an octal dump:
od -bc fishy_email
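
For example, a carriage return at the end of a line shows up clearly this way (illustrative session):

$ printf 'test\r\n' | od -c
0000000   t   e   s   t  \r  \n
0000006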

You didn't edit those files in Windows, did you?

--ahamed


If the domain name is the same in one or more email addresses, grep will always match those lines. In that case you must use the "-x" flag.

           -x                  (eXact) Matches are recognized only when the
                               entire input line matches the fixed string or
                               regular expression.
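
A contrived example of the difference (addresses are made up): without -x a fixed string matches anywhere within a line, while with -x the whole line must match.

$ printf 'bob@example.com\n' | fgrep 'b@example.com'
bob@example.com
$ printf 'bob@example.com\n' | fgrep -x 'b@example.com'
$ 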

Thanks, I tried the octal dump and it appears that the line endings are \r\n.

Could that be the problem? I could try to strip off the '\r' with sed, I suppose.
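
Something along these lines should work (GNU sed understands the \r escape):

sed 's/\r$//' file2 > file2.clean

or, portably:

tr -d '\r' < file2 > file2.clean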

Yes, that should be the problem. Try stripping that off.

--ahamed

You can also use dos2ux (on HP-UX) or whichever equivalent command your OS supports. On Linux, it should be dos2unix.
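
For instance, a typical modern dos2unix converts files in place (older versions may expect explicit input and output file names):

dos2unix file1 file2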