Grep for patterns from one file in another

flyfisherman · October 8, 2013, 12:57pm

Hi

I'm new to the forum, so I apologize for any errors in the format of the post.

I'm trying to find the contents of one file in another using:

grep -w -f file1 file2

file1

GJA7
TSC

file2

GJC1	GJA7
TSC1	TSC
TSC22D3	TSC-22R

so I expect the result to be:

GJC1	GJA7
TSC1	TSC

but instead it's:

GJC1	GJA7
TSC1	TSC
TSC22D3	TSC-22R

The problem is that grep sees TSC-22R as two words rather than one, and so matches its first part against TSC.
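
For anyone who wants to reproduce this, here is a minimal sketch. It assumes the columns in file2 are tab-separated, and it works in a throwaway directory; the variable names are only for illustration.

```shell
# Recreate the sample data in a temporary directory (illustrative paths).
tmp=$(mktemp -d)
printf 'GJA7\nTSC\n' > "$tmp/file1"
printf 'GJC1\tGJA7\nTSC1\tTSC\nTSC22D3\tTSC-22R\n' > "$tmp/file2"

# -w only requires that a match is not bordered by a letter, digit or
# underscore. "-" is none of those, so TSC counts as a whole word in TSC-22R.
w_out=$(grep -w -f "$tmp/file1" "$tmp/file2")
printf '%s\n' "$w_out"        # prints all three lines of file2

# Number of matching lines; TSC1 and TSC22D3 alone would not match TSC,
# because a digit follows directly.
w_count=$(grep -c -w -f "$tmp/file1" "$tmp/file2")

rm -rf "$tmp"
```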

Thanks

Akshay_Hegde · October 8, 2013, 1:57pm

Try

$ cat a
GJA7
TSC

$ cat b
GJC1    GJA7
TSC1    TSC
TSC22D3    TSC-22R

$ awk 'NR == FNR {_[$0]; next} $2 in _' a b
GJC1    GJA7
TSC1    TSC

Scrutinizer · October 8, 2013, 2:15pm

Try:

awk 'NR==FNR{A[$1]; next} {for(i=1; i<=NF; i++) if ($i in A) {print; next}}' file1 file2

Akshay_Hegde · October 8, 2013, 2:20pm

I think you left out the second file when posting :)

I think the second if ($i in A) {print; next}} is not necessary.

Scrutinizer, do you have a solution for this thread using grep? I am interested.

Scrutinizer · October 8, 2013, 3:05pm

That is right, thanks; I have corrected it in my post. The {print; next} is there because otherwise a line would get printed twice if there is a double match, for example.
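
The double match can be reproduced with a made-up line in which both fields appear in file1. The GJA7/TSC line below is hypothetical and not part of the original data, and the temporary directory and variable names are only for the demo:

```shell
tmp=$(mktemp -d)
printf 'GJA7\nTSC\n' > "$tmp/file1"
# First line is invented: both of its fields are listed in file1.
printf 'GJA7\tTSC\nTSC1\tTSC\n' > "$tmp/file2"

# Without "next" the loop keeps going after the first hit,
# so the first line is printed once per matching field.
no_next=$(awk 'NR==FNR{A[$1]; next} {for(i=1; i<=NF; i++) if ($i in A) print}' "$tmp/file1" "$tmp/file2")

# With "next" awk moves on to the following line after the first hit.
with_next=$(awk 'NR==FNR{A[$1]; next} {for(i=1; i<=NF; i++) if ($i in A) {print; next}}' "$tmp/file1" "$tmp/file2")

printf 'without next:\n%s\n' "$no_next"      # 3 lines, GJA7/TSC twice
printf 'with next:\n%s\n' "$with_next"       # 2 lines

rm -rf "$tmp"
```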

Akshay_Hegde · October 8, 2013, 3:09pm

I was not aware of that. Thank you so much, Scrutinizer.

Scrutinizer · October 8, 2013, 3:13pm

A grep solution might be (if your grep understands "-" as standard input for the -f option):

sed 's/^/(^|[[:space:]])/; s/$/([[:space:]]|$)/' file1 | grep -Ef - file2

or, using process substitution (in bash / ksh93):

grep -Ef <(sed 's/^/(^|[[:space:]])/; s/$/([[:space:]]|$)/' file1 ) file2
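
To see what grep actually receives, the generated patterns can be written to a file first. This is only a sketch on the sample data, assuming tab-separated columns in file2; the patterns file and the temporary directory are illustrative names. It also avoids needing "-" or process substitution.

```shell
tmp=$(mktemp -d)
printf 'GJA7\nTSC\n' > "$tmp/file1"
printf 'GJC1\tGJA7\nTSC1\tTSC\nTSC22D3\tTSC-22R\n' > "$tmp/file2"

# Step 1: wrap every line of file1 so that it must be preceded by
# start-of-line or whitespace and followed by whitespace or end-of-line.
sed 's/^/(^|[[:space:]])/; s/$/([[:space:]]|$)/' "$tmp/file1" > "$tmp/patterns"
patterns=$(cat "$tmp/patterns")
printf '%s\n' "$patterns"
# (^|[[:space:]])GJA7([[:space:]]|$)
# (^|[[:space:]])TSC([[:space:]]|$)

# Step 2: use the generated file as a list of extended regular expressions.
matches=$(grep -E -f "$tmp/patterns" "$tmp/file2")
printf '%s\n' "$matches"      # the TSC-22R line is no longer matched

rm -rf "$tmp"
```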

Akshay_Hegde · October 8, 2013, 3:23pm

It worked fine. Thanks a lot, Scrutinizer.

disedorgue · October 8, 2013, 4:26pm

Hi,
Another way with grep and xargs (along the lines of Scrutinizer's sed solution):
1) POSIX xargs:

grep -Ef <(cat file1 | xargs -I {} printf "(^|[[:space:]]){}([[:space:]]|$)\n") file2

2) GNU xargs:

grep -Ef <(xargs -a file1 -I {} printf "(^|[[:space:]]){}([[:space:]]|$)\n") file2

Regards.

alister · October 8, 2013, 8:25pm

Every grep suggestion thus far in this thread (including the OP's) assumes that it is safe to evaluate file1 in a regular expression context. If file1 does not contain regular expressions (very likely), and if its literal text may contain regular expression metacharacters (possible), that assumption would be unjustified and could yield erroneous output.

The assumption may be valid (probably is), but given the minuscule data sample and lack of specifics in the OP, I thought it prudent to mention it.
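
A sketch of how this could go wrong, and one possible way to guard against it. The key A.1 and the two data lines are invented for illustration, and the escaping sed reflects my assumption about which characters need quoting in an ERE; it is not a vetted general-purpose escaper.

```shell
tmp=$(mktemp -d)
printf 'A.1\n' > "$tmp/file1"                 # hypothetical key containing "."
printf 'X\tAB1\nY\tA.1\n' > "$tmp/file2"      # hypothetical data

# The whitespace/boundary wrapper used earlier in the thread.
wrap='s/^/(^|[[:space:]])/; s/$/([[:space:]]|$)/'

# Unescaped: "." means "any character", so AB1 matches as well.
sed "$wrap" "$tmp/file1" > "$tmp/naive"
naive=$(grep -E -f "$tmp/naive" "$tmp/file2")

# Escaped: put a backslash before \ . [ ^ $ * + ? ( ) { | first,
# then add the wrapper, so the key is matched as literal text.
sed 's/[\.[^$*+?(){|]/\\&/g' "$tmp/file1" | sed "$wrap" > "$tmp/escaped"
escaped=$(grep -E -f "$tmp/escaped" "$tmp/file2")

printf 'unescaped:\n%s\n' "$naive"            # both lines
printf 'escaped:\n%s\n' "$escaped"            # only the A.1 line

rm -rf "$tmp"
```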

Regards,
Alister

Scrutinizer · October 9, 2013, 12:31am

I agree. With these grep suggestions there is also sensitivity to spacing in file1: a single leading or trailing space somewhere in the file will influence the result, which may not be desirable. So in general I would prefer the awk approach.
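
A small sketch of the spacing issue, using the sample data with one trailing space added after TSC in file1. The added space, the temporary directory and the variable names are assumptions for the demo only.

```shell
tmp=$(mktemp -d)
printf 'GJA7\nTSC \n' > "$tmp/file1"          # note the trailing space after TSC
printf 'GJC1\tGJA7\nTSC1\tTSC\nTSC22D3\tTSC-22R\n' > "$tmp/file2"

# grep approach: the space becomes part of the pattern, so the
# TSC1/TSC line is silently lost.
sed 's/^/(^|[[:space:]])/; s/$/([[:space:]]|$)/' "$tmp/file1" > "$tmp/patterns"
grep_out=$(grep -E -f "$tmp/patterns" "$tmp/file2")

# awk approach: $1 is the first whitespace-delimited field, so stray
# spaces around the key in file1 are ignored.
awk_out=$(awk 'NR==FNR{A[$1]; next} {for(i=1; i<=NF; i++) if ($i in A) {print; next}}' "$tmp/file1" "$tmp/file2")

printf 'grep:\n%s\n' "$grep_out"              # only the GJA7 line
printf 'awk:\n%s\n' "$awk_out"                # GJA7 and TSC lines

rm -rf "$tmp"
```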

flyfisherman · October 9, 2013, 11:33am

Thank you guys! You're just fantastic! It resolved my problem.
Best