Grep two files: -F flag gives weird output

Hi Members,

I'm confused by grep's -F option. My goal is to get all the lines from file2 that contain an exact gene name from the gene list (file one).

File one has a list of genes:

File two has a lot more information pertinent to the genes in file one:

I use the following four commands:

1)

grep -wf gene file2

No output. This is expected.

2)

 grep -Ff gene file2

This gives me both the lines in file2:

However, there is no gene named MEF2B or MEF2BNB-MEF2B in the gene list.

But the gene MEF2C is present in the gene list.

Does that mean the -F option matches anything? Am I understanding this correctly?

3)

 grep -Fxf gene file2

No output.
I guess -x matches on a whole-line basis.

4)

grep -Fwf gene file2

This gives the same result as grep -wf gene file2.

I'm confused about which is the correct way to get the right answer. The data is humongous, so I wouldn't know where I made a mistake.

Command one and four look good to go.

Would appreciate any help here.

I get no output from either 1 or 2 and never expected any; I see no possible matches.

I didn't see any either. Not sure what to look for here.

You say MEF2C is in the gene list, but it is not present in the file you provided???

What operating system are you using? The standards don't specify the meaning of a grep -w option. Note also that by some definitions of "word" the line:

19 19257646 19257646 exonic MEF2B,MEF2BNB-MEF2B nonsynonymous SNV

does not contain the word MEF2BNB-MEF2B, but does contain the words MEF2B (twice), MEF2BNB (once), and SNV (once). Are you sure that grep -w is using the same definition of "word" that you are expecting?
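To see how a particular grep splits that line into words, a quick check along these lines can help. This is only a sketch: it assumes GNU grep, where -w treats letters, digits, and underscores as word characters, and it uses the line quoted above with single spaces standing in for whatever separators the real file uses.

```shell
# Sample line from the post (separators simplified to single spaces).
line='19 19257646 19257646 exonic MEF2B,MEF2BNB-MEF2B nonsynonymous SNV'

# -o prints each match on its own line; -w requires whole-word matches.
# With GNU grep, ',' and '-' are not word characters, so MEF2B counts as a word.
mef2b_hits=$(printf '%s\n' "$line" | grep -ow 'MEF2B' | wc -l | tr -d ' ')
echo "MEF2B whole-word matches: $mef2b_hits"

# MEF2 only ever appears as the start of a longer word, so -w rejects it.
if printf '%s\n' "$line" | grep -qw 'MEF2'; then
    mef2_word=yes
else
    mef2_word=no
fi
echo "MEF2 matches as a word: $mef2_word"
```

Other grep implementations may define a word differently, which is why the result is worth checking on the system in question.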

Is there by any chance an empty line in your gene file?

Are the files you're processing UNIX text files? (Or, might they be DOS text files?)
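Both possibilities can be checked from the shell. A minimal sketch, using a throwaway file whose contents are made up for illustration and which deliberately contains one empty line and DOS line endings:

```shell
# Hypothetical demo file: three CRLF-terminated gene names plus one empty line.
demo=$(mktemp)
printf 'MEF2C\r\n\nTP53\r\nBRCA1\r\n' > "$demo"

# Count completely empty lines.
empty_lines=$(grep -c '^$' "$demo" || true)

# Count lines containing a carriage return, the mark of a DOS text file.
cr=$(printf '\r')
crlf_lines=$(grep -c "$cr" "$demo" || true)

echo "empty lines: $empty_lines, lines with CR: $crlf_lines"

# One possible cleanup: strip carriage returns, then drop empty lines.
clean=$(mktemp)
tr -d '\r' < "$demo" | grep -v '^$' > "$clean"
clean_count=$(wc -l < "$clean" | tr -d ' ')
echo "lines after cleanup: $clean_count"

rm -f "$demo" "$clean"
```

Running the two grep -c commands against the real gene file would answer both questions directly.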


Hi corona,

Why do I get the output for 2:

I don't understand why I get these lines.
There's no gene "MEF2B" or "MEF2BNB-MEF2B" in gene list.

The output from command 2 is incorrect; that's what's bothering me.

What output do you get from the command:

grep -E 'MEF|SNV' gene

Please answer the questions in Don Cragun's post#4 applying utmost care, esp. checking for the empty line. If that doesn't help, reduce the gene file to a single line representing a single gene and rotate those until we can find some evidence for the "strange" behaviour.

This gives me the following output:

grep -E 'MEF|SNV' ext_gene_list.txt

I'm on Unix:

x86_64 x86_64 x86_64 GNU/Linux


I'm unable to understand why I get output for:

grep -Ff ext_gene_list.txt no_idea.txt

no_idea.txt is a tab-separated file.

There's no gene named "exonic", "MEF2B", "MEF2BNB-MEF2B", "MEF2BNB", "nonsynonymous SNV", or "SNV".

So if there's no such gene, I should not get any output from the above command. That is what baffles me about using -F with a pattern file. I'm not sure when to use the -F flag.


A blank line in the pattern file will match every line of data, and other artifacts in ext_gene_list.txt will also cause erroneous results.
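The effect of a blank pattern line is easy to reproduce. A small sketch with made-up file contents (the gene and chromosome names are placeholders, and the behaviour shown is what GNU grep does):

```shell
# Hypothetical data: two tab-separated lines, neither of which mentions TP53.
data=$(mktemp)
pats=$(mktemp)
printf 'chr1\tGENEA\nchr2\tGENEB\n' > "$data"

# Pattern file with a single gene: nothing matches.
printf 'TP53\n' > "$pats"
without_blank=$(grep -cFf "$pats" "$data" || true)

# Same gene plus an empty line: the empty pattern matches every line.
printf 'TP53\n\n' > "$pats"
with_blank=$(grep -cFf "$pats" "$data" || true)

echo "matches without blank line: $without_blank, with blank line: $with_blank"
rm -f "$data" "$pats"
```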

You could try adding -o ("only print the matching part of lines") to see what it's matching against.
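As an illustration of what -o reveals, here is a sketch with an invented gene list and data line (neither comes from the files in this thread). The short pattern A1 is the culprit, and -o makes that visible:

```shell
# Hypothetical gene list: one ordinary name and one very short name.
genes=$(mktemp)
printf 'TP53\nA1\n' > "$genes"

# Hypothetical data line that contains neither TP53 nor a gene called A1.
line='chr17 exonic BRCA1 nonsynonymous SNV'

# -o prints only the text that matched; sort -u collapses repeated matches.
matched=$(printf '%s\n' "$line" | grep -oFf "$genes" | sort -u)
echo "matched text: $matched"

rm -f "$genes"
```

Here the only matched text is the A1 inside BRCA1, which points straight at the offending entry in the pattern file.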

Thanks, this helped.
I have the gene "F2" in the gene list, and that is what matches against no_idea.txt.

grep -Ff f1.txt f2.txt

This was matching "F2" as a substring of "MEF2B" in file two.
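The difference between -F alone and -F combined with -w can be reproduced in a few lines. This is a sketch assuming GNU grep; the gene list is reduced to the two names mentioned in this thread and the data line uses single spaces instead of tabs:

```shell
# Reduced gene list: F2 and MEF2C, as described in the thread.
genes=$(mktemp)
printf 'F2\nMEF2C\n' > "$genes"

line='19 19257646 19257646 exonic MEF2B,MEF2BNB-MEF2B nonsynonymous SNV'

# -F treats each pattern as a fixed string and matches it anywhere,
# so F2 hits inside MEF2B.
if printf '%s\n' "$line" | grep -qFf "$genes"; then
    substring_match=yes
else
    substring_match=no
fi

# Adding -w requires each match to be a whole word, so F2 no longer hits.
if printf '%s\n' "$line" | grep -qFwf "$genes"; then
    word_match=yes
else
    word_match=no
fi

echo "-F: $substring_match, -Fw: $word_match"
rm -f "$genes"
```

So -F only switches off regular-expression interpretation of the patterns; it is -w (or -x for whole lines) that restricts matches to exact names.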
