genome
November 15, 2017, 5:02pm
1
Hi Members,
I'm confused by grep's -F option. The goal is to get all the lines from file2 that contain an exact gene name from the gene list (file one).
File one has a list of genes:
File two has a lot more information pertinent to the genes in file one:
I use the following four commands:
1)
grep -wf gene file2
No output. This is expected.
2)
grep -Ff gene file2
This gives me both the lines in file2:
However, there's no gene as MEF2B or MEF2BNB-MEF2B in gene list.
But gene MEF2C is present in gene list.
That means the -F option matches anything. Am I understanding this correctly?
3)
grep -Fxf gene file2
No output
I guess -x matches on a whole-line basis.
4)
grep -Fwf gene file2
This matches with grep -wf gene file2
I'm confused about which is the correct way to get the right answer. The data is huge, so I would not know where I made a mistake.
Commands one and four look good to go.
Would appreciate any help here.
I get no output from both 1 and 2 and never expected any, I see no possible matches.
Scott
November 15, 2017, 5:33pm
3
I didn't see any either. Not sure what to look for here.
You say MEF2C is in the gene list, but it is not present in the file you provided???
What operating system are you using? The standards don't specify the meaning of a grep -w option. Note also that by some definitions of "word" the line:
19 19257646 19257646 exonic MEF2B,MEF2BNB-MEF2B nonsynonymous SNV
does not contain the word MEF2BNB-MEF2B, but does contain the three words MEF2B (twice), MEF2BNB (once), and SNV (once). Are you sure that grep -w is using the same definition of "word" that you are expecting?
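To make the "definition of word" point concrete, here is a rough sketch that splits the quoted line into word tokens. It assumes a "word" is a run of letters, digits, and underscores (the definition GNU grep documents for -w) and it relies on the -o option, which GNU and BSD grep provide but the standard does not. The variable names are only for illustration.

```shell
# The line quoted in the post above.
line='19 19257646 19257646 exonic MEF2B,MEF2BNB-MEF2B nonsynonymous SNV'

# Assumption: a "word" is a maximal run of letters, digits and underscores.
# -o prints every match on its own line, so this lists the tokens.
tokens=$(printf '%s\n' "$line" | grep -oE '[[:alnum:]_]+')
printf '%s\n' "$tokens"

# Count how often each candidate appears as a whole token.
n_mef2b=$(printf '%s\n' "$tokens" | grep -cx 'MEF2B' || true)
n_mef2bnb=$(printf '%s\n' "$tokens" | grep -cx 'MEF2BNB' || true)
n_joined=$(printf '%s\n' "$tokens" | grep -cx 'MEF2BNB-MEF2B' || true)
echo "MEF2B=$n_mef2b MEF2BNB=$n_mef2bnb MEF2BNB-MEF2B=$n_joined"
```

Under that definition the comma and the hyphen both act as separators, so MEF2BNB-MEF2B never shows up as a single token. Note that GNU grep's -w only checks the characters on either side of a match, so a pattern that itself contains a hyphen may still match there; other implementations may behave differently.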
Is there by any chance an empty line in your gene file?
Are the files you're processing UNIX text files? (Or, might they be DOS text files?)
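For the DOS-versus-UNIX question, one possible check is sketched below. The file names and the sample data line are made up for the demonstration; only the gene name MEF2C comes from the thread. The idea is that a pattern file with CR/LF line endings carries an invisible carriage return at the end of every pattern, which can stop fixed-string patterns from matching.

```shell
tmp=$(mktemp -d)

# Hypothetical gene list saved with DOS (CR/LF) line endings.
printf 'MEF2C\r\n' > "$tmp/gene_dos"
# Hypothetical tab-separated data line with UNIX line endings.
printf 'chr5\tMEF2C\texonic\n' > "$tmp/file2"

# Count lines that end in a carriage return; non-zero suggests a DOS file.
cr=$(printf '\r')
dos_lines=$(grep -c "$cr\$" "$tmp/gene_dos" || true)

# With the CR in place the pattern is really "MEF2C<CR>", so nothing matches.
before=$(grep -cFf "$tmp/gene_dos" "$tmp/file2" || true)

# Strip the carriage returns and try again.
tr -d '\r' < "$tmp/gene_dos" > "$tmp/gene_unix"
after=$(grep -cFf "$tmp/gene_unix" "$tmp/file2" || true)

echo "dos_lines=$dos_lines before=$before after=$after"
rm -rf "$tmp"
```

If the first count is zero for your real file, line endings are probably not the problem.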
genome
November 15, 2017, 8:28pm
5
Hi corona,
Why do I get the output for 2:
I don't understand why I get these lines.
There's no gene "MEF2B" or "MEF2BNB-MEF2B" in gene list.
The output from command 2 is incorrect, that's what's bothering me.
What output do you get from the command:
grep -E 'MEF|SNV' gene
RudiC
November 16, 2017, 4:39am
7
Please answer the questions in Don Cragun's post#4 applying utmost care, esp. checking for the empty line. If that doesn't help, reduce the gene file to a single line representing a single gene and rotate those until we can find some evidence for the "strange" behaviour.
genome
November 16, 2017, 9:22am
8
This gives me the following output:
grep -E 'MEF|SNV' ext_gene_list.txt
I'm on Unix:
x86_64 x86_64 x86_64 GNU/Linux
---------- Post updated at 09:22 AM ---------- Previous update was at 09:05 AM ----------
I'm unable to understand why I get output for:
grep -Ff ext_gene_list.txt no_idea.txt
no_idea.txt is a tab-separated file.
There's no gene named "exonic", "MEF2B", "MEF2BNB-MEF2B", "MEF2BNB", "nonsynonymous SNV", or "SNV".
So if there's no such gene, I should not get any output from the above command. That is what baffles me about -F. I'm not sure when to use the -F flag.
Blank lines will match any data; artifacts in ext_gene_list.txt will cause erroneous results.
You could try adding -o for "only print matching part of lines" to see what it's matching against.
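Both points can be tried on a toy example. Everything in this sketch (file names, the BRCA and TP53 entries) is hypothetical and only there to show the behaviour; it assumes a grep that supports -o, such as GNU grep.

```shell
tmp=$(mktemp -d)

# Hypothetical pattern list: one name followed by an accidental empty line.
printf 'BRCA\n\n' > "$tmp/list.txt"
# Hypothetical data file with two lines.
printf 'BRCA1 exonic\nTP53 intronic\n' > "$tmp/data.txt"

# The empty pattern matches every line, so both data lines are counted.
all=$(grep -cFf "$tmp/list.txt" "$tmp/data.txt" || true)

# Drop empty lines from the list, then use -o to print only the matched text.
grep -v '^$' "$tmp/list.txt" > "$tmp/clean.txt"
only=$(grep -oFf "$tmp/clean.txt" "$tmp/data.txt")

echo "lines matched with blank pattern: $all"
echo "text matched after cleaning: $only"
rm -rf "$tmp"
```

Here -o prints BRCA, which shows that the pattern matched inside the longer string BRCA1 rather than as a complete name.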
genome
November 16, 2017, 10:18am
10
corona688:
Blank lines will match any data; artifacts in ext_gene_list.txt will cause erroneous results.
You could try adding -o for "only print matching part of lines" to see what it's matching against.
Thanks this helped.
I have a gene "F2" in the gene list, and it was matching inside "MEF2B" in no_idea.txt.
grep -Ff f1.txt f2.txt
This was matching "F2" against File2.
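For anyone finding this later, here is a small reconstruction of what happened. The two files below are rebuilt from the names and the data line mentioned in this thread, so their exact contents are an assumption, and the -o and -w options are assumed to behave as in GNU grep.

```shell
tmp=$(mktemp -d)

# Assumed gene list: F2 and MEF2C, both named in the thread.
printf 'F2\nMEF2C\n' > "$tmp/f1.txt"
# The data line from the thread, written as tab-separated fields.
printf '19\t19257646\t19257646\texonic\tMEF2B,MEF2BNB-MEF2B\tnonsynonymous SNV\n' > "$tmp/f2.txt"

# -F treats each pattern as a plain substring; -o shows which text matched.
substr=$(grep -oFf "$tmp/f1.txt" "$tmp/f2.txt" | sort -u)

# Without -w the line matches, because "F2" is a substring of "MEF2B".
plain=$(grep -cFf "$tmp/f1.txt" "$tmp/f2.txt" || true)

# With -w the match must stand alone as a word, so "F2" inside "MEF2B" is rejected.
whole=$(grep -cFwf "$tmp/f1.txt" "$tmp/f2.txt" || true)

echo "matched text: $substr"
echo "lines with -F: $plain, lines with -Fw: $whole"
rm -rf "$tmp"
```

So -F only switches off regular-expression interpretation; it does not ask for whole-name matches. Adding -w (command 4 in the first post) is what restricts matches to complete words.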