Grep command

  grep -i -f panel_genes.txt hg19_refGene.txt > match.txt 

seems to be pulling names that do not exist in the input file (panel_genes.txt). The output is attached as well (match.txt).

For example, RNF185 and ZNF146 are not genes in the input. I am trying to match only the genes in the input file and am not sure why that is not happening. Thank you.

match.txt is the output.

grep may not be the right tool here... The string "RNF185" contains "NF1", for instance, so grep finds them both.
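To see the substring effect in isolation, here is a small sketch using made-up two-line files (the file names and contents are illustrative only, not the attached data). It compares plain `grep -f` with `grep -w`, which only accepts matches bounded by non-word characters. Note that `-w` would still match the name in any column, so it is a partial fix at best.

```shell
# Throwaway directory with hypothetical sample data
workdir=$(mktemp -d)
printf 'NF1\n' > "$workdir/patterns.txt"
printf 'NM_1\tchr22\tRNF185\nNM_2\tchr17\tNF1\n' > "$workdir/sample.txt"

# Plain substring matching: "NF1" is found inside "RNF185" too
substring_hits=$(grep -i -c -f "$workdir/patterns.txt" "$workdir/sample.txt")

# Whole-word matching: only the line with the standalone "NF1" field counts
word_hits=$(grep -i -w -c -f "$workdir/patterns.txt" "$workdir/sample.txt")

echo "substring: $substring_hits, whole word: $word_hits"
```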

Are you looking for these strings in any particular column?

So what's the content of hg19_refGene.txt, the file in which grep is looking for the string "panel_genes.txt"?
Also, you're uploading a 'result file' (match.txt) that is a lot larger than what you call an input file (panel_genes.txt).

Also, even if I misunderstand the grep command, the result would overwrite the 'more-detailed-looking' match.txt file with a 'row of single strings', compared to its current content.

Phew, translating is hard at times.

Basically what I am trying to do is match the exact gene name in the input (panel_genes.txt) with the gene names in column M of the attached hg19_refGene.txt. Thank you.

By "column M" I'm guessing you mean "column 13".

awk 'NR==FNR { X[toupper($1)]++ ; next } ; toupper($13) in X' list_file hg19_refGene.txt

Use nawk on Solaris.

Produces 247 results for me, none of which are RNF185.
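For anyone new to this awk idiom: `NR==FNR` is true only while the first file is being read, so the first block just records each upper-cased gene name as an array key, and the second condition prints a line of the second file only when its whole 13th field is one of those keys. A minimal sketch on made-up data follows (the file names and the two sample rows are hypothetical, with 13 whitespace-separated columns to mimic the layout discussed here):

```shell
# Throwaway directory with hypothetical sample data
workdir=$(mktemp -d)
printf 'nf1\nBRCA2\n' > "$workdir/genes.txt"

# Two rows of 13 tab-separated columns, gene name in column 13
printf '1\tNM_a\tchr17\t+\t0\t0\t0\t0\t1\t0,\t0,\t0\tNF1\n' > "$workdir/ref.txt"
printf '2\tNM_b\tchr22\t+\t0\t0\t0\t0\t1\t0,\t0,\t0\tRNF185\n' >> "$workdir/ref.txt"

# First file: remember each name (upper-cased) as an array key.
# Second file: keep rows whose entire 13th field is a remembered key.
matched=$(awk 'NR==FNR { X[toupper($1)]; next } toupper($13) in X { print $13 }' \
    "$workdir/genes.txt" "$workdir/ref.txt")

echo "$matched"
```

Because the comparison is on the complete field rather than a substring, the RNF185 row is not selected even though it contains "NF1".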


@Corona688 X[toupper($1)] (without the ++) is enough, I think.

I am not too familiar with awk and tried the command, but got:

awk 'NR==FNR { X[toupper($1)]++ ; next } ; toupper($13) in X' keys hg19_refGene.txt
awk: fatal: cannot open file `keys' for reading (No such file or directory)

Thanks.

It means there is no file named keys in the current directory; please check your input file name.
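One way to catch this before awk runs is to test that each input is readable first. The sketch below assumes the gene list is the panel_genes.txt file from the first post; `check_readable` is a hypothetical helper name, not a standard command.

```shell
# Hypothetical helper: report every argument that is not a readable file
# and return the number of such files (0 means all inputs are fine).
check_readable() {
    missing=0
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "cannot read: $f (current directory: $(pwd))" >&2
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Run the awk command only if both inputs are present
check_readable panel_genes.txt hg19_refGene.txt &&
    awk 'NR==FNR { X[toupper($1)]++ ; next } ; toupper($13) in X' \
        panel_genes.txt hg19_refGene.txt > match.txt
```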

Thank you.