Hi all,
I have a list of strings that I want to search for in another file.
I can do that using
grep -f
but the search fails because the strings contain special characters. How do I solve this?
One row in that list is
amino-acid permease inda1 [Pyrenophora tritici-repentis Pt-1C-BFP]gb|EDU41782.1| amino-acid permease inda1 [Pyrenophora tritici-repentis Pt-1C-BFP]
Input file to be searched
comp1538736c0_seEON956710 5821833 putative amino-acid permease inda1 protein [Togninia minima UCRPA7] 1e-114 418 0 2 736 97 342 89.8% 80.1%
comp1538234c2_seEON956710 582.455 putative amino-acid permease inda1 protein [Togninia minima UCRPA7] 3e-18 96.7 2 2 229 338 413 71.1% 65.8%
comp1538600c3_seXP_001939063 5733127 amino-acid permease inda1 [Pyrenophora tritici-repentis Pt-1C-BFP]gb|EDU41782.1| amino-acid permease inda1 [Pyrenophora tritici-repentis Pt-1C-BFP] 5e-36 155 2 233 598 448 573 69.8% 59.5%
Yoda
January 30, 2014, 11:57am
In bash:
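#!/bin/bash
# Requires bash 4+ for associative arrays.
declare -A ARR
# Load each search string from file1; IFS= and -r keep whitespace and backslashes intact.
while IFS= read -r line
do
    # Skip empty lines, which are not valid array keys.
    [[ -n "$line" ]] && ARR["$line"]="$line"
done < file1
# Print every line of file2 that contains one of the search strings.
while IFS= read -r line
do
    for k in "${ARR[@]}"
    do
        # Quoting "$k" makes =~ match it as a literal string, not as a regex.
        [[ "$line" =~ "$k" ]] && printf '%s\n' "$line"
    done
done < file2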
Input
$ cat file1
amino-acid permease inda1 [Pyrenophora tritici-repentis Pt-1C-BFP]gb|EDU41782.1| amino-acid permease inda1 [Pyrenophora tritici-repentis Pt-1C-BFP]
$ cat file2
comp1538736c0_seEON956710 5821833 putative amino-acid permease inda1 protein [Togninia minima UCRPA7] 1e-114 418 0 2 736 97 342 89.8% 80.1%
comp1538234c2_seEON956710 582.455 putative amino-acid permease inda1 protein [Togninia minima UCRPA7] 3e-18 96.7 2 2 229 338 413 71.1% 65.8%
comp1538600c3_seXP_001939063 5733127 amino-acid permease inda1 [Pyrenophora tritici-repentis Pt-1C-BFP]gb|EDU41782.1| amino-acid permease inda1 [Pyrenophora tritici-repentis Pt-1C-BFP] 5e-36 155 2 233 598 448 573 69.8% 59.5%
Output
$ ./look.bash
comp1538600c3_seXP_001939063 5733127 amino-acid permease inda1 [Pyrenophora tritici-repentis Pt-1C-BFP]gb|EDU41782.1| amino-acid permease inda1 [Pyrenophora tritici-repentis Pt-1C-BFP] 5e-36 155 2 233 598 448 573 69.8% 59.5%
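Use grep -F -f: the -F option tells grep to treat the patterns read with -f as fixed strings instead of regular expressions, so characters such as [, ] and | are matched literally.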
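For reference, a minimal sketch of the fixed-string approach, assuming the list is in file1 and the data in file2 as in the post above. The temporary directory and the variable names (tmpdir, matches, count) are only there to make the example self-contained; they are not part of the original posts.

```shell
#!/bin/bash
# Recreate the sample files from the thread in a scratch directory (assumption: any writable temp dir will do).
tmpdir=$(mktemp -d)

# file1: the list of strings to search for, one per line.
printf '%s\n' \
  'amino-acid permease inda1 [Pyrenophora tritici-repentis Pt-1C-BFP]gb|EDU41782.1| amino-acid permease inda1 [Pyrenophora tritici-repentis Pt-1C-BFP]' \
  > "$tmpdir/file1"

# file2: the file to be searched.
printf '%s\n' \
  'comp1538736c0_seEON956710 5821833 putative amino-acid permease inda1 protein [Togninia minima UCRPA7] 1e-114 418 0 2 736 97 342 89.8% 80.1%' \
  'comp1538234c2_seEON956710 582.455 putative amino-acid permease inda1 protein [Togninia minima UCRPA7] 3e-18 96.7 2 2 229 338 413 71.1% 65.8%' \
  'comp1538600c3_seXP_001939063 5733127 amino-acid permease inda1 [Pyrenophora tritici-repentis Pt-1C-BFP]gb|EDU41782.1| amino-acid permease inda1 [Pyrenophora tritici-repentis Pt-1C-BFP] 5e-36 155 2 233 598 448 573 69.8% 59.5%' \
  > "$tmpdir/file2"

# -f reads the patterns from file1; -F treats each one as a fixed string,
# so the brackets and pipes are not interpreted as regex syntax.
matches=$(grep -F -f "$tmpdir/file1" "$tmpdir/file2")
count=$(grep -c -F -f "$tmpdir/file1" "$tmpdir/file2")

printf '%s\n' "$matches"
printf 'matching lines: %s\n' "$count"

rm -rf "$tmpdir"
```

With the sample data this should print only the comp1538600c3 line. One thing to watch for: an empty line in file1 acts as an empty pattern, which matches every line of file2, so it is worth stripping blank lines from the list first.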