How to print "grep" results in a specified keyword order?

I have a content.xls file, shown below:

NC_020815.1	1891831	1894692	virb4_A0A0H2X8Z4_	1	954	1945
NC_020815.1	1883937	1886123	vird4_A0A0P9KA26_	1	729	1379
NC_020815.1	2976151	2974985	virb10_H8FLU5_Ba	1	393	478
NC_020815.1	2968797	2967745	virb6_A0A0Q5GCZ4	5	398	499
NC_020815.1	2974985	2973930	virb11_A0A220WG23	1	352	667
NC_020815.1	2977958	2976915	virb8_A0A220WG48	1	333	462
NC_020815.1	2976915	2976151	virb7_A0A2H1SLS0	1	255	464
NC_020815.1	2976915	2976151	virb9_A0A2H1SLS0	1	255	464
NC_020815.1	2969422	2968895	virb5_V7ZBE7		177	206	465

I have the following keyword file, id.txt:

virb4 
vird4 
virb10 
virb9 
virb8 
virb11 
virb6
virb7 

I need to print every row that contains a keyword, in the order given in the id.txt file (the keyword file). The expected output is as follows:

NC_020815.1	1891831	1894692	virb4_A0A0H2X8Z4_	1	954	1945
NC_020815.1	1883937	1886123	vird4_A0A0P9KA26_	1	729	1379
NC_020815.1	2976151	2974985	virb10_H8FLU5_Ba	1	393	478
NC_020815.1	2976915	2976151	virb9_A0A2H1SLS0	1	255	464
NC_020815.1	2977958	2976915	virb8_A0A220WG48	1	333	462
NC_020815.1	2974985	2973930	virb11_A0A220WG23	1	352	667
NC_020815.1	2968797	2967745	virb6_A0A0Q5GCZ4	5	398	499
NC_020815.1	2976915	2976151	virb7_A0A2H1SLS0	1	255	464

I tried the following command:

grep 'virb4\|vird4\|virb10\|virb9\|virb8\|virb11\|virb6\|virb7' content.xls > print.xls

It prints the rows containing the keywords in the order of content.xls, not in the order of the keywords. Moreover, I need to read the keywords from a separate file, id.txt, rather than listing them in the command as I did above.
Please help me do this.

To read the keywords from a file, try grep -f id.txt
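
As a small illustration of grep -f, here is a sketch on a three-row excerpt of the data from the question. The temporary directory and the variable names workdir and grep_out are made up for the demo, and -F is added on the assumption that the keywords are plain strings rather than regular expressions. Note that trailing whitespace on a line of id.txt becomes part of the pattern and would prevent a match.

```shell
# Build a tiny tab-separated sample (three rows taken from the question).
workdir=$(mktemp -d)
printf 'NC_020815.1\t1891831\t1894692\tvirb4_A0A0H2X8Z4_\t1\t954\t1945\n' >  "$workdir/content.xls"
printf 'NC_020815.1\t2968797\t2967745\tvirb6_A0A0Q5GCZ4\t5\t398\t499\n'  >> "$workdir/content.xls"
printf 'NC_020815.1\t2976915\t2976151\tvirb9_A0A2H1SLS0\t1\t255\t464\n'  >> "$workdir/content.xls"

# Keywords in the desired order: virb9 first, then virb4.
printf 'virb9\nvirb4\n' > "$workdir/id.txt"

# -f reads one pattern per line from id.txt; -F treats each as a fixed string.
# cut -f4 keeps only the name column so the resulting order is easy to see.
grep_out=$(grep -F -f "$workdir/id.txt" "$workdir/content.xls" | cut -f4)
printf '%s\n' "$grep_out"
```

This prints the virb4 row before the virb9 row: grep -f removes the need to spell out the keywords on the command line, but the matches still come out in the order of content.xls.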

To keep the order given in id.txt, you could write e.g. an awk or perl script. Or try this very inefficient shell loop (untested):

# read -r trims surrounding whitespace; the quotes keep the keyword as one argument
while read -r id
  do   grep -- "$id" content.xls
  done < id.txt

Don't try this with large files, as it scans through the entire content.xls once for every single keyword in id.txt.
Reading the content.xls file into an associative array indexed by the key value and then printing it in the order of the keys found in id.txt is left as an exercise for the reader.

$ awk ' FNR == NR { ky=$4; sub("_.*","",ky); arr[ky]=$0; next } {print arr[$1]} ' content.xls id.txt
NC_020815.1     1891831 1894692 virb4_A0A0H2X8Z4_       1       954     1945
NC_020815.1     1883937 1886123 vird4_A0A0P9KA26_       1       729     1379
NC_020815.1     2976151 2974985 virb10_H8FLU5_Ba        1       393     478
NC_020815.1     2976915 2976151 virb9_A0A2H1SLS0        1       255     464
NC_020815.1     2977958 2976915 virb8_A0A220WG48        1       333     462
NC_020815.1     2974985 2973930 virb11_A0A220WG23       1       352     667
NC_020815.1     2968797 2967745 virb6_A0A0Q5GCZ4        5       398     499
NC_020815.1     2976915 2976151 virb7_A0A2H1SLS0        1       255     464