I have a list of IDs in file1 and a set of sequences in file2. I can already print the matching sequences from file2, but I need help printing them in the same order as the IDs appear in file1.
file1:
EN_comp12952_c0_seq3:367-1668
ES_comp17168_c1_seq6:1-864
EN_comp13395_c3_seq14:231-1088
ES_comp17836_c2_seq2:2-862
file2:
>EN_comp12952_c0_seq3:367-1668
MDKRLLNVSLLGLAFMFVFTAFQTMGNIEKTILKSIQNDYPSFTGDGYTSL
>EN_comp13395_c3_seq14:231-1088
KMTSPTSSVIRAAVLQLSVSTDKSANIAIAVKRIQQAKSNGCTLAVLPECFTTPY
>EN_comp13226_c0_seq8:928-1788
MIRTAYDEVDKKEEVEKINLDQLSQGDIINLLKNFRDLNTDEQD
>EN_comp12741_c2_seq4:2-406
KHQIKQLTVQLPKEGQPDSGLTKDYTSSPLHRFKKPGSKNYQNIYPPSST
>ES_comp17836_c2_seq2:2-862
RKMTSPTSSVIRAAVLQLSVSTDKSTNIAIAVKRIQQAKSNGCTLAVLPECFTTPY
>ES_comp14617_c0_seq1:111-608
MSCRYVPEANMTACGTDYSTLAWHSRSYVLVYAMFAYYLPLLVIIYAYYFIV
>ES_comp17031_c0_seq3:3-1238
QLLAGVVKRSLVNATMFSIRNIEKLMQLAPKFIPTSSMLNSSTTSIPVSTPI
>ES_comp17168_c1_seq6:1-864
IFELTVVVSFAGSRLAMFIGACCYTMFLVSFLWPTTFLLYFMSAVIGFGASVIWT
Desired output (same order as file1):
>EN_comp12952_c0_seq3:367-1668
MDKRLLNVSLLGLAFMFVFTAFQTMGNIEKTILKSIQNDYPSFTGDGYTSL
>ES_comp17168_c1_seq6:1-864
IFELTVVVSFAGSRLAMFIGACCYTMFLVSFLWPTTFLLYFMSAVIGFGASVIWT
>EN_comp13395_c3_seq14:231-1088
KMTSPTSSVIRAAVLQLSVSTDKSANIAIAVKRIQQAKSNGCTLAVLPECFTTPY
>ES_comp17836_c2_seq2:2-862
RKMTSPTSSVIRAAVLQLSVSTDKSTNIAIAVKRIQQAKSNGCTLAVLPECFTTPY
The code I am currently using gives me the right sequences, but in the order they appear in file2 rather than in file1:
awk 'NR==FNR{a[$0]=1;next} {n=0;for(i in a){if($0~i){n=1}}} n {print;getline;print}' file1 file2
Current (wrong order) output:
>EN_comp12952_c0_seq3:367-1668
MDKRLLNVSLLGLAFMFVFTAFQTMGNIEKTILKSIQNDYPSFTGDGYTSL
>EN_comp13395_c3_seq14:231-1088
KMTSPTSSVIRAAVLQLSVSTDKSANIAIAVKRIQQAKSNGCTLAVLPECFTTPY
>ES_comp17836_c2_seq2:2-862
RKMTSPTSSVIRAAVLQLSVSTDKSTNIAIAVKRIQQAKSNGCTLAVLPECFTTPY
>ES_comp17168_c1_seq6:1-864
IFELTVVVSFAGSRLAMFIGACCYTMFLVSFLWPTTFLLYFMSAVIGFGASVIWT
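The one-liner above walks file2 and prints each record as soon as it matches, so the output necessarily follows file2's order. It also uses `$0~i`, which treats each ID as a regular expression rather than a literal string. One direction that might work, as a minimal sketch: read file2 first and store each sequence under its ID, then read file1 and print the stored records in that order. The function name `fasta_in_order` is made up for illustration. The sketch assumes every ID in file1 matches a header in file2 exactly once the leading `>` is dropped (no trailing spaces or carriage returns), and that joining a multi-line sequence onto one line is acceptable.

```shell
# Hypothetical helper: print records from a FASTA file (argument 2)
# in the order their IDs appear in an ID list (argument 1).
# Assumes IDs match the headers exactly, minus the leading ">".
fasta_in_order() {
    awk '
        # The FASTA file is read first (it is listed first below).
        NR == FNR {
            if (/^>/) {
                id = substr($0, 2)          # header: keep the ID without ">"
            } else if (id != "") {
                seq[id] = seq[id] $0        # sequence line: append to this ID
            }
            next
        }
        # The ID list is read second, so output follows its order.
        ($0 in seq) { print ">" $0; print seq[$0] }
    ' "$2" "$1"
}
```

With the files from this post it would be called as `fasta_in_order file1 file2`. IDs in file1 that have no record in file2 are skipped silently, and the whole of file2 is held in memory, which may matter for very large inputs.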
Thanks for any pointers!