Print sequences from file2 based on match to, AND in same order as, file1

I have a list of IDs in file1 and a list of sequences in file2. I can print sequences from file2, but I'm asking for help in printing the sequences in the same order as the IDs appear in file1.

file1:

EN_comp12952_c0_seq3:367-1668
ES_comp17168_c1_seq6:1-864
EN_comp13395_c3_seq14:231-1088
ES_comp17836_c2_seq2:2-862

file2:

>EN_comp12952_c0_seq3:367-1668
MDKRLLNVSLLGLAFMFVFTAFQTMGNIEKTILKSIQNDYPSFTGDGYTSL
>EN_comp13395_c3_seq14:231-1088
KMTSPTSSVIRAAVLQLSVSTDKSANIAIAVKRIQQAKSNGCTLAVLPECFTTPY
>EN_comp13226_c0_seq8:928-1788
MIRTAYDEVDKKEEVEKINLDQLSQGDIINLLKNFRDLNTDEQD
>EN_comp12741_c2_seq4:2-406
KHQIKQLTVQLPKEGQPDSGLTKDYTSSPLHRFKKPGSKNYQNIYPPSST
>ES_comp17836_c2_seq2:2-862
RKMTSPTSSVIRAAVLQLSVSTDKSTNIAIAVKRIQQAKSNGCTLAVLPECFTTPY
>ES_comp14617_c0_seq1:111-608
MSCRYVPEANMTACGTDYSTLAWHSRSYVLVYAMFAYYLPLLVIIYAYYFIV
>ES_comp17031_c0_seq3:3-1238
QLLAGVVKRSLVNATMFSIRNIEKLMQLAPKFIPTSSMLNSSTTSIPVSTPI
>ES_comp17168_c1_seq6:1-864
IFELTVVVSFAGSRLAMFIGACCYTMFLVSFLWPTTFLLYFMSAVIGFGASVIWT

Desired output (same order as file1):

>EN_comp12952_c0_seq3:367-1668
MDKRLLNVSLLGLAFMFVFTAFQTMGNIEKTILKSIQNDYPSFTGDGYTSL
>ES_comp17168_c1_seq6:1-864
IFELTVVVSFAGSRLAMFIGACCYTMFLVSFLWPTTFLLYFMSAVIGFGASVIWT
>EN_comp13395_c3_seq14:231-1088
KMTSPTSSVIRAAVLQLSVSTDKSANIAIAVKRIQQAKSNGCTLAVLPECFTTPY
>ES_comp17836_c2_seq2:2-862
RKMTSPTSSVIRAAVLQLSVSTDKSTNIAIAVKRIQQAKSNGCTLAVLPECFTTPY

The code I am currently using gives me the right sequences, but not in the right order.

awk 'NR==FNR{a[$0]=1;next} {n=0;for(i in a){if($0~i){n=1}}} n {print;getline;print}' file1 file2

Current (wrong order) output:

>EN_comp12952_c0_seq3:367-1668
MDKRLLNVSLLGLAFMFVFTAFQTMGNIEKTILKSIQNDYPSFTGDGYTSL
>EN_comp13395_c3_seq14:231-1088
KMTSPTSSVIRAAVLQLSVSTDKSANIAIAVKRIQQAKSNGCTLAVLPECFTTPY
>ES_comp17836_c2_seq2:2-862
RKMTSPTSSVIRAAVLQLSVSTDKSTNIAIAVKRIQQAKSNGCTLAVLPECFTTPY
>ES_comp17168_c1_seq6:1-864
IFELTVVVSFAGSRLAMFIGACCYTMFLVSFLWPTTFLLYFMSAVIGFGASVIWT

Thanks for any pointers!

Hello,

The following may help.

$ awk 'NR==FNR{a[">"$0];} ($1 in a) {print $0;getline;print $0}' file1 file2

Output will be as follows.

>EN_comp12952_c0_seq3:367-1668
MDKRLLNVSLLGLAFMFVFTAFQTMGNIEKTILKSIQNDYPSFTGDGYTSL
>EN_comp13395_c3_seq14:231-1088
KMTSPTSSVIRAAVLQLSVSTDKSANIAIAVKRIQQAKSNGCTLAVLPECFTTPY
>ES_comp17836_c2_seq2:2-862
RKMTSPTSSVIRAAVLQLSVSTDKSTNIAIAVKRIQQAKSNGCTLAVLPECFTTPY
>ES_comp17168_c1_seq6:1-864
IFELTVVVSFAGSRLAMFIGACCYTMFLVSFLWPTTFLLYFMSAVIGFGASVIWT

Thanks,
R. Singh

Thank you for taking the time on this. I would actually like the following output, so that the sequences appear in the same order as in file1 (sorry, it might have been confusing to also post the undesired output):

>EN_comp12952_c0_seq3:367-1668
MDKRLLNVSLLGLAFMFVFTAFQTMGNIEKTILKSIQNDYPSFTGDGYTSL
>ES_comp17168_c1_seq6:1-864
IFELTVVVSFAGSRLAMFIGACCYTMFLVSFLWPTTFLLYFMSAVIGFGASVIWT
>EN_comp13395_c3_seq14:231-1088
KMTSPTSSVIRAAVLQLSVSTDKSANIAIAVKRIQQAKSNGCTLAVLPECFTTPY
>ES_comp17836_c2_seq2:2-862
RKMTSPTSSVIRAAVLQLSVSTDKSTNIAIAVKRIQQAKSNGCTLAVLPECFTTPY

awk 'NR==FNR{getline B; A[substr($0,2)]=B; next} $0 in A{ $0=">"$0"\n"A[$0] } 1' file2 file1

awk 'NR==FNR{if(/^>/){key=$0} else {a[key]=$0};next}
{if (a[">"$0]) { print ">"$0;print a[">"$0]}}' file2 file1

awk 'NR==FNR{B=$0;getline;A[B]=$0;next} {D=">"$0; print D;print A[D]}' file2 file1

>EN_comp12952_c0_seq3:367-1668
MDKRLLNVSLLGLAFMFVFTAFQTMGNIEKTILKSIQNDYPSFTGDGYTSL
>ES_comp17168_c1_seq6:1-864
IFELTVVVSFAGSRLAMFIGACCYTMFLVSFLWPTTFLLYFMSAVIGFGASVIWT
>EN_comp13395_c3_seq14:231-1088
KMTSPTSSVIRAAVLQLSVSTDKSANIAIAVKRIQQAKSNGCTLAVLPECFTTPY
>ES_comp17836_c2_seq2:2-862
RKMTSPTSSVIRAAVLQLSVSTDKSTNIAIAVKRIQQAKSNGCTLAVLPECFTTPY
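
One caveat: the one-liners above all assume that each sequence in file2 sits on a single line. If your FASTA file wraps sequences across several lines, a variant along these lines might work. It is only a sketch: the function name reorder_fasta, the file names ids.txt and seqs.fa, and the sample IDs are made up for illustration, and it assumes each header is exactly ">" followed by the ID.

```shell
# Hypothetical sample data; the file names and IDs are illustrative only.
tmp=$(mktemp -d)
printf '%s\n' 'ID_B' 'ID_A' 'ID_missing' > "$tmp/ids.txt"
printf '%s\n' '>ID_A' 'AAAA' 'AAAA' '>ID_C' 'CCCC' '>ID_B' 'BBBB' > "$tmp/seqs.fa"

# reorder_fasta FASTA IDS
# Prints the FASTA records named in IDS, in the order the IDs are listed.
reorder_fasta() {
  awk '
    # First file (FASTA): collect every sequence line under its header.
    NR == FNR {
      if (/^>/) { key = substr($0, 2); seen[key] = 1 }
      else if (key != "") seq[key] = seq[key] $0 "\n"
      next
    }
    # Second file (ID list): print matching records in ID-list order.
    ($0 in seen) { printf ">%s\n%s", $0, seq[$0] }
  ' "$1" "$2"
}

result=$(reorder_fasta "$tmp/seqs.fa" "$tmp/ids.txt")
printf '%s\n' "$result"
```

IDs that have no record in the FASTA file are skipped rather than printed as bare headers.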