I have two files.
File 1 is a two-column index file, e.g.
comp11084_c0_seq6:130-468(-) comp12746_c0_seq3:140-478(+)
comp11084_c0_seq3:201-539(-) comp12746_c0_seq2:191-529(+)
File 2 is a sequence file with headers named with the same terms that populate file 1.
>comp11084_c0_seq6:130-468(-)
MRYVAAYLLASLSGKEPSSDEVEKILSSVGIESDSSKLSLVIKELKGKNVDEVIESGRSKLAS
>comp12746_c0_seq3:140-478(+)
MRYVAAYLLASLSGKEPSSDEVEKILSSVGIESDSSKLSLVIKELKGKNVDEVIESGRSKLAS
>comp11084_c0_seq3:201-539(-)
MRYVAAYLLASFSGKEPTSDEIEKILSSVGIESDSDKVSLVVKELKGKNVDEVIESGRSKLAS
>comp12746_c0_seq2:191-529(+)
MRYVAAYLLASFSGKEPTSDEIEKILSSVGIESDSDKVSLVVKELKGKNVDEVIESGRSKLAS
>comp11084_c0_seq3:201-539(-)
MSDTSNVNRLEELGKMKVNDLKKELKARGLSTVGNKQELIDRMINHSESSVLDIEDTVLDE
>comp12601_c0_seq4:132-965(-)
MSDTSNVNRLEELGKMKVNDLKKELKARGLSTVGNKQELIDRMINHSESSVLDIEDTVLDE
Every pair of terms in file 1 "heads" a pair of consecutive sequences in file 2. These are the pairs of sequences I want to extract. File 2 also has sequence pairs whose headers are not found as a pair in file 1 (e.g. the third pair of sequences in this example), which I want to exclude.
Output:
>comp11084_c0_seq6:130-468(-)
MRYVAAYLLASLSGKEPSSDEVEKILSSVGIESDSSKLSLVIKELKGKNVDEVIESGRSKLAS
>comp12746_c0_seq3:140-478(+)
MRYVAAYLLASLSGKEPSSDEVEKILSSVGIESDSSKLSLVIKELKGKNVDEVIESGRSKLAS
>comp11084_c0_seq3:201-539(-)
MRYVAAYLLASFSGKEPTSDEIEKILSSVGIESDSDKVSLVVKELKGKNVDEVIESGRSKLAS
>comp12746_c0_seq2:191-529(+)
MRYVAAYLLASFSGKEPTSDEIEKILSSVGIESDSDKVSLVVKELKGKNVDEVIESGRSKLAS
I can print lines that match between two files with:
awk ' NR == FNR { arr[$1$2]=1; next } arr[$2$1] {print $1, $2} '
But I don't know how to match one line in file 1 against multiple lines in file 2.
Help me out?
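For reference, here is a minimal sketch of the kind of approach that might work. It is not a tested solution for real data. It assumes that file 2 always lists sequences as consecutive pairs of exactly four lines (header, sequence, header, sequence), that each sequence sits on a single line, and that each header is just ">" followed by the ID. The file names pairs.txt, seqs.fa and matched.fa are hypothetical and only there to make the example self-contained.

```shell
#!/bin/sh
# Hypothetical file names: pairs.txt (file 1), seqs.fa (file 2), matched.fa (output).
cat > pairs.txt <<'EOF'
comp11084_c0_seq6:130-468(-) comp12746_c0_seq3:140-478(+)
comp11084_c0_seq3:201-539(-) comp12746_c0_seq2:191-529(+)
EOF

cat > seqs.fa <<'EOF'
>comp11084_c0_seq6:130-468(-)
MRYVAAYLLASLSGKEPSSDEVEKILSSVGIESDSSKLSLVIKELKGKNVDEVIESGRSKLAS
>comp12746_c0_seq3:140-478(+)
MRYVAAYLLASLSGKEPSSDEVEKILSSVGIESDSSKLSLVIKELKGKNVDEVIESGRSKLAS
>comp11084_c0_seq3:201-539(-)
MRYVAAYLLASFSGKEPTSDEIEKILSSVGIESDSDKVSLVVKELKGKNVDEVIESGRSKLAS
>comp12746_c0_seq2:191-529(+)
MRYVAAYLLASFSGKEPTSDEIEKILSSVGIESDSDKVSLVVKELKGKNVDEVIESGRSKLAS
>comp11084_c0_seq3:201-539(-)
MSDTSNVNRLEELGKMKVNDLKKELKARGLSTVGNKQELIDRMINHSESSVLDIEDTVLDE
>comp12601_c0_seq4:132-965(-)
MSDTSNVNRLEELGKMKVNDLKKELKARGLSTVGNKQELIDRMINHSESSVLDIEDTVLDE
EOF

awk '
    # First file: remember every (column 1, column 2) pair of IDs.
    NR == FNR { if (NF >= 2) pair[$1, $2] = 1; next }

    # Second file: buffer lines until one full pair (4 lines) is collected.
    { buf[++n] = $0 }

    n == 4 {
        # Assumption: lines 1 and 3 of the buffer are the two headers.
        id1 = buf[1]; sub(/^>/, "", id1)
        id2 = buf[3]; sub(/^>/, "", id2)

        # Print the whole buffered pair only if file 1 lists it, in this order.
        if ((id1, id2) in pair)
            for (i = 1; i <= 4; i++) print buf[i]

        n = 0   # start collecting the next pair
    }
' pairs.txt seqs.fa > matched.fa

cat matched.fa
```

The idea is that the header pair is checked as a unit rather than line by line, so a header that also appears in some other, unlisted pair (like the third pair above) is not enough to keep that pair. If sequences can wrap over several lines, or a pair can appear in file 2 in the reverse order, the buffering and the lookup would need to change.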