Hi all!
I would like to solve a problem, but I have no idea how to do it. I would be grateful if someone could help me.
Briefly, I have a big file like this:
>ENSMUSG00000000204 | ENSMUST00000159637
GGCGAGGCTTACGCCATTTTACCTCAGCGAGCATTCATAAAGCTGCGAGCATTCATACAG
>ENSMUSG00000000204 | ENSMUST00000457457
CTGTCCTCTTTCCATGTGCTGTGTGAACCTGGCATGCTGGCTTAGGACATGGCCTGATTC
>ENSMUSG00000006755 | ENSMUST00000457688
GGCGAGGCTTACGCCATTTTGTGTGAACCTGGCATGCTGGCTTAGGACATGGCCTGATTC
>ENSMUSG00000037965 | ENSMUST00000068577
GGCGAGGCTTACGCCATTTTGTGTGAACCTGGCATGCTGGCTTAGGACATGGCCTGATTC
>ENSMUSG00000002323 | ENSMUST00000777544
GGCGAGGCTTACGCCATTTTGTGTGAACCTGGCATGCTGGCTTAGGACATGGCCTGATTC
and another file like this (it contains all the IDs I want to extract from the previous file):
ENSMUSG00000000204
ENSMUSG00000002323
My desired output should look like this (the selected IDs and their corresponding sequences):
>ENSMUSG00000000204 | ENSMUST00000159637
GGCGAGGCTTACGCCATTTTACCTCAGCGAGCATTCATAAAGCTGCGAGCATTCATACAG
>ENSMUSG00000000204 | ENSMUST00000457457
CTGTCCTCTTTCCATGTGCTGTGTGAACCTGGCATGCTGGCTTAGGACATGGCCTGATTC
>ENSMUSG00000002323 | ENSMUST00000777544
GGCGAGGCTTACGCCATTTTGTGTGAACCTGGCATGCTGGCTTAGGACATGGCCTGATTC
Do you know how I can solve this problem?
Thank you very much for your support!
$ cat file1
>ENSMUSG00000000204 | ENSMUST00000159637
GGCGAGGCTTACGCCATTTTACCTCAGCGAGCATTCATAAAGCTGCGAGCATTCATACAG
>ENSMUSG00000000204 | ENSMUST00000457457
CTGTCCTCTTTCCATGTGCTGTGTGAACCTGGCATGCTGGCTTAGGACATGGCCTGATTC
>ENSMUSG00000006755 | ENSMUST00000457688
GGCGAGGCTTACGCCATTTTGTGTGAACCTGGCATGCTGGCTTAGGACATGGCCTGATTC
>ENSMUSG00000037965 | ENSMUST00000068577
GGCGAGGCTTACGCCATTTTGTGTGAACCTGGCATGCTGGCTTAGGACATGGCCTGATTC
>ENSMUSG00000002323 | ENSMUST00000777544
GGCGAGGCTTACGCCATTTTGTGTGAACCTGGCATGCTGGCTTAGGACATGGCCTGATTC
$ cat file2
ENSMUSG00000000204
ENSMUSG00000002323
$ awk 'FNR==NR{A[">"$1];next}/>/{f=($1 in A)}f' FS="[ |]" file2 file1
Resulting in:
>ENSMUSG00000000204 | ENSMUST00000159637
GGCGAGGCTTACGCCATTTTACCTCAGCGAGCATTCATAAAGCTGCGAGCATTCATACAG
>ENSMUSG00000000204 | ENSMUST00000457457
CTGTCCTCTTTCCATGTGCTGTGTGAACCTGGCATGCTGGCTTAGGACATGGCCTGATTC
>ENSMUSG00000002323 | ENSMUST00000777544
GGCGAGGCTTACGCCATTTTGTGTGAACCTGGCATGCTGGCTTAGGACATGGCCTGATTC
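For anyone wondering how the one-liner above works, here is a spelled-out sketch of the same logic with comments. The file names demo_fasta.txt, demo_ids.txt and demo_out.txt are made up for this illustration and the sequences are shortened. The only deliberate change is anchoring the header test as ^> so that only lines starting with > count as headers.

```shell
# Hypothetical demo files (shortened sequences), so the snippet runs on its own.
printf '%s\n' \
  '>ENSMUSG00000000204 | ENSMUST00000159637' 'GGCGAGG' \
  '>ENSMUSG00000006755 | ENSMUST00000457688' 'CTGTCCT' \
  '>ENSMUSG00000002323 | ENSMUST00000777544' 'TTAGGAC' > demo_fasta.txt
printf '%s\n' 'ENSMUSG00000000204' 'ENSMUSG00000002323' > demo_ids.txt

awk -F'[ |]' '
  # First file (the ID list): remember each ID with ">" prepended,
  # so that it compares equal to the first field of a header line.
  FNR == NR { wanted[">" $1]; next }

  # Second file: on every header line, set the flag according to
  # whether its first field is one of the wanted IDs.
  /^>/ { keep = ($1 in wanted) }

  # Print the current line (header or sequence) while the flag is set.
  keep
' demo_ids.txt demo_fasta.txt > demo_out.txt

cat demo_out.txt
```

Because the flag stays set until the next header line, this approach should also keep sequences that are wrapped over several lines.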
Hello,
The following may help too.
awk -F'[ |]' 'NR==FNR {a[">"$1]; next} ($1 in a) {print; getline; print}' file2 file1
The output will be as follows.
>ENSMUSG00000000204 | ENSMUST00000159637
GGCGAGGCTTACGCCATTTTACCTCAGCGAGCATTCATAAAGCTGCGAGCATTCATACAG
>ENSMUSG00000000204 | ENSMUST00000457457
CTGTCCTCTTTCCATGTGCTGTGTGAACCTGGCATGCTGGCTTAGGACATGGCCTGATTC
>ENSMUSG00000002323 | ENSMUST00000777544
GGCGAGGCTTACGCCATTTTGTGTGAACCTGGCATGCTGGCTTAGGACATGGCCTGATTC
Thanks,
R. Singh
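A note on the getline approach above: it prints exactly one line after each matching header, which fits the sample data but would drop the rest of a sequence that is wrapped over several lines. One possible variant, sketched here, reads the FASTA file one whole entry at a time by setting RS to > for the second file only. The file names wrapped_fasta.txt, wrapped_ids.txt and wrapped_out.txt and the short sequences are invented for the illustration, and the sketch assumes that > never appears inside a header or a sequence.

```shell
# Hypothetical input with one sequence wrapped over two lines.
printf '%s\n' \
  '>ENSMUSG00000000204 | ENSMUST00000159637' 'GGCGAGG' 'CTTACGC' \
  '>ENSMUSG00000006755 | ENSMUST00000457688' 'CTGTCCT' > wrapped_fasta.txt
printf '%s\n' 'ENSMUSG00000000204' > wrapped_ids.txt

# The ID list is read with the default newline record separator.
# RS=">" takes effect only from the second file on, so each FASTA entry
# (header plus all of its sequence lines) arrives as a single record
# whose first whitespace-delimited field is the gene ID.
awk '
  NR == FNR { wanted[$1]; next }       # first file: collect the IDs
  ($1 in wanted) { printf ">%s", $0 }  # put back the ">" that RS consumed
' wrapped_ids.txt RS='>' wrapped_fasta.txt > wrapped_out.txt

cat wrapped_out.txt
```

Each record still contains its own line breaks, so printf with no added newline should reproduce the entry as it appeared in the input.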