Hi guys,
Can you help me solve this problem?
I have two files, FILE1 and FILE2, as follows:
====FILE1====
>LOC21
MASSKFCTVLSLALFLVLLTHANSAELFSFNFQTFNAANLILQGNASVSSSGQLRLTEVKSNGEPKVASL
VASFATAFTFNILAPILSNSADGLAFALVPVGSQPKFNGGFLGLFQNVTYDP
>LOC05
MASSKFSTVLSLALFLVLLTHANSAELFSFNFQTFNAANLILQGNASVSSSGQLRLTEVKSNGEPKVASL
GRAFYSAPIQIWDSTTGKVASFATAFTFNILAPILSNSADGLAFALVPVGSQPKFNGGFLGLFQNVTYDP
AKVLITYDSSTKLLVASLVYPSGS
>LOC48
MASLQTQMISFYAIFLSILLTTILFFKVNSTGEITSFSIPKFRPDQPNLIFQGGGYTTKEKLTLTKAVK
====FILE2====
LOC21
LOC48
I want to write every complete record from FILE1 (each record starts at a line beginning with '>') whose header matches an ID in FILE2 into a new file, FILE3, which should look like this:
>LOC21
MASSKFCTVLSLALFLVLLTHANSAELFSFNFQTFNAANLILQGNASVSSSGQLRLTEVKSNGEPKVASL
VASFATAFTFNILAPILSNSADGLAFALVPVGSQPKFNGGFLGLFQNVTYDP
>LOC48
MASLQTQMISFYAIFLSILLTTILFFKVNSTGEITSFSIPKFRPDQPNLIFQGGGYTTKEKLTLTKAVK
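For reference, here is a minimal awk sketch of the task described above. It is not necessarily the script suggested in this thread. The function name extract_records is made up for illustration, and the sketch assumes FILE2 holds one ID per line and that each ID is exactly the text after '>' on a header line.

```shell
# extract_records is a hypothetical helper name, not something from this thread.
# Usage: extract_records ID_LIST FASTA > OUTPUT
extract_records() {
    awk '
        # While reading the first file (the ID list), store each non-empty line.
        NR == FNR { if ($0 != "") want[$0] = 1; next }
        # On a header line, decide whether the record that starts here is wanted.
        /^>/ { keep = (substr($0, 2) in want) }
        # Print the current line (header or sequence) while inside a wanted record.
        keep
    ' "$1" "$2"
}
```

It would be called as `extract_records FILE2 FILE1 > FILE3`. The ID list has to come first, because the `NR == FNR` test is only true while awk is reading the first file named on the command line.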
Thanks for your help. The code runs perfectly, but I have one more problem.
Actually, the line beginning with '>' contains other words as well, and I have different files in which LOC can be something else, like ABC or GNL, although the first three letters after '>' will be the same in every header of a given file. I solved that part by replacing the line
/^>LOC/ {
with
/^>/ {
My file looks like this:
>LOC21 ths is a seq of protein bla-bla-bla
MASSKFCTVLSLALFLVLLTHANSAELFSFNFQTFNAANLILQGNASVSSSGQLRLTEVKSNGEPKVASL
VASFATAFTFNILAPILSNSADGLAFALVPVGSQPKFNGGFLGLFQNVTYDP
So when I tried it on my actual file, it didn't work. As far as I understand, the extra space-separated words in the header line (the one beginning with '>') are causing the trouble.
I would be grateful if you could help me sort this out.
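A minimal sketch of one way to cope with extra words in the header, assuming the ID is always the first whitespace-separated word after '>'. Only that first word is compared with the IDs in FILE2, so the description that follows it is ignored for matching but still printed. The function name extract_by_first_word is hypothetical, and this is not necessarily how the script in this thread does it.

```shell
# extract_by_first_word is a hypothetical helper name used only for this sketch.
# Usage: extract_by_first_word ID_LIST FASTA > OUTPUT
extract_by_first_word() {
    awk '
        # First file: remember the first word of every non-empty line as a wanted ID.
        NR == FNR { if ($1 != "") want[$1] = 1; next }
        # Header line: $1 is ">ID", so drop the ">" and compare only that word.
        /^>/ { keep = (substr($1, 2) in want) }
        # Print header and sequence lines of wanted records unchanged.
        keep
    ' "$1" "$2"
}
```

With default field splitting, awk treats any run of spaces or tabs as a separator, so the rest of the header never takes part in the comparison.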
It's working now, although I had already put the files in the correct order before.
Actually, I had tried to run it as a single line on the command line. I didn't think that should make any difference.
Anyway, it's working fine now, and it also solved my other problem, since it works even if my header line (the one beginning with '>') contains more words.
I tried the same script on a slightly modified file where the header line is:
>LOC_Os01g57570.1|12001.m11908|protein minor allergen Alt a 7, putative, [expressed]
MAVKVYVVYYSMYGHVAKLAEEIKKGASSIEGVEAKIWQVPETLHEEVLGKMGAPPKPDV
PTITPQELTEADGILFGFP
====FILE1====
>LOC_Os01g57570.1|12001.m11908|protein minor allergen Alt a 7, [expressed]
MAVKVYVVYYSMYGHVAKLAEEIKKGASSIEGVEAKIWQVPETLHEEVLGKMGAPPKPDV
PTITPQELTEADGILFGFP
>LOC_Os01g57640.1|12001.m11908|protein lectin 7, (putative), expressed
MAVKVYVVYYSMYGHVAKLAEEIKKGASSIEGVEAKIWQVPETLHEEVLGKMGAPPKPDV
PTITPQELTEADGILFGFPTRFGMMAAQMKAFFDATGGLWSEQSLAGKPAGIFFS
>LOC_Os01g57000.2|12001.m43222|protein minor allergen Alt a 7
MAVKVYVVYYSMYGHVAKLAEEIKKGASSIEGVEAKIWQVPETLHEEVLGKMGAPPKPDV
PTITPQELTEADGILFGFPTRFGMMAAQMKAFFDATGGLWSEQSL
====FILE2====
LOC_Os01g57570
LOC_Os01g57000
And this LOC can be any three letters, such as ABC or GNL, but they will be the same in every header (each line with a '>' symbol).
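For headers like these, where the ID from FILE2 is followed directly by a suffix such as ".1|12001...", a possible sketch is to cut the header at the first '.', '|', space, or tab and compare what is left. This assumes the IDs themselves never contain a '.' or '|', which holds for the sample above but is a guess about the real data. The function name extract_by_id_prefix is hypothetical.

```shell
# extract_by_id_prefix is a hypothetical helper name used only for this sketch.
# Usage: extract_by_id_prefix ID_LIST FASTA > OUTPUT
extract_by_id_prefix() {
    awk '
        # First file: remember the first word of every non-empty line as a wanted ID.
        NR == FNR { if ($1 != "") want[$1] = 1; next }
        # Header line: take the text after ">" and cut it at the first
        # dot, pipe, space, or tab, so ">ABC_x.1|rest" becomes "ABC_x".
        /^>/ {
            id = substr($0, 2)
            sub(/[.| \t].*/, "", id)
            keep = (id in want)
        }
        # Print header and sequence lines of wanted records unchanged.
        keep
    ' "$1" "$2"
}
```

Because the comparison never looks at the three-letter prefix, the same call should work whether the headers start with LOC, ABC, or GNL.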