shen
February 23, 2011, 2:47am
21
It works perfectly, thank you. I have one more question.
I have a FASTA file that looks something like this:
>POPTR_1446s00200.20|PACid:18205974
KGIAEIKNRLQNRKVLVILDDVDNLKQLHFLAVDWKWFLPGSRIIITSRDKNLLSTHAVDGIYEAEELNDDDALVLLSRK
AFKKDQPIEGYWELCKSVLGHARGLPLAARVLGSSLCGRSMDFWESFIKRLNEIPNRDVMAVLKLSFDGLEELEKKIFLD
IACFFKGMNKDQVSRILNQCGFHANYGIQILQDKSLICVSNDTLSMHDLLQAMGREVVRQESTAEPGRRSRLWANGSLTW
YQSLDDQAGTEEIESIALDWPNPEDVEGTMLKTKRSAWNTGVFSKMSRLRLLRIRNACFDSGPEYLSNELRLILKGCRRL
SEVHSSIGHHNKLIYVNLIDCKSLTSLPSRISGLKLLEELHLSGCSKLKEFPEIVGNKKCLRKLCLDQTSIEELPLSIQY
LVGLISLSLKDCKKLARLPSSINGLKSLKTLHLSGCSELDNLPENLGQLECLNELDVRAVPNDIGYLSSLRHLDLSCNKF
VSLPTSIDQLSGLQFLRMEDCKMLQSLPELPSNLEEFRGPPNLIESFSVIIPGSEIPTWFSHQSEGSSVSVQTPPHSHEN
DEWLGYAVCASLGYPDFPPNVFRSPMQCFFNGDGNESESIYVRLKPCEILSDHLWFLYFPSRFKRFDRHVRFRFEDNCSQ
TKVIKCGVRLVYQQDVEELNRMTNLYENSTFEGVDECFQESGGALVKRLGHTNDVGEASGSVSSDEQPPTKKLKQI*
>POPTR_1855s00205.1|PACid:18205975
MVMPRLSWRWLLAVSCLPAFALLLFYSHVPESPRYLCMKGRINDAYNILEKIALLNQSKLPPGELVPDSTIGLDEESATS
EYTPLLSTTEKMDLDFRSGFQSFLMLFSSKVIRTTLLLWELLFGNVFSYYAIILLTSELSSWQSRCGSNLLKSENPDSLY
INVFISNLAGIACLLSSSLCN*
>POPTR_1855s00200.1|PACid:18205976
MLSHNFCPLVSTGMNMTQTAEIAIRTSNRMMASVIWQLVTRPTATRGHIMPPTFPAAFAAPAPVALTDVGYNCWGFRLPM
RGPVVFYVVMSNRLSSDDPREFFPLASGLTGSVIDVHYYNLFSDEFNSMSVQQNIDFINTNRLVTADYAGDNKWGDDDP
SVFVMTIAGRLQGEFQVTNGYGPKLAPKVMRDHWRTFIVEDDFKFISQNGINAVRIPVGWWIASDPTPPQPYVGGSLKA
>POPTR_0005s16590.4|PACid:18205977
FKINAVNLGGWLVTEGWIKPSLFDGITNKDFLDGTGLQFKSVTVGKYLCAEAGGGNIIVANRTSASGWETFSLWRINETN
FNFRVFNKQFAGLDTNGNGIDIVAVSSTPGRSETFEIVRNSNDTSRVRIKASNGFFLQAKTEELVTADYAGDNKWGDDDP
SVFVMTIAGRLQGEFQVTNGYGPKLAPKVMRDHWRTFIVEDDFKFISQNGINAVRIPVGWWIASDPTPPQPYVGGSLKAL
DNAFLWAQNYGLQVVIDLHAAPGSQNGWEHSSSRDGSQEWGQTDENIRQTVDVIDFLTARYAKSPSLYAVELMNEPRAPG
ASLDSMTKYYKGGYDAVRKHSSTAYVVMSNRLSSDDPREFFPLASGLTGSVIDVHYYNLFSDEFNSMSVQQNIDFINTNR
SGQLNYVTTSNGPLTFVGEWVAEWTVQGATKEDYQRFAEAQLKVFGRATFGWAYWTLKNVNNHWSLEWMIKNGYIKI*
I just want to extract only the "*.1" entries, so that the output looks like this:
>POPTR_1855s00200.1|PACid:18205976
MLSHNFCPLVSTGMNMTQTAEIAIRTSNRMMASVIWQLVTRPTATRGHIMPPTFPAAFAAPAPVALTDVGYNCWGFRLPM
RGPVVFYVVMSNRLSSDDPREFFPLASGLTGSVIDVHYYNLFSDEFNSMSVQQNIDFINTNRLVTADYAGDNKWGDDDP
SVFVMTIAGRLQGEFQVTNGYGPKLAPKVMRDHWRTFIVEDDFKFISQNGINAVRIPVGWWIASDPTPPQPYVGGSLKA
Can you help me do that?
ctsgnb
February 23, 2011, 3:12am
22
If each of your entries is one long line (so that it is displayed across several lines but contains no "\n"), this should do the job (assuming you don't want the *.10 or *.11 entries):
grep 'POPTR_1855s00200\.1|' infile
nawk -v RS=">" -v FS="" '/POPTR_1855s00200\.1\|/' infile
or
nawk -v RS=">" -v FS="" '/POPTR_1855s00200\.1\|/{printf "%s",$0}' infile
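To see why setting RS=">" helps when the sequences do contain newlines, here is a minimal sketch on a made-up two-entry sample. The IDs and sequences are hypothetical, and it uses plain awk instead of nawk and leaves FS at its default, which is an assumption about your system rather than part of the commands above.

```shell
# Hypothetical two-entry FASTA sample (made-up IDs and sequences).
sample='>seqA.1|PACid:1
AAAA
CCCC
>seqB.10|PACid:2
GGGG'

# RS=">" makes each entry (header plus its sequence lines) one record.
# The text before the first ">" is an empty record, so NF filters it out.
# With the default FS, newlines separate fields: $1 is the header and
# the remaining fields are the sequence lines.
records=$(printf '%s\n' "$sample" |
  awk -v RS=">" 'NF { print NR ": " $1 " has " (NF - 1) " sequence line(s)" }')
printf '%s\n' "$records"
```

Because the whole entry is one record, a pattern that matches the header selects the header and its sequence together.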
Perl is also powerful!
cat ufile|perl -p -ne 's/(?:(?<==)PAC)/PT/;'
shen
February 23, 2011, 8:34am
24
But your code is not working for entries such as POPTR_1855s00201; this file contains a set of different values. If I use it, it just returns the following:
POPTR_1855s00200.1|PACid:18205976
ATGCTGTCGCACAATTTCTGTCCCTTGGTATCAACTGGGATGAACATGACACAGACTGCTGAAATAGCCATCAGGACCTC
AAACAGGATGATGGCTTCTGTAATCTGGCAGCTAGTCACCAGGCCAACTGCCACAAGAGGACATATCATGCCACCAACCT
TCCCCGCAGCATTTGCAGCTCCAGCACCAGTTGCCCTGACGGATGTCGGATATAACTGTTGGGGATTCCGACTCCCCATG
CGCGGACCGGTGGTGTTCTGA
I need to get all the paragraphs which contain *.1 values.
ctsgnb
February 23, 2011, 8:39am
25
Then just remove POPTR_1855s00200 and give one of these a try:
grep '\.1|' infile
nawk -v RS=">" -v FS="" '/\.1\|/' infile
nawk -v RS=">" -v FS="" '/\.1\|/{printf "%s",$0}' infile
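The escaped dot is what keeps *.11 or *.21 entries out of the result. A small sketch with made-up header lines (hypothetical IDs, plain awk assumed instead of nawk) compares the escaped pattern with an unescaped one:

```shell
# Hypothetical header lines covering the tricky version suffixes.
headers='>seqA.1|PACid:1
>seqB.10|PACid:2
>seqC.11|PACid:3
>seqD.21|PACid:4'

# Escaped dot and literal pipe: only a real ".1|" matches.
strict=$(printf '%s\n' "$headers" | awk '/\.1\|/')

# Unescaped dot matches any character, so "11|" and "21|" slip through.
loose=$(printf '%s\n' "$headers" | awk '/.1\|/')

printf 'strict:\n%s\nloose:\n%s\n' "$strict" "$loose"
```

Here strict holds only the seqA.1 header, while loose also picks up seqC.11 and seqD.21.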
shen
February 23, 2011, 8:46am
26
Hi justlooks, your code's output contains
>POPTR_0005s00710.3|PACid:18208427
ACAAAACAAAAACAAAAGAGAAAGAGAGAAAGAAAGCACTTTCTTTTGTGTATTGTATTGGAAGGCAGCTTGTTGTGTTA
TTGCCCACAAAAATGGATCTGTCTTCCCCACCTTCTCTCTCCACTGTCTGATCCTCAGCTCCTCTTCCTCATCTCCACCT
TCAAAACGGCATCTTCTTTAATACCCAGGATTTTTTCCGGCTTAATTGGTAATGTGTAATTGCATTGAAGTTCAAGAGTA
AGACAGAATAATTTGTTCTTGTCTAACAAATGGGTTCTGTGGGCGTGGCACCTTCTTCGGGATTAAGAGAAGCTAGTGCC
CATAATGCTGGTGTGGATAAGTTACCTGAGGAAATGAATGACATGAAAATTAGAGATGACAAAGAAATGGAGGCAACAGT
values. I just want to extract the *.1 paragraphs.
---------- Post updated at 08:46 AM ---------- Previous update was at 08:39 AM ----------
Great, this means I can use any one of the three commands.
Try this:
awk 'f && /^>POP/{f=0} /^>POP.*\.1\|/{f=1}f' file
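The one-liner above works line by line with a flag instead of changing RS. Spelled out as a sketch on a made-up sample (hypothetical IDs and sequences; the variable name keep and the generic /^>/ header test are my own choices, not part of the original command), the same idea looks like this:

```shell
# Hypothetical three-entry FASTA sample (made-up IDs and sequences).
sample='>seqA.1|PACid:1
AAAA
CCCC
>seqB.10|PACid:2
GGGG
>seqC.1|PACid:3
TTTT'

selected=$(printf '%s\n' "$sample" | awk '
  # On every header line, decide whether this entry is wanted.
  /^>/ { keep = ($0 ~ /\.1\|/) }
  # Print the header and all following sequence lines while the flag is set.
  keep
')
printf '%s\n' "$selected"
```

Since this never touches RS, the leading ">" is kept on each header and no empty first record appears.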
ctsgnb
February 23, 2011, 8:49am
28
My code's output does not contain any *.3 entries, only *.1:
(I based my test on the sample you gave.)
# cat tst
>POPTR_1446s00200.20|PACid:18205974
KGIAEIKNRLQNRKVLVILDDVDNLKQLHFLAVDWKWFLPGSRIIITSRDKNLLSTHAVDGIYEAEELNDDDALVLLSRK
AFKKDQPIEGYWELCKSVLGHARGLPLAARVLGSSLCGRSMDFWESFIKRLNEIPNRDVMAVLKLSFDGLEELEKKIFLD
IACFFKGMNKDQVSRILNQCGFHANYGIQILQDKSLICVSNDTLSMHDLLQAMGREVVRQESTAEPGRRSRLWANGSLTW
YQSLDDQAGTEEIESIALDWPNPEDVEGTMLKTKRSAWNTGVFSKMSRLRLLRIRNACFDSGPEYLSNELRLILKGCRRL
SEVHSSIGHHNKLIYVNLIDCKSLTSLPSRISGLKLLEELHLSGCSKLKEFPEIVGNKKCLRKLCLDQTSIEELPLSIQY
LVGLISLSLKDCKKLARLPSSINGLKSLKTLHLSGCSELDNLPENLGQLECLNELDVRAVPNDIGYLSSLRHLDLSCNKF
VSLPTSIDQLSGLQFLRMEDCKMLQSLPELPSNLEEFRGPPNLIESFSVIIPGSEIPTWFSHQSEGSSVSVQTPPHSHEN
DEWLGYAVCASLGYPDFPPNVFRSPMQCFFNGDGNESESIYVRLKPCEILSDHLWFLYFPSRFKRFDRHVRFRFEDNCSQ
TKVIKCGVRLVYQQDVEELNRMTNLYENSTFEGVDECFQESGGALVKRLGHTNDVGEASGSVSSDEQPPTKKLKQI*
>POPTR_1855s00205.1|PACid:18205975
MVMPRLSWRWLLAVSCLPAFALLLFYSHVPESPRYLCMKGRINDAYNILEKIALLNQSKLPPGELVPDSTIGLDEESATS
EYTPLLSTTEKMDLDFRSGFQSFLMLFSSKVIRTTLLLWELLFGNVFSYYAIILLTSELSSWQSRCGSNLLKSENPDSLY
INVFISNLAGIACLLSSSLCN*
>POPTR_1855s00200.1|PACid:18205976
MLSHNFCPLVSTGMNMTQTAEIAIRTSNRMMASVIWQLVTRPTATRGHIMPPTFPAAFAAPAPVALTDVGYNCWGFRLPM
RGPVVFYVVMSNRLSSDDPREFFPLASGLTGSVIDVHYYNLFSDEFNSMSVQQNIDFINTNRLVTADYAGDNKWGDDDP
SVFVMTIAGRLQGEFQVTNGYGPKLAPKVMRDHWRTFIVEDDFKFISQNGINAVRIPVGWWIASDPTPPQPYVGGSLKA
>POPTR_0005s16590.4|PACid:18205977
FKINAVNLGGWLVTEGWIKPSLFDGITNKDFLDGTGLQFKSVTVGKYLCAEAGGGNIIVANRTSASGWETFSLWRINETN
FNFRVFNKQFAGLDTNGNGIDIVAVSSTPGRSETFEIVRNSNDTSRVRIKASNGFFLQAKTEELVTADYAGDNKWGDDDP
SVFVMTIAGRLQGEFQVTNGYGPKLAPKVMRDHWRTFIVEDDFKFISQNGINAVRIPVGWWIASDPTPPQPYVGGSLKAL
DNAFLWAQNYGLQVVIDLHAAPGSQNGWEHSSSRDGSQEWGQTDENIRQTVDVIDFLTARYAKSPSLYAVELMNEPRAPG
ASLDSMTKYYKGGYDAVRKHSSTAYVVMSNRLSSDDPREFFPLASGLTGSVIDVHYYNLFSDEFNSMSVQQNIDFINTNR
SGQLNYVTTSNGPLTFVGEWVAEWTVQGATKEDYQRFAEAQLKVFGRATFGWAYWTLKNVNNHWSLEWMIKNGYIKI*
# nawk -v RS=">" -v FS="" '/\.1\|/' tst
POPTR_1855s00205.1|PACid:18205975
MVMPRLSWRWLLAVSCLPAFALLLFYSHVPESPRYLCMKGRINDAYNILEKIALLNQSKLPPGELVPDSTIGLDEESATS
EYTPLLSTTEKMDLDFRSGFQSFLMLFSSKVIRTTLLLWELLFGNVFSYYAIILLTSELSSWQSRCGSNLLKSENPDSLY
INVFISNLAGIACLLSSSLCN*
POPTR_1855s00200.1|PACid:18205976
MLSHNFCPLVSTGMNMTQTAEIAIRTSNRMMASVIWQLVTRPTATRGHIMPPTFPAAFAAPAPVALTDVGYNCWGFRLPM
RGPVVFYVVMSNRLSSDDPREFFPLASGLTGSVIDVHYYNLFSDEFNSMSVQQNIDFINTNRLVTADYAGDNKWGDDDP
SVFVMTIAGRLQGEFQVTNGYGPKLAPKVMRDHWRTFIVEDDFKFISQNGINAVRIPVGWWIASDPTPPQPYVGGSLKA
# nawk -v RS=">" -v FS="" '/\.1\|/{printf "%s",$0}' tst
POPTR_1855s00205.1|PACid:18205975
MVMPRLSWRWLLAVSCLPAFALLLFYSHVPESPRYLCMKGRINDAYNILEKIALLNQSKLPPGELVPDSTIGLDEESATS
EYTPLLSTTEKMDLDFRSGFQSFLMLFSSKVIRTTLLLWELLFGNVFSYYAIILLTSELSSWQSRCGSNLLKSENPDSLY
INVFISNLAGIACLLSSSLCN*
POPTR_1855s00200.1|PACid:18205976
MLSHNFCPLVSTGMNMTQTAEIAIRTSNRMMASVIWQLVTRPTATRGHIMPPTFPAAFAAPAPVALTDVGYNCWGFRLPM
RGPVVFYVVMSNRLSSDDPREFFPLASGLTGSVIDVHYYNLFSDEFNSMSVQQNIDFINTNRLVTADYAGDNKWGDDDP
SVFVMTIAGRLQGEFQVTNGYGPKLAPKVMRDHWRTFIVEDDFKFISQNGINAVRIPVGWWIASDPTPPQPYVGGSLKA
shen
February 23, 2011, 8:56am
29
Perfect, thank you. It works fine.
---------- Post updated at 08:56 AM ---------- Previous update was at 08:55 AM ----------
Great, it works nicely. I meant justlooks' Perl code: that one gave me *.3 output, not yours.
Thank you very much.
ctsgnb
February 23, 2011, 9:04am
30
You might want to keep the leading ">":
nawk -v RS=">" -v FS="" '/\.1\|/{printf RS"%s",$0}' infile