shen
February 20, 2011, 3:41am
1
I have the following text:
scaffold_1 phytozome6 gene 12632 13612 . + . ID=PT_0001s00200;Name=PT_0001s00200
scaffold_1 phytozome6 mRNA 12632 13612 . + . ID=PAC:18235173;Name=PT_0001s00200.1;PACid=18235173;Parent=PT_0001s00200
scaffold_1 phytozome6 5'-UTR 12632 12638 . + . Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 12639 12650 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 12768 12891 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 13117 13226 . + 2 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 13310 13384 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 3'-UTR 13385 13612 . + . Parent=PAC:18235173;PACid=18235173
I want to convert it into:
scaffold_1 phytozome6 gene 12632 13612 . + . ID=PT_0001s00200;Name=PT_0001s00200
scaffold_1 phytozome6 mRNA 12632 13612 . + . ID=PT_0001s00200.1;Name=PT_0001s00200.1;PACid=18235173;Parent=PT_0001s00200
scaffold_1 phytozome6 5'-UTR 12632 12638 . + . Parent=PT_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 12639 12650 . + 0 Parent=PT_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 12768 12891 . + 0 Parent=PT_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 13117 13226 . + 2 Parent=PT_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 13310 13384 . + 0 Parent=PT_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 3'-UTR 13385 13612 . + . Parent=PT_0001s00200.1;PACid=18235173
I tried the following script:
awk '{if(substr($9,17,10)=="Name=PT") print gensub("Parent=PAC:"substr($9,47,9),"Parent="substr($9,22,16)".1",
substr($0,0)) substr($0,0) }' inputfile > outputfile
but still no luck.
Can you please help me?
Why not use sed?
sed 's/Parent=PAC:18235173;PACid=18235173/Parent=PT_0001s00200.1;PACid=18235173/g'
You can do an in-place edit of the same file using the -i switch, or redirect the output to another file as your example does.
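Be aware that -i is not portable: GNU sed accepts a bare -i, whereas BSD and macOS sed require a backup-suffix argument (-i '' for no backup). A minimal sketch that avoids -i altogether writes to a temporary file and moves it over the original only if sed succeeded; the file name demo.gff is hypothetical, and the substitution is the single-record one from the reply above:

```shell
#!/bin/sh
# Hypothetical demo file holding one child line from the sample.
printf '%s\n' "scaffold_1 phytozome6 CDS 12639 12650 . + 0 Parent=PAC:18235173;PACid=18235173" > demo.gff

# Portable "in-place" edit: write the result to a temporary file and
# replace the original only if sed exited successfully.
sed 's/Parent=PAC:18235173;/Parent=PT_0001s00200.1;/' demo.gff > demo.gff.tmp &&
    mv demo.gff.tmp demo.gff

cat demo.gff
```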
shen
February 20, 2011, 4:50am
3
I have used the following script, but I still can't replace the Parent=PAC values:
awk '{if(substr($9,17,10)=="Name=PT") n=substr($9,22,16) gsub("Parent=PAC:"substr($9,47,9),"Parent="n".1");
gsub("ID=PAC:"substr($9,8,19),"ID="n".1"); }' input>output
Any help is appreciated.
---------- Post updated at 04:50 AM ---------- Previous update was at 04:41 AM ----------
Thank you for your reply, but this is only one record. There are many sets of records, each with its own unique PAC and PT IDs, so with your solution I would have to run it 300,000 times, changing the PAC and PT IDs each time. Can't we use
if(substr($9,17,10)=="Name=PT")
with your solution?
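Fixed substr() offsets such as substr($9,17,10) tie the test to one exact ID length, so they break as soon as PT_0001s00200 becomes POPTR_0001s00200. A sketch that locates the Name= value with awk's match() instead (match() sets RSTART and RLENGTH to the position and length of the leftmost match), assuming the attributes are the 9th whitespace- or tab-separated column as in the samples; the helper name mrna_names is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the Name= value of every mRNA line read from
# stdin (or from the files given as arguments), whatever its length.
mrna_names() {
    awk '$3 == "mRNA" && match($9, /Name=[^;]*/) {
        # match() set RSTART and RLENGTH; skip the 5 characters of "Name=".
        print substr($9, RSTART + 5, RLENGTH - 5)
    }' "$@"
}

# Usage with the mRNA line from the first post:
printf '%s\n' "scaffold_1 phytozome6 mRNA 12632 13612 . + . ID=PAC:18235173;Name=PT_0001s00200.1;PACid=18235173;Parent=PT_0001s00200" | mrna_names
```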
rdcwayx
February 20, 2011, 5:44am
4
awk -F"[=;]" 'NR==1{s=$2 ".1";print;FS=OFS=";";next}{sub(/=.*/,"="s,$1)}1' infile
shen
February 20, 2011, 6:24am
5
Thank you for your reply, but I get the following:
scaffold_1 phytozome6 gene 2330052 2335284 . - . ID=.1;Name=PT_0001s02940
scaffold_1 phytozome6 mRNA 2330052 2335284 . - . ID=.1;Name=PT_0001s02940.1;PACid=18235154;Parent=PT_0001s02940
scaffold_1 phytozome6 CDS 2334981 2335230 . - 0 Parent=.1;PACid=18235154
scaffold_1 phytozome6 5'-UTR 2335231 2335284 . - . Parent=.1;PACid=18235154
scaffold_1 phytozome6 CDS 2334079 2334206 . - 2 Parent=.1;PACid=18235154
scaffold_1 phytozome6 CDS 2333907 2333978 . - 0 Parent=.1;PACid=18235154
scaffold_1 phytozome6 CDS 2333635 2333780 . - 0 Parent=.1;PACid=18235154
scaffold_1 phytozome6 CDS 2333448 2333562 . - 1 Parent=.1;PACid=18235154
scaffold_1 phytozome6 CDS 2333285 2333365 . - 0 Parent=.1;PACid=18235154
scaffold_1 phytozome6 CDS 2332541 2332678 . - 0 Parent=.1;PACid=18235154
scaffold_1 phytozome6 CDS 2331826 2331913 . - 0 Parent=.1;PACid=18235154
scaffold_1 phytozome6 CDS 2330651 2330764 . - 2 Parent=.1;PACid=18235154
scaffold_1 phytozome6 3'-UTR 2330052 2330460 . - . Parent=.1;PACid=18235154
scaffold_1 phytozome6 CDS 2330461 2330483 . - 2 Parent=.1;PACid=18235154
Still no luck.
I used the following command and it gives the correct output. Did I do it correctly?
awk '{if(substr($9,1,6)=="ID=PT_") n=substr($9,4,13)".1"; gsub("ID=PAC:"substr($9,8,8),"ID="n);
gsub("Parent=PAC:"substr($9,12,8),"Parent="n); print; }' infile > outfile
rdcwayx
February 20, 2011, 6:42am
6
Is your document's first line different from the sample? I take the ID from the first line.
The first line should be:
scaffold_1 phytozome6 gene 12632 13612 . + . ID=PT_0001s00200;Name=PT_0001s00200
shen
February 20, 2011, 6:48am
7
Yes, it is different:
scaffold_1 phytozome6 gene 12632 13612 . + . ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1 phytozome6 mRNA 12632 13612 . + . ID=PAC:18235173;Name=POPTR_0001s00200.1;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1 phytozome6 5'-UTR 12632 12638 . + . Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 12639 12650 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 12768 12891 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 13117 13226 . + 2 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 13310 13384 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 3'-UTR 13385 13612 . + . Parent=PAC:18235173;PACid=18235173
I want to change it to:
scaffold_1 phytozome6 gene 12632 13612 . + . ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1 phytozome6 mRNA 12632 13612 . + . ID=POPTR_0001s00200.1;Name=POPTR_0001s00200.1;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1 phytozome6 5'-UTR 12632 12638 . + . Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 12639 12650 . + 0 Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 12768 12891 . + 0 Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 13117 13226 . + 2 Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 13310 13384 . + 0 Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 3'-UTR 13385 13612 . + . Parent=POPTR_0001s00200.1;PACid=18235173
Do you think your command will still work? Sorry for changing the format.
rdcwayx
February 20, 2011, 7:00am
8
No change needed; it still works fine:
$ cat infile
scaffold_1 phytozome6 gene 12632 13612 . + . ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1 phytozome6 mRNA 12632 13612 . + . ID=PAC:18235173;Name=POPTR_0001s00200.1;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1 phytozome6 5'-UTR 12632 12638 . + . Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 12639 12650 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 12768 12891 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 13117 13226 . + 2 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 13310 13384 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 3'-UTR 13385 13612 . + . Parent=PAC:18235173;PACid=18235173
$ awk -F"[=;]" 'NR==1{s=$2 ".1";print;FS=";";OFS=";";next}{sub(/=.*/,"="s,$1)}1' infile
scaffold_1 phytozome6 gene 12632 13612 . + . ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1 phytozome6 mRNA 12632 13612 . + . ID=POPTR_0001s00200.1;Name=POPTR_0001s00200.1;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1 phytozome6 5'-UTR 12632 12638 . + . Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 12639 12650 . + 0 Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 12768 12891 . + 0 Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 13117 13226 . + 2 Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 13310 13384 . + 0 Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 3'-UTR 13385 13612 . + . Parent=POPTR_0001s00200.1;PACid=18235173
Does your awk support -F"[=;]"?
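A quick way to check this: POSIX awk treats a field separator longer than one character as an extended regular expression, so an awk that supports -F"[=;]" splits the toy record a=b;c into three fields. The helper name awk_has_regex_fs is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if this awk splits a=b;c on the regex [=;].
awk_has_regex_fs() {
    [ "$(echo 'a=b;c' | awk -F'[=;]' '{ print NF }')" = 3 ]
}

if awk_has_regex_fs; then
    echo "regex FS supported"
else
    echo "regex FS not supported"
fi
```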
shen
February 20, 2011, 7:13am
9
Yes, it supports FS. Thank you for your help, but I want each subset handled according to its own POPTR_ ID (those are gene names), for example:
scaffold_997 phytozome6 gene 1687 2351 . - . ID=POPTR_0997s00200;Name=POPTR_0997s00200
scaffold_997 phytozome6 mRNA 1687 2351 . - . ID=PAC:18226942;Name=POPTR_0997s00200.1;PACid=18226942;Parent=POPTR_0997s00200
scaffold_997 phytozome6 CDS 2240 2317 . - 0 Parent=PAC:18226942;PACid=18226942
scaffold_997 phytozome6 5'-UTR 2318 2351 . - . Parent=PAC:18226942;PACid=18226942
scaffold_997 phytozome6 CDS 2078 2111 . - 0 Parent=PAC:18226942;PACid=18226942
scaffold_997 phytozome6 3'-UTR 1687 1866 . - . Parent=PAC:18226942;PACid=18226942
scaffold_997 phytozome6 CDS 1867 1997 . - 2 Parent=PAC:18226942;PACid=18226942
should be changed to:
scaffold_997 phytozome6 gene 1687 2351 . - . ID=POPTR_0997s00200;Name=POPTR_0997s00200
scaffold_997 phytozome6 mRNA 1687 2351 . - . ID=POPTR_0997s00200.1;Name=POPTR_0997s00200.1;PACid=18226942;Parent=POPTR_0997s00200
scaffold_997 phytozome6 CDS 2240 2317 . - 0 Parent=POPTR_0997s00200.1;PACid=18226942
scaffold_997 phytozome6 5'-UTR 2318 2351 . - . Parent=POPTR_0997s00200.1;PACid=18226942
scaffold_997 phytozome6 CDS 2078 2111 . - 0 Parent=POPTR_0997s00200.1;PACid=18226942
scaffold_997 phytozome6 3'-UTR 1687 1866 . - . Parent=POPTR_0997s00200.1;PACid=18226942
scaffold_997 phytozome6 CDS 1867 1997 . - 2 Parent=POPTR_0997s00200.1;PACid=18226942
Do you think you can adjust your command? I really like it, and would prefer it to the following one:
awk '{if(substr($9,1,9)=="ID=POPTR_") n=substr($9,4,16)".1"; gsub("ID=PAC:"substr($9,8,8),"ID="n);
gsub("Parent=PAC:"substr($9,12,8),"Parent="n); print; }' input>output
rdcwayx
February 20, 2011, 6:45pm
10
I don't care what your code is doing; I wrote my code to produce the output you want.
I tried your latest input and output with my command and it is still perfect; I don't see any issue.
So you need to point out the error that I need to fix.
shen
February 21, 2011, 1:34am
11
Here is the input, followed by the desired output; your code does not generate the desired output:
scaffold_1 phytozome6 gene 12632 13612 . + . ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1 phytozome6 mRNA 12632 13612 . + . ID=PAC:18235173;Name=POPTR_0001s00200.1;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1 phytozome6 5'-UTR 12632 12638 . + . Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 12639 12650 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 12768 12891 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 13117 13226 . + 2 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 13310 13384 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 3'-UTR 13385 13612 . + . Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 gene 19769 22804 . + . ID=POPTR_0001s00210;Name=POPTR_0001s00210
scaffold_1 phytozome6 mRNA 19769 22804 . + . ID=PAC:18238552;Name=POPTR_0001s00210.1;PACid=18238552;Parent=POPTR_0001s00210
scaffold_1 phytozome6 5'-UTR 19769 19827 . + . Parent=PAC:18238552;PACid=18238552
scaffold_1 phytozome6 CDS 19828 20136 . + 0 Parent=PAC:18238552;PACid=18238552
scaffold_1 phytozome6 CDS 22190 22516 . + 0 Parent=PAC:18238552;PACid=18238552
scaffold_1 phytozome6 3'-UTR 22517 22804 . + . Parent=PAC:18238552;PACid=18238552
scaffold_1 phytozome6 gene 74076 75893 . + . ID=POPTR_0001s00220;Name=POPTR_0001s00220
scaffold_1 phytozome6 mRNA 74076 75893 . + . ID=PAC:18237390;Name=POPTR_0001s00220.1;PACid=18237390;Parent=POPTR_0001s00220
scaffold_1 phytozome6 CDS 74076 74235 . + 0 Parent=PAC:18237390;PACid=18237390
scaffold_1 phytozome6 CDS 74359 74634 . + 2 Parent=PAC:18237390;PACid=18237390
scaffold_1 phytozome6 CDS 75259 75893 . + 2 Parent=PAC:18237390;PACid=18237390
scaffold_1 phytozome6 gene 80191 81289 . - . ID=POPTR_0001s00230;Name=POPTR_0001s00230
scaffold_1 phytozome6 mRNA 80191 81289 . - . ID=PAC:18235601;Name=POPTR_0001s00230.1;PACid=18235601;Parent=POPTR_0001s00230
scaffold_1 phytozome6 CDS 81161 81289 . - 0 Parent=PAC:18235601;PACid=18235601
scaffold_1 phytozome6 CDS 80191 80385 . - 0 Parent=PAC:18235601;PACid=18235601
Desired output:
scaffold_1 phytozome6 gene 12632 13612 . + . ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1 phytozome6 mRNA 12632 13612 . + . ID=POPTR_0001s00200.1;Name=POPTR_0001s00200.1;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1 phytozome6 5'-UTR 12632 12638 . + . Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 12639 12650 . + 0 Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 12768 12891 . + 0 Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 13117 13226 . + 2 Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 13310 13384 . + 0 Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 3'-UTR 13385 13612 . + . Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 gene 19769 22804 . + . ID=POPTR_0001s00210;Name=POPTR_0001s00210
scaffold_1 phytozome6 mRNA 19769 22804 . + . ID=POPTR_0001s00210.1;Name=POPTR_0001s00210.1;PACid=18238552;Parent=POPTR_0001s00210
scaffold_1 phytozome6 5'-UTR 19769 19827 . + . Parent=POPTR_0001s00210.1;PACid=18238552
scaffold_1 phytozome6 CDS 19828 20136 . + 0 Parent=POPTR_0001s00210.1;PACid=18238552
scaffold_1 phytozome6 CDS 22190 22516 . + 0 Parent=POPTR_0001s00210.1;PACid=18238552
scaffold_1 phytozome6 3'-UTR 22517 22804 . + . Parent=POPTR_0001s00210.1;PACid=18238552
scaffold_1 phytozome6 gene 74076 75893 . + . ID=POPTR_0001s00220;Name=POPTR_0001s00220
scaffold_1 phytozome6 mRNA 74076 75893 . + . ID=POPTR_0001s00220.1;Name=POPTR_0001s00220.1;PACid=18237390;Parent=POPTR_0001s00220
scaffold_1 phytozome6 CDS 74076 74235 . + 0 Parent=POPTR_0001s00220.1;PACid=18237390
scaffold_1 phytozome6 CDS 74359 74634 . + 2 Parent=POPTR_0001s00220.1;PACid=18237390
scaffold_1 phytozome6 CDS 75259 75893 . + 2 Parent=POPTR_0001s00220.1;PACid=18237390
scaffold_1 phytozome6 gene 80191 81289 . - . ID=POPTR_0001s00230;Name=POPTR_0001s00230
scaffold_1 phytozome6 mRNA 80191 81289 . - . ID=POPTR_0001s00230.1;Name=POPTR_0001s00230.1;PACid=18235601;Parent=POPTR_0001s00230
scaffold_1 phytozome6 CDS 81161 81289 . - 0 Parent=POPTR_0001s00230.1;PACid=18235601
scaffold_1 phytozome6 CDS 80191 80385 . - 0 Parent=POPTR_0001s00230.1;PACid=18235601
I hope you can see the issue clearly now. Thank you.
rdcwayx
February 21, 2011, 3:12pm
12
awk -F \; '/gene/{split($(NF-1),a,"=");print;next}{sub(/=.*/,"="a[2]".1",$1)}1 ' infile
awk -F"=|;" '/Name=/{sub($2,$4);v=$2}{$2=v}1' file
shen
February 22, 2011, 4:04am
14
Thanks rdcwayx, you are really talented. I just want to remove the space after the ; because your output gave the following:
Parent=POPTR_0001s00230.1; PACid=18235601
---------- Post updated at 04:04 AM ---------- Previous update was at 04:01 AM ----------
Thank you yinyuemi, but your output is missing the ; and = separators:
scaffold_1 phytozome6 gene 12632 13612 . + . ID POPTR_0001s00200 Name POPTR_0001s00200
scaffold_1 phytozome6 mRNA 12632 13612 . + . ID POPTR_0001s00200.1 Name POPTR_0001s00200.1 PACid 18235173 Parent POPTR_0001s00200
scaffold_1 phytozome6 5'-UTR 12632 12638 . + . Parent POPTR_0001s00200.1 PACid 18235173
scaffold_1 phytozome6 CDS 12639 12650 . + 0 Parent POPTR_0001s00200.1 PACid 18235173
scaffold_1 phytozome6 CDS 12768 12891 . + 0 Parent POPTR_0001s00200.1 PACid 18235173
scaffold_1 phytozome6 CDS 13117 13226 . + 2 Parent POPTR_0001s00200.1 PACid 18235173
scaffold_1 phytozome6 CDS 13310 13384 . + 0 Parent POPTR_0001s00200.1 PACid 18235173
scaffold_1 phytozome6 3'-UTR 13385 13612 . + . Parent POPTR_0001s00200.1 PACid 18235173
scaffold_1 phytozome6 gene 19769 22804 . + . ID POPTR_0001s00210 Name POPTR_0001s00210
scaffold_1 phytozome6 mRNA 19769 22804 . + . ID POPTR_0001s00210.1 Name POPTR_0001s00210.1 PACid 18238552 Parent POPTR_0001s00210
scaffold_1 phytozome6 5'-UTR 19769 19827 . + . Parent POPTR_0001s00210.1 PACid 18238552
scaffold_1 phytozome6 CDS 19828 20136 . + 0 Parent POPTR_0001s00210.1 PACid 18238552
scaffold_1 phytozome6 CDS 22190 22516 . + 0 Parent POPTR_0001s00210.1 PACid 18238552
scaffold_1 phytozome6 3'-UTR 22517 22804 . + . Parent POPTR_0001s00210.1 PACid 18238552
scaffold_1 phytozome6 gene 74076 75893 . + . ID POPTR_0001s00220 Name POPTR_0001s00220
scaffold_1 phytozome6 mRNA 74076 75893 . + . ID POPTR_0001s00220.1 Name POPTR_0001s00220.1 PACid 18237390 Parent POPTR_0001s00220
scaffold_1 phytozome6 CDS 74076 74235 . + 0 Parent POPTR_0001s00220.1 PACid 18237390
scaffold_1 phytozome6 CDS 74359 74634 . + 2 Parent POPTR_0001s00220.1 PACid 18237390
scaffold_1 phytozome6 CDS 75259 75893 . + 2 Parent POPTR_0001s00220.1 PACid 18237390
scaffold_1 phytozome6 gene 80191 81289 . - . ID POPTR_0001s00230 Name POPTR_0001s00230
scaffold_1 phytozome6 mRNA 80191 81289 . - . ID POPTR_0001s00230.1 Name POPTR_0001s00230.1 PACid 18235601 Parent POPTR_0001s00230
scaffold_1 phytozome6 CDS 81161 81289 . - 0 Parent POPTR_0001s00230.1 PACid 18235601
scaffold_1 phytozome6 CDS 80191 80385 . - 0 Parent POPTR_0001s00230.1 PACid 18235601
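The separators disappear because assigning to a field ($2=v) makes awk rebuild the whole record joined by OFS, a single space by default, and no single OFS can restore a mix of = and ;. A sketch of yinyuemi's idea that never assigns to a field and instead substitutes directly in $0, so the original separators survive; it assumes, as in the samples, that each gene/mRNA line precedes its child lines, and the helper name fix_ids is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: variant of the -F"=|;" one-liner that edits $0 only.
fix_ids() {
    awk -F'[=;]' '
        # gene and mRNA lines carry Name=; remember its value ($4).
        /Name=/ { v = $4 }
        # Replace the first attribute value ($2) in place. Editing $0 rather
        # than assigning a field keeps the original = and ; separators.
        # $2 is used as a regex, which is fine for IDs like PAC:18235173.
        { sub("=" $2 ";", "=" v ";") }
        1' "$@"
}
```

For example, fix_ids input > output.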
rdcwayx
February 22, 2011, 5:01am
15
awk '/gene/{split($(NF-1),a,"=");print;next}{sub(/=.*/,"="a[2]".1",$1)}1 ' FS=\; OFS=\; infile
ctsgnb
February 22, 2011, 5:45am
16
while IFS='=;' read a b c d e
do [[ "$b" == "$d" ]] && v=$b || b=$v
[[ ! -z $e ]] && d="$d;$e"
echo "$a=$b;$c=$d"
done <infile
# cat tst
scaffold_1 phytozome6 gene 12632 13612 . + . ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1 phytozome6 mRNA 12632 13612 . + . ID=PAC:18235173;Name=POPTR_0001s00200.1;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1 phytozome6 5'-UTR 12632 12638 . + . Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 12639 12650 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 12768 12891 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 13117 13226 . + 2 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 13310 13384 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 3'-UTR 13385 13612 . + . Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 gene 19769 22804 . + . ID=POPTR_0001s00210;Name=POPTR_0001s00210
scaffold_1 phytozome6 mRNA 19769 22804 . + . ID=PAC:18238552;Name=POPTR_0001s00210.1;PACid=18238552;Parent=POPTR_0001s00210
scaffold_1 phytozome6 5'-UTR 19769 19827 . + . Parent=PAC:18238552;PACid=18238552
scaffold_1 phytozome6 CDS 19828 20136 . + 0 Parent=PAC:18238552;PACid=18238552
scaffold_1 phytozome6 CDS 22190 22516 . + 0 Parent=PAC:18238552;PACid=18238552
scaffold_1 phytozome6 3'-UTR 22517 22804 . + . Parent=PAC:18238552;PACid=18238552
scaffold_1 phytozome6 gene 74076 75893 . + . ID=POPTR_0001s00220;Name=POPTR_0001s00220
scaffold_1 phytozome6 mRNA 74076 75893 . + . ID=PAC:18237390;Name=POPTR_0001s00220.1;PACid=18237390;Parent=POPTR_0001s00220
scaffold_1 phytozome6 CDS 74076 74235 . + 0 Parent=PAC:18237390;PACid=18237390
scaffold_1 phytozome6 CDS 74359 74634 . + 2 Parent=PAC:18237390;PACid=18237390
scaffold_1 phytozome6 CDS 75259 75893 . + 2 Parent=PAC:18237390;PACid=18237390
scaffold_1 phytozome6 gene 80191 81289 . - . ID=POPTR_0001s00230;Name=POPTR_0001s00230
scaffold_1 phytozome6 mRNA 80191 81289 . - . ID=PAC:18235601;Name=POPTR_0001s00230.1;PACid=18235601;Parent=POPTR_0001s00230
scaffold_1 phytozome6 CDS 81161 81289 . - 0 Parent=PAC:18235601;PACid=18235601
scaffold_1 phytozome6 CDS 80191 80385 . - 0 Parent=PAC:18235601;PACid=18235601
# while IFS='=;' read a b c d e; do [[ "$b" == "$d" ]] && v=$b || b=$v ; [[ ! -z $e ]] && d="$d;$e" ; echo "$a=$b;$c=$d" ; done <tst
scaffold_1 phytozome6 gene 12632 13612 . + . ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1 phytozome6 mRNA 12632 13612 . + . ID=POPTR_0001s00200;Name=POPTR_0001s00200.1;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1 phytozome6 5'-UTR 12632 12638 . + . Parent=POPTR_0001s00200;PACid=18235173
scaffold_1 phytozome6 CDS 12639 12650 . + 0 Parent=POPTR_0001s00200;PACid=18235173
scaffold_1 phytozome6 CDS 12768 12891 . + 0 Parent=POPTR_0001s00200;PACid=18235173
scaffold_1 phytozome6 CDS 13117 13226 . + 2 Parent=POPTR_0001s00200;PACid=18235173
scaffold_1 phytozome6 CDS 13310 13384 . + 0 Parent=POPTR_0001s00200;PACid=18235173
scaffold_1 phytozome6 3'-UTR 13385 13612 . + . Parent=POPTR_0001s00200;PACid=18235173
scaffold_1 phytozome6 gene 19769 22804 . + . ID=POPTR_0001s00210;Name=POPTR_0001s00210
scaffold_1 phytozome6 mRNA 19769 22804 . + . ID=POPTR_0001s00210;Name=POPTR_0001s00210.1;PACid=18238552;Parent=POPTR_0001s00210
scaffold_1 phytozome6 5'-UTR 19769 19827 . + . Parent=POPTR_0001s00210;PACid=18238552
scaffold_1 phytozome6 CDS 19828 20136 . + 0 Parent=POPTR_0001s00210;PACid=18238552
scaffold_1 phytozome6 CDS 22190 22516 . + 0 Parent=POPTR_0001s00210;PACid=18238552
scaffold_1 phytozome6 3'-UTR 22517 22804 . + . Parent=POPTR_0001s00210;PACid=18238552
scaffold_1 phytozome6 gene 74076 75893 . + . ID=POPTR_0001s00220;Name=POPTR_0001s00220
scaffold_1 phytozome6 mRNA 74076 75893 . + . ID=POPTR_0001s00220;Name=POPTR_0001s00220.1;PACid=18237390;Parent=POPTR_0001s00220
scaffold_1 phytozome6 CDS 74076 74235 . + 0 Parent=POPTR_0001s00220;PACid=18237390
scaffold_1 phytozome6 CDS 74359 74634 . + 2 Parent=POPTR_0001s00220;PACid=18237390
scaffold_1 phytozome6 CDS 75259 75893 . + 2 Parent=POPTR_0001s00220;PACid=18237390
scaffold_1 phytozome6 gene 80191 81289 . - . ID=POPTR_0001s00230;Name=POPTR_0001s00230
scaffold_1 phytozome6 mRNA 80191 81289 . - . ID=POPTR_0001s00230;Name=POPTR_0001s00230.1;PACid=18235601;Parent=POPTR_0001s00230
scaffold_1 phytozome6 CDS 81161 81289 . - 0 Parent=POPTR_0001s00230;PACid=18235601
scaffold_1 phytozome6 CDS 80191 80385 . - 0 Parent=POPTR_0001s00230;PACid=18235601
#
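ctsgnb's loop relies on how read splits a line when IFS contains only non-blank characters: every = or ; ends a field, and the last variable (e) receives the rest of the line with its separators intact, which is why d="$d;$e" can rebuild the attribute list. A small POSIX sh sketch on one mRNA line from the sample (-r is added here so that backslashes stay literal):

```shell
#!/bin/sh
# Split one mRNA line from the sample, fed through a here-document so that
# read runs in the current shell and the variables stay set afterwards.
IFS='=;' read -r a b c d e <<'EOF'
scaffold_1 phytozome6 mRNA 12632 13612 . + . ID=PAC:18235173;Name=POPTR_0001s00200.1;PACid=18235173;Parent=POPTR_0001s00200
EOF
printf 'a=[%s]\nb=[%s]\nc=[%s]\nd=[%s]\ne=[%s]\n' "$a" "$b" "$c" "$d" "$e"
```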
shen
February 22, 2011, 8:26am
17
Your output is missing the .1. I need to replace the PAC IDs with POPTR_0001s00200.1, which is the Name= value.
---------- Post updated at 08:26 AM ---------- Previous update was at 08:22 AM ----------
Hi rdcwayx, unfortunately I have a special case. I want to convert this:
scaffold_1 phytozome6 gene 12632 13612 . + . ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1 phytozome6 mRNA 12632 13612 . + . ID=PAC:18235173;Name=POPTR_0001s00200.2;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1 phytozome6 5'-UTR 12632 12638 . + . Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 12639 12650 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 12768 12891 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 13117 13226 . + 2 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 13310 13384 . + 0 Parent=PAC:18235173;PACid=18235173
into this output:
scaffold_1 phytozome6 gene 12632 13612 . + . ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1 phytozome6 mRNA 12632 13612 . + . ID=POPTR_0001s00200.2;Name=POPTR_0001s00200.2;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1 phytozome6 5'-UTR 12632 12638 . + . Parent=POPTR_0001s00200.2;PACid=18235173
scaffold_1 phytozome6 CDS 12639 12650 . + 0 Parent=POPTR_0001s00200.2;PACid=18235173
scaffold_1 phytozome6 CDS 12768 12891 . + 0 Parent=POPTR_0001s00200.2;PACid=18235173
scaffold_1 phytozome6 CDS 13117 13226 . + 2 Parent=POPTR_0001s00200.2;PACid=18235173
scaffold_1 phytozome6 CDS 13310 13384 . + 0 Parent=POPTR_0001s00200.2;PACid=18235173
scaffold_1 phytozome6 3'-UTR 13385 13612 . + . Parent=POPTR_0001s00200.2;PACid=18235173
Can I get the
value and replace with
and
the version number, as in POPTR_0001s00200.2, can go up to POPTR_0001s00200.12.
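Because the wanted value is always the Name= of the mRNA line, the version suffix (.1, .2 up to .12) never has to be computed. A minimal awk sketch, assuming every mRNA line appears before the lines whose Parent= points to its PAC ID (true in all samples in this thread): it records Name= for each PACid and rewrites ID=PAC:<n> and Parent=PAC:<n> from that table, so several isoforms of one gene are also handled. The helper name pac_to_name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: replace ID=PAC:<n> and Parent=PAC:<n> with the Name=
# value of the mRNA line whose PACid is <n>. Reads stdin or the given files.
pac_to_name() {
    awk '
    # mRNA line: remember its Name= value, keyed by its PACid.
    $3 == "mRNA" && match($0, /PACid=[0-9]+/) {
        pac = substr($0, RSTART + 6, RLENGTH - 6)
        if (match($0, /Name=[^;]+/))
            name[pac] = substr($0, RSTART + 5, RLENGTH - 5)
    }
    # Any line with ID=PAC:<n> or Parent=PAC:<n>: swap in the stored name.
    match($0, /(ID|Parent)=PAC:[0-9]+/) {
        s = substr($0, RSTART, RLENGTH)
        key = s; sub(/.*PAC:/, "", key)
        tag = s; sub(/=PAC:.*/, "", tag)
        # Only match() changes RSTART/RLENGTH, so they still describe s here.
        if (key in name)
            $0 = substr($0, 1, RSTART - 1) tag "=" name[key] substr($0, RSTART + RLENGTH)
    }
    1' "$@"
}
```

For example, pac_to_name input > output. With a tab-separated GFF3 file the $3 == "mRNA" test still works, because awk's default field splitting also splits on tabs.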
ctsgnb
February 22, 2011, 12:55pm
18
while IFS='=;' read a b c d e; do if [[ "$b" != "$d" ]]; then [[ "$f" == 1 ]] && v=$d; f=0; b=$v; else f=1; fi; [[ ! -z $e ]] && d="$d;$e"; echo "$a=$b;$c=$d"; done <tst
# cat tst
scaffold_1 phytozome6 gene 12632 13612 . + . ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1 phytozome6 mRNA 12632 13612 . + . ID=PAC:18235173;Name=POPTR_0001s00200.1;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1 phytozome6 5'-UTR 12632 12638 . + . Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 12639 12650 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 12768 12891 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 13117 13226 . + 2 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 13310 13384 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 3'-UTR 13385 13612 . + . Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 gene 19769 22804 . + . ID=POPTR_0001s00210;Name=POPTR_0001s00210
scaffold_1 phytozome6 mRNA 19769 22804 . + . ID=PAC:18238552;Name=POPTR_0001s00210.1;PACid=18238552;Parent=POPTR_0001s00210
scaffold_1 phytozome6 5'-UTR 19769 19827 . + . Parent=PAC:18238552;PACid=18238552
scaffold_1 phytozome6 CDS 19828 20136 . + 0 Parent=PAC:18238552;PACid=18238552
scaffold_1 phytozome6 CDS 22190 22516 . + 0 Parent=PAC:18238552;PACid=18238552
scaffold_1 phytozome6 3'-UTR 22517 22804 . + . Parent=PAC:18238552;PACid=18238552
scaffold_1 phytozome6 gene 74076 75893 . + . ID=POPTR_0001s00220;Name=POPTR_0001s00220
scaffold_1 phytozome6 mRNA 74076 75893 . + . ID=PAC:18237390;Name=POPTR_0001s00220.1;PACid=18237390;Parent=POPTR_0001s00220
scaffold_1 phytozome6 CDS 74076 74235 . + 0 Parent=PAC:18237390;PACid=18237390
scaffold_1 phytozome6 CDS 74359 74634 . + 2 Parent=PAC:18237390;PACid=18237390
scaffold_1 phytozome6 CDS 75259 75893 . + 2 Parent=PAC:18237390;PACid=18237390
scaffold_1 phytozome6 gene 80191 81289 . - . ID=POPTR_0001s00230;Name=POPTR_0001s00230
scaffold_1 phytozome6 mRNA 80191 81289 . - . ID=PAC:18235601;Name=POPTR_0001s00230.1;PACid=18235601;Parent=POPTR_0001s00230
scaffold_1 phytozome6 CDS 81161 81289 . - 0 Parent=PAC:18235601;PACid=18235601
scaffold_1 phytozome6 CDS 80191 80385 . - 0 Parent=PAC:18235601;PACid=18235601
#
# while IFS='=;' read a b c d e; do if [[ "$b" != "$d" ]]; then [[ "$f" == 1 ]] && v=$d; f=0; b=$v; else f=1; fi; [[ ! -z $e ]] && d="$d;$e"; echo "$a=$b;$c=$d"; done <tst
scaffold_1 phytozome6 gene 12632 13612 . + . ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1 phytozome6 mRNA 12632 13612 . + . ID=POPTR_0001s00200.1;Name=POPTR_0001s00200.1;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1 phytozome6 5'-UTR 12632 12638 . + . Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 12639 12650 . + 0 Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 12768 12891 . + 0 Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 13117 13226 . + 2 Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 13310 13384 . + 0 Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 3'-UTR 13385 13612 . + . Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 gene 19769 22804 . + . ID=POPTR_0001s00210;Name=POPTR_0001s00210
scaffold_1 phytozome6 mRNA 19769 22804 . + . ID=POPTR_0001s00210.1;Name=POPTR_0001s00210.1;PACid=18238552;Parent=POPTR_0001s00210
scaffold_1 phytozome6 5'-UTR 19769 19827 . + . Parent=POPTR_0001s00210.1;PACid=18238552
scaffold_1 phytozome6 CDS 19828 20136 . + 0 Parent=POPTR_0001s00210.1;PACid=18238552
scaffold_1 phytozome6 CDS 22190 22516 . + 0 Parent=POPTR_0001s00210.1;PACid=18238552
scaffold_1 phytozome6 3'-UTR 22517 22804 . + . Parent=POPTR_0001s00210.1;PACid=18238552
scaffold_1 phytozome6 gene 74076 75893 . + . ID=POPTR_0001s00220;Name=POPTR_0001s00220
scaffold_1 phytozome6 mRNA 74076 75893 . + . ID=POPTR_0001s00220.1;Name=POPTR_0001s00220.1;PACid=18237390;Parent=POPTR_0001s00220
scaffold_1 phytozome6 CDS 74076 74235 . + 0 Parent=POPTR_0001s00220.1;PACid=18237390
scaffold_1 phytozome6 CDS 74359 74634 . + 2 Parent=POPTR_0001s00220.1;PACid=18237390
scaffold_1 phytozome6 CDS 75259 75893 . + 2 Parent=POPTR_0001s00220.1;PACid=18237390
scaffold_1 phytozome6 gene 80191 81289 . - . ID=POPTR_0001s00230;Name=POPTR_0001s00230
scaffold_1 phytozome6 mRNA 80191 81289 . - . ID=POPTR_0001s00230.1;Name=POPTR_0001s00230.1;PACid=18235601;Parent=POPTR_0001s00230
scaffold_1 phytozome6 CDS 81161 81289 . - 0 Parent=POPTR_0001s00230.1;PACid=18235601
scaffold_1 phytozome6 CDS 80191 80385 . - 0 Parent=POPTR_0001s00230.1;PACid=18235601
#
The same loop as a multi-line script (more readable):
while IFS='=;' read a b c d e
do if [[ "$b" != "$d" ]]
then [[ "$f" == 1 ]] && v=$d; f=0; b=$v
else f=1
fi
[[ ! -z $e ]] && d="$d;$e"
echo "$a=$b;$c=$d"
done <tst
shen
February 22, 2011, 2:34pm
19
Thank you, it works perfectly, and your code is faster. I tried the following code earlier:
awk '/mRNA/{split($(NF-2),a,"=");print;next}{sub(/t=.*/,"t="a[2],$1)}1 ' FS=\; OFS=\; input > tmp1
awk '{if(substr($3,1,4)=="mRNA")split ($9, a,";");{sub(a[1],a[2])} print ;}' tmp1>tmp2
awk '{if(substr($9,1,4)=="Name"){sub(substr($9,1,4),"ID")} print;}' tmp2> output
rdcwayx
February 22, 2011, 6:40pm
20
All in one.
awk '/gene/{print;next}/mRNA/{split($2,a,"=")}{sub(/=.*/,"="a[2],$1)}1 ' FS=\; OFS=\; infile