Awk script

I have the following text:

scaffold_1      phytozome6      gene    12632   13612   .       +       .       ID=PT_0001s00200;Name=PT_0001s00200
scaffold_1      phytozome6      mRNA    12632   13612   .     +       .     ID=PAC:18235173;Name=PT_0001s00200.1;PACid=18235173;Parent=PT_0001s00200
scaffold_1      phytozome6      5'-UTR  12632   12638   .       +       .       Parent=PAC:18235173;PACid=18235173
scaffold_1      phytozome6      CDS     12639   12650   .       +       0      Parent=PAC:18235173;PACid=18235173
scaffold_1      phytozome6      CDS     12768   12891   .       +       0      Parent=PAC:18235173;PACid=18235173
scaffold_1      phytozome6      CDS     13117   13226   .       +       2      Parent=PAC:18235173;PACid=18235173
scaffold_1      phytozome6      CDS     13310   13384   .       +       0      Parent=PAC:18235173;PACid=18235173
scaffold_1      phytozome6      3'-UTR  13385   13612   .       +       .       Parent=PAC:18235173;PACid=18235173

I want to convert it into:

scaffold_1      phytozome6      gene    12632   13612   .       +       .       ID=PT_0001s00200;Name=PT_0001s00200
scaffold_1      phytozome6      mRNA    12632   13612   .     +       .     ID=PT_0001s00200.1;Name=PT_0001s00200.1;PACid=18235173;Parent=PT_0001s00200
scaffold_1      phytozome6      5'-UTR  12632   12638   .       +       .       Parent=PT_0001s00200.1;PACid=18235173
scaffold_1      phytozome6      CDS     12639   12650   .       +       0      Parent=PT_0001s00200.1;PACid=18235173
scaffold_1      phytozome6      CDS     12768   12891   .       +       0      Parent=PT_0001s00200.1;PACid=18235173
scaffold_1      phytozome6      CDS     13117   13226   .       +       2      Parent=PT_0001s00200.1;PACid=18235173
scaffold_1      phytozome6      CDS     13310   13384   .       +       0      Parent=PT_0001s00200.1;PACid=18235173
scaffold_1      phytozome6      3'-UTR  13385   13612   .       +       .       Parent=PT_0001s00200.1;PACid=18235173

I tried the following script:

awk '{if(substr($9,17,10)=="Name=PT") print gensub("Parent=PAC:"substr($9,47,9),"Parent="substr($9,22,16)".1",
  substr($0,0)) substr($0,0) }' inputfile > outputfile

but still had no luck.

Can you please help me?

Why not use sed?

sed 's/Parent=PAC:18235173;PACid=18235173/Parent=PT_0001s00200.1;PACid=18235173/g'

You can edit the file in place with the -i switch, or redirect the output to another file as your example shows.
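As a small sketch of the in-place option: `-i.bak` (suffix attached, no space) is accepted by both GNU and BSD sed and keeps a backup copy of the original. The file name `demo_sed.gff` is made up for this example.

```shell
# Hypothetical one-line sample file.
printf '%s\n' 'scaffold_1 phytozome6 CDS 12639 12650 . + 0 Parent=PAC:18235173;PACid=18235173' > demo_sed.gff

# Edit in place; the untouched original is kept as demo_sed.gff.bak.
sed -i.bak 's/Parent=PAC:18235173;/Parent=PT_0001s00200.1;/' demo_sed.gff

cat demo_sed.gff
```

This still hard-codes one PAC id and one gene name, so it only suits a single record set.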

I have used the following script, but I still can't replace the Parent=PAC values:

awk '{if(substr($9,17,10)=="Name=PT") n=substr($9,22,16) gsub("Parent=PAC:"substr($9,47,9),"Parent="n".1");
  gsub("ID=PAC:"substr($9,8,19),"ID="n".1");  }' input>output

Any help is appreciated.

Thank you for your reply, but this is only one record. There are many sets of records, each with its own unique PAC and PT ids, so I would have to run your solution 300000 times, changing the PAC and PT ids each time. Can't we use

if(substr($9,17,10)=="Name=PT")

with your solution?

awk -F"[=;]" 'NR==1{s=$2 ".1";print;FS=OFS=";";next}{sub(/=.*/,"="s,$1)}1' infile

Thank you for your reply, but I get the following:

scaffold_1      phytozome6      gene    2330052 2335284 .       -       .       ID=.1;Name=PT_0001s02940
scaffold_1      phytozome6      mRNA    2330052 2335284 .       -       .       ID=.1;Name=PT_0001s02940.1;PACid=18235154;Parent=PT_0001s02940
scaffold_1      phytozome6      CDS     2334981 2335230 .       -       0       Parent=.1;PACid=18235154
scaffold_1      phytozome6      5'-UTR  2335231 2335284 .       -       .       Parent=.1;PACid=18235154
scaffold_1      phytozome6      CDS     2334079 2334206 .       -       2       Parent=.1;PACid=18235154
scaffold_1      phytozome6      CDS     2333907 2333978 .       -       0       Parent=.1;PACid=18235154
scaffold_1      phytozome6      CDS     2333635 2333780 .       -       0       Parent=.1;PACid=18235154
scaffold_1      phytozome6      CDS     2333448 2333562 .       -       1       Parent=.1;PACid=18235154
scaffold_1      phytozome6      CDS     2333285 2333365 .       -       0       Parent=.1;PACid=18235154
scaffold_1      phytozome6      CDS     2332541 2332678 .       -       0       Parent=.1;PACid=18235154
scaffold_1      phytozome6      CDS     2331826 2331913 .       -       0       Parent=.1;PACid=18235154
scaffold_1      phytozome6      CDS     2330651 2330764 .       -       2       Parent=.1;PACid=18235154
scaffold_1      phytozome6      3'-UTR  2330052 2330460 .       -       .       Parent=.1;PACid=18235154
scaffold_1      phytozome6      CDS     2330461 2330483 .       -       2       Parent=.1;PACid=18235154

Still no luck.
I used the following command and it gives the correct output. Did I do it correctly?

awk '{if(substr($9,1,5)=="ID=PT_") n=substr($9,4,16)".1"; gsub("ID=PAC:"substr($9,8,8),"ID="n);
  gsub("Parent=PAC:"substr($9,12,8),"Parent="n); print; }' infile > outfile

Is the first line of your document different from the sample? I get the ID number from the first line.

The first line should be:

scaffold_1      phytozome6      gene    12632   13612   .       +       .       ID=PT_0001s00200;Name=PT_0001s00200
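For what it's worth, `ID=.1` is what the one-liner prints whenever line 1 of the file contains no `=`, for instance a `##gff-version 3` header. That header is only an assumption for illustration, since the real first line of the file was not shown. A minimal reproduction (the file name `demo_header.gff` is made up):

```shell
# Hypothetical input: a header line precedes the first gene record.
printf '%s\n' \
  '##gff-version 3' \
  'scaffold_1 phytozome6 gene 12632 13612 . + . ID=PT_0001s00200;Name=PT_0001s00200' \
  'scaffold_1 phytozome6 CDS 12639 12650 . + 0 Parent=PAC:18235173;PACid=18235173' > demo_header.gff

# On line 1, $2 is empty, so s becomes ".1"; every later line then gets "=.1".
awk -F'[=;]' 'NR==1{s=$2 ".1";print;FS=OFS=";";next}{sub(/=.*/,"="s,$1)}1' demo_header.gff
```

The gene line comes out as `ID=.1;Name=PT_0001s00200` and the CDS line as `Parent=.1;PACid=18235173`, the same shape as the output reported above.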

Yes, it is different:

scaffold_1      phytozome6      gene    12632   13612   .       +       .       ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1      phytozome6      mRNA    12632   13612   .     +       .     ID=PAC:18235173;Name=POPTR_0001s00200.1;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1      phytozome6      5'-UTR  12632   12638   .       +       .       Parent=PAC:18235173;PACid=18235173
scaffold_1      phytozome6      CDS     12639   12650   .       +       0      Parent=PAC:18235173;PACid=18235173
scaffold_1      phytozome6      CDS     12768   12891   .       +       0      Parent=PAC:18235173;PACid=18235173
scaffold_1      phytozome6      CDS     13117   13226   .       +       2      Parent=PAC:18235173;PACid=18235173
scaffold_1      phytozome6      CDS     13310   13384   .       +       0      Parent=PAC:18235173;PACid=18235173
scaffold_1      phytozome6      3'-UTR  13385   13612   .       +       .       Parent=PAC:18235173;PACid=18235173

I want to change it to:

scaffold_1      phytozome6      gene    12632   13612   .       +       .       ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1      phytozome6      mRNA    12632   13612   .     +       .     ID=POPTR_0001s00200.1;Name=POPTR_0001s00200.1;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1      phytozome6      5'-UTR  12632   12638   .       +       .       Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1      phytozome6      CDS     12639   12650   .       +       0      Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1      phytozome6      CDS     12768   12891   .       +       0      Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1      phytozome6      CDS     13117   13226   .       +       2      Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1      phytozome6      CDS     13310   13384   .       +       0      Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1      phytozome6      3'-UTR  13385   13612   .       +       .       Parent=POPTR_0001s00200.1;PACid=18235173

Do you think your function will still help? Sorry for changing the format.

No change needed; it still works fine.

$ cat infile
scaffold_1 phytozome6 gene 12632 13612 . + . ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1 phytozome6 mRNA 12632 13612 . + . ID=PAC:18235173;Name=POPTR_0001s00200.1;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1 phytozome6 5'-UTR 12632 12638 . + . Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 12639 12650 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 12768 12891 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 13117 13226 . + 2 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 13310 13384 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 3'-UTR 13385 13612 . + . Parent=PAC:18235173;PACid=18235173

$ awk -F"[=;]" 'NR==1{s=$2 ".1";print;FS=";";OFS=";";next}{sub(/=.*/,"="s,$1)}1' infile
scaffold_1 phytozome6 gene 12632 13612 . + . ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1 phytozome6 mRNA 12632 13612 . + . ID=POPTR_0001s00200.1;Name=POPTR_0001s00200.1;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1 phytozome6 5'-UTR 12632 12638 . + . Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 12639 12650 . + 0 Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 12768 12891 . + 0 Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 13117 13226 . + 2 Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 13310 13384 . + 0 Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 3'-UTR 13385 13612 . + . Parent=POPTR_0001s00200.1;PACid=18235173

Does your awk support -F"[=;]"?

Yes, it supports FS. Thank you for your help, but I want each subset handled separately, keyed on its own POPTR_ id (those are gene names). For example:

scaffold_997    phytozome6      gene    1687    2351    .       -       .       ID=POPTR_0997s00200;Name=POPTR_0997s00200
scaffold_997    phytozome6      mRNA    1687    2351    .       -       .       ID=PAC:18226942;Name=POPTR_0997s00200.1;PACid=18226942;Parent=POPTR_0997s00200
scaffold_997    phytozome6      CDS     2240    2317    .       -       0       Parent=PAC:18226942;PACid=18226942
scaffold_997    phytozome6      5'-UTR  2318    2351    .       -       .       Parent=PAC:18226942;PACid=18226942
scaffold_997    phytozome6      CDS     2078    2111    .       -       0       Parent=PAC:18226942;PACid=18226942
scaffold_997    phytozome6      3'-UTR  1687    1866    .       -       .       Parent=PAC:18226942;PACid=18226942
scaffold_997    phytozome6      CDS     1867    1997    .       -       2       Parent=PAC:18226942;PACid=18226942

should be changed to

scaffold_997    phytozome6      gene    1687    2351    .       -       .       ID=POPTR_0997s00200;Name=POPTR_0997s00200
scaffold_997    phytozome6      mRNA    1687    2351    .       -       .       ID=POPTR_0997s00200.1;Name=POPTR_0997s00200.1;PACid=18226942;Parent=POPTR_0997s00200
scaffold_997    phytozome6      CDS     2240    2317    .       -       0       Parent=POPTR_0997s00200.1;PACid=18226942
scaffold_997    phytozome6      5'-UTR  2318    2351    .       -       .       Parent=POPTR_0997s00200.1;PACid=18226942
scaffold_997    phytozome6      CDS     2078    2111    .       -       0       Parent=POPTR_0997s00200.1;PACid=18226942
scaffold_997    phytozome6      3'-UTR  1687    1866    .       -       .       Parent=POPTR_0997s00200.1;PACid=18226942
scaffold_997    phytozome6      CDS     1867    1997    .       -       2       Parent=POPTR_0997s00200.1;PACid=18226942

Do you think you can adjust your function? I really like it and would prefer it to the following one:

awk '{if(substr($9,1,9)=="ID=POPTR_") n=substr($9,4,16)".1"; gsub("ID=PAC:"substr($9,8,8),"ID="n);
  gsub("Parent=PAC:"substr($9,12,8),"Parent="n); print; }' input>output

I don't care what your code is doing; I wrote my code to produce the output you asked for.

I tried your latest input and output with my command and the result is still perfect; I don't see any issue.

So you need to point out the error that I should fix.

Here are the input and the expected output; your code does not generate this output:

sscaffold_1	phytozome6	gene	12632	13612	.	+	.	ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1	phytozome6	mRNA	12632	13612	.	+	.	ID=PAC:18235173;Name=POPTR_0001s00200.1;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1	phytozome6	5'-UTR	12632	12638	.	+	.	Parent=PAC:18235173;PACid=18235173
scaffold_1	phytozome6	CDS	12639	12650	.	+	0	Parent=PAC:18235173;PACid=18235173
scaffold_1	phytozome6	CDS	12768	12891	.	+	0	Parent=PAC:18235173;PACid=18235173
scaffold_1	phytozome6	CDS	13117	13226	.	+	2	Parent=PAC:18235173;PACid=18235173
scaffold_1	phytozome6	CDS	13310	13384	.	+	0	Parent=PAC:18235173;PACid=18235173
scaffold_1	phytozome6	3'-UTR	13385	13612	.	+	.	Parent=PAC:18235173;PACid=18235173
scaffold_1	phytozome6	gene	19769	22804	.	+	.	ID=POPTR_0001s00210;Name=POPTR_0001s00210
scaffold_1	phytozome6	mRNA	19769	22804	.	+	.	ID=PAC:18238552;Name=POPTR_0001s00210.1;PACid=18238552;Parent=POPTR_0001s00210
scaffold_1	phytozome6	5'-UTR	19769	19827	.	+	.	Parent=PAC:18238552;PACid=18238552
scaffold_1	phytozome6	CDS	19828	20136	.	+	0	Parent=PAC:18238552;PACid=18238552
scaffold_1	phytozome6	CDS	22190	22516	.	+	0	Parent=PAC:18238552;PACid=18238552
scaffold_1	phytozome6	3'-UTR	22517	22804	.	+	.	Parent=PAC:18238552;PACid=18238552
scaffold_1	phytozome6	gene	74076	75893	.	+	.	ID=POPTR_0001s00220;Name=POPTR_0001s00220
scaffold_1	phytozome6	mRNA	74076	75893	.	+	.	ID=PAC:18237390;Name=POPTR_0001s00220.1;PACid=18237390;Parent=POPTR_0001s00220
scaffold_1	phytozome6	CDS	74076	74235	.	+	0	Parent=PAC:18237390;PACid=18237390
scaffold_1	phytozome6	CDS	74359	74634	.	+	2	Parent=PAC:18237390;PACid=18237390
scaffold_1	phytozome6	CDS	75259	75893	.	+	2	Parent=PAC:18237390;PACid=18237390
scaffold_1	phytozome6	gene	80191	81289	.	-	.	ID=POPTR_0001s00230;Name=POPTR_0001s00230
scaffold_1	phytozome6	mRNA	80191	81289	.	-	.	ID=PAC:18235601;Name=POPTR_0001s00230.1;PACid=18235601;Parent=POPTR_0001s00230
scaffold_1	phytozome6	CDS	81161	81289	.	-	0	Parent=PAC:18235601;PACid=18235601
scaffold_1	phytozome6	CDS	80191	80385	.	-	0	Parent=PAC:18235601;PACid=18235601

Desired output:

scaffold_1	phytozome6	gene	12632	13612	.	+	.	ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1	phytozome6	mRNA	12632	13612	.	+	.	ID=POPTR_0001s00200.1;Name=POPTR_0001s00200.1;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1	phytozome6	5'-UTR	12632	12638	.	+	.	Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1	phytozome6	CDS	12639	12650	.	+	0	Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1	phytozome6	CDS	12768	12891	.	+	0	Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1	phytozome6	CDS	13117	13226	.	+	2	Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1	phytozome6	CDS	13310	13384	.	+	0	Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1	phytozome6	3'-UTR	13385	13612	.	+	.	Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1	phytozome6	gene	19769	22804	.	+	.	ID=POPTR_0001s00210;Name=POPTR_0001s00210
scaffold_1	phytozome6	mRNA	19769	22804	.	+	.	ID=POPTR_0001s00210.1;Name=POPTR_0001s00210.1;PACid=18238552;Parent=POPTR_0001s00210
scaffold_1	phytozome6	5'-UTR	19769	19827	.	+	.	Parent=POPTR_0001s00210.1;PACid=18238552
scaffold_1	phytozome6	CDS	19828	20136	.	+	0	Parent=POPTR_0001s00210.1;PACid=18238552
scaffold_1	phytozome6	CDS	22190	22516	.	+	0	Parent=POPTR_0001s00210.1;PACid=18238552
scaffold_1	phytozome6	3'-UTR	22517	22804	.	+	.	Parent=POPTR_0001s00210.1;PACid=18238552
scaffold_1	phytozome6	gene	74076	75893	.	+	.	ID=POPTR_0001s00220;Name=POPTR_0001s00220
scaffold_1	phytozome6	mRNA	74076	75893	.	+	.	ID=POPTR_0001s00220.1;Name=POPTR_0001s00220.1;PACid=18237390;Parent=POPTR_0001s00220
scaffold_1	phytozome6	CDS	74076	74235	.	+	0	Parent=POPTR_0001s00220.1;PACid=18237390
scaffold_1	phytozome6	CDS	74359	74634	.	+	2	Parent=POPTR_0001s00220.1;PACid=18237390
scaffold_1	phytozome6	CDS	75259	75893	.	+	2	Parent=POPTR_0001s00220.1;PACid=18237390
scaffold_1	phytozome6	gene	80191	81289	.	-	.	ID=POPTR_0001s00230;Name=POPTR_0001s00230
scaffold_1	phytozome6	mRNA	80191	81289	.	-	.	ID=POPTR_0001s00230.1;Name=POPTR_0001s00230.1;PACid=18235601;Parent=POPTR_0001s00230
scaffold_1	phytozome6	CDS	81161	81289	.	-	0	Parent=POPTR_0001s00230.1;PACid=18235601
scaffold_1	phytozome6	CDS	80191	80385	.	-	0	Parent=POPTR_0001s00230.1;PACid=18235601

I hope you can clearly see the issue now. Thank you.

awk -F \; '/gene/{split($(NF-1),a,"=");print;next}{sub(/=.*/,"="a[2]".1",$1)}1 ' infile
awk -F"=|;" '/Name=/{sub($2,$4);v=$2}{$2=v}1' file

Thanks rdcwayx, you are really talented. I just want to remove the spaces after the ; because your output gave the following:

Parent=POPTR_0001s00230.1; PACid=18235601

Thank you yinyuemi, but your output is missing the ;

sscaffold_1     phytozome6      gene    12632   13612   .       +       .       ID POPTR_0001s00200 Name POPTR_0001s00200
scaffold_1      phytozome6      mRNA    12632   13612   .       +       .       ID POPTR_0001s00200.1 Name POPTR_0001s00200.1 PACid 18235173 Parent POPTR_0001s00200
scaffold_1      phytozome6      5'-UTR  12632   12638   .       +       .       Parent POPTR_0001s00200.1 PACid 18235173
scaffold_1      phytozome6      CDS     12639   12650   .       +       0       Parent POPTR_0001s00200.1 PACid 18235173
scaffold_1      phytozome6      CDS     12768   12891   .       +       0       Parent POPTR_0001s00200.1 PACid 18235173
scaffold_1      phytozome6      CDS     13117   13226   .       +       2       Parent POPTR_0001s00200.1 PACid 18235173
scaffold_1      phytozome6      CDS     13310   13384   .       +       0       Parent POPTR_0001s00200.1 PACid 18235173
scaffold_1      phytozome6      3'-UTR  13385   13612   .       +       .       Parent POPTR_0001s00200.1 PACid 18235173
scaffold_1      phytozome6      gene    19769   22804   .       +       .       ID POPTR_0001s00210 Name POPTR_0001s00210
scaffold_1      phytozome6      mRNA    19769   22804   .       +       .       ID POPTR_0001s00210.1 Name POPTR_0001s00210.1 PACid 18238552 Parent POPTR_0001s00210
scaffold_1      phytozome6      5'-UTR  19769   19827   .       +       .       Parent POPTR_0001s00210.1 PACid 18238552
scaffold_1      phytozome6      CDS     19828   20136   .       +       0       Parent POPTR_0001s00210.1 PACid 18238552
scaffold_1      phytozome6      CDS     22190   22516   .       +       0       Parent POPTR_0001s00210.1 PACid 18238552
scaffold_1      phytozome6      3'-UTR  22517   22804   .       +       .       Parent POPTR_0001s00210.1 PACid 18238552
scaffold_1      phytozome6      gene    74076   75893   .       +       .       ID POPTR_0001s00220 Name POPTR_0001s00220
scaffold_1      phytozome6      mRNA    74076   75893   .       +       .       ID POPTR_0001s00220.1 Name POPTR_0001s00220.1 PACid 18237390 Parent POPTR_0001s00220
scaffold_1      phytozome6      CDS     74076   74235   .       +       0       Parent POPTR_0001s00220.1 PACid 18237390
scaffold_1      phytozome6      CDS     74359   74634   .       +       2       Parent POPTR_0001s00220.1 PACid 18237390
scaffold_1      phytozome6      CDS     75259   75893   .       +       2       Parent POPTR_0001s00220.1 PACid 18237390
scaffold_1      phytozome6      gene    80191   81289   .       -       .       ID POPTR_0001s00230 Name POPTR_0001s00230
scaffold_1      phytozome6      mRNA    80191   81289   .       -       .       ID POPTR_0001s00230.1 Name POPTR_0001s00230.1 PACid 18235601 Parent POPTR_0001s00230
scaffold_1      phytozome6      CDS     81161   81289   .       -       0       Parent POPTR_0001s00230.1 PACid 18235601
scaffold_1      phytozome6      CDS     80191   80385   .       -       0       Parent POPTR_0001s00230.1 PACid 18235601
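The lost separators come from how awk rebuilds a record: as soon as a field is modified, `$0` is reassembled with `OFS`, which defaults to a single space, regardless of what `FS` was. A minimal sketch on one attribute string (the replacement value `X` is just a placeholder):

```shell
# Default OFS is a single space, so touching $1 drops the ";" separators.
echo 'Parent=PAC:18235601;PACid=18235601' | awk -F';' '{ sub(/=.*/, "=X", $1) } 1'

# Setting OFS to ";" as well keeps them.
echo 'Parent=PAC:18235601;PACid=18235601' | awk -F';' -v OFS=';' '{ sub(/=.*/, "=X", $1) } 1'
```

The first command prints `Parent=X PACid=18235601`, the second `Parent=X;PACid=18235601`.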

awk '/gene/{split($(NF-1),a,"=");print;next}{sub(/=.*/,"="a[2]".1",$1)}1 ' FS=\; OFS=\; infile
while IFS='=;' read a b c d e
do [[ "$b" == "$d" ]] && v=$b || b=$v
[[ ! -z $e ]] && d="$d;$e"
echo "$a=$b;$c=$d"
done <infile
# cat tst
sscaffold_1     phytozome6      gene    12632   13612   .       +       .       ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1      phytozome6      mRNA    12632   13612   .       +       .       ID=PAC:18235173;Name=POPTR_0001s00200.1;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1      phytozome6      5'-UTR  12632   12638   .       +       .       Parent=PAC:18235173;PACid=18235173
scaffold_1      phytozome6      CDS     12639   12650   .       +       0       Parent=PAC:18235173;PACid=18235173
scaffold_1      phytozome6      CDS     12768   12891   .       +       0       Parent=PAC:18235173;PACid=18235173
scaffold_1      phytozome6      CDS     13117   13226   .       +       2       Parent=PAC:18235173;PACid=18235173
scaffold_1      phytozome6      CDS     13310   13384   .       +       0       Parent=PAC:18235173;PACid=18235173
scaffold_1      phytozome6      3'-UTR  13385   13612   .       +       .       Parent=PAC:18235173;PACid=18235173
scaffold_1      phytozome6      gene    19769   22804   .       +       .       ID=POPTR_0001s00210;Name=POPTR_0001s00210
scaffold_1      phytozome6      mRNA    19769   22804   .       +       .       ID=PAC:18238552;Name=POPTR_0001s00210.1;PACid=18238552;Parent=POPTR_0001s00210
scaffold_1      phytozome6      5'-UTR  19769   19827   .       +       .       Parent=PAC:18238552;PACid=18238552
scaffold_1      phytozome6      CDS     19828   20136   .       +       0       Parent=PAC:18238552;PACid=18238552
scaffold_1      phytozome6      CDS     22190   22516   .       +       0       Parent=PAC:18238552;PACid=18238552
scaffold_1      phytozome6      3'-UTR  22517   22804   .       +       .       Parent=PAC:18238552;PACid=18238552
scaffold_1      phytozome6      gene    74076   75893   .       +       .       ID=POPTR_0001s00220;Name=POPTR_0001s00220
scaffold_1      phytozome6      mRNA    74076   75893   .       +       .       ID=PAC:18237390;Name=POPTR_0001s00220.1;PACid=18237390;Parent=POPTR_0001s00220
scaffold_1      phytozome6      CDS     74076   74235   .       +       0       Parent=PAC:18237390;PACid=18237390
scaffold_1      phytozome6      CDS     74359   74634   .       +       2       Parent=PAC:18237390;PACid=18237390
scaffold_1      phytozome6      CDS     75259   75893   .       +       2       Parent=PAC:18237390;PACid=18237390
scaffold_1      phytozome6      gene    80191   81289   .       -       .       ID=POPTR_0001s00230;Name=POPTR_0001s00230
scaffold_1      phytozome6      mRNA    80191   81289   .       -       .       ID=PAC:18235601;Name=POPTR_0001s00230.1;PACid=18235601;Parent=POPTR_0001s00230
scaffold_1      phytozome6      CDS     81161   81289   .       -       0       Parent=PAC:18235601;PACid=18235601
scaffold_1      phytozome6      CDS     80191   80385   .       -       0       Parent=PAC:18235601;PACid=18235601
# while IFS='=;' read a b c d e; do [[ "$b" == "$d" ]] && v=$b || b=$v ; [[ ! -z $e ]] && d="$d;$e" ; echo "$a=$b;$c=$d" ; done <tst
sscaffold_1     phytozome6      gene    12632   13612   .       +       .       ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1      phytozome6      mRNA    12632   13612   .       +       .       ID=POPTR_0001s00200;Name=POPTR_0001s00200.1;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1      phytozome6      5'-UTR  12632   12638   .       +       .       Parent=POPTR_0001s00200;PACid=18235173
scaffold_1      phytozome6      CDS     12639   12650   .       +       0       Parent=POPTR_0001s00200;PACid=18235173
scaffold_1      phytozome6      CDS     12768   12891   .       +       0       Parent=POPTR_0001s00200;PACid=18235173
scaffold_1      phytozome6      CDS     13117   13226   .       +       2       Parent=POPTR_0001s00200;PACid=18235173
scaffold_1      phytozome6      CDS     13310   13384   .       +       0       Parent=POPTR_0001s00200;PACid=18235173
scaffold_1      phytozome6      3'-UTR  13385   13612   .       +       .       Parent=POPTR_0001s00200;PACid=18235173
scaffold_1      phytozome6      gene    19769   22804   .       +       .       ID=POPTR_0001s00210;Name=POPTR_0001s00210
scaffold_1      phytozome6      mRNA    19769   22804   .       +       .       ID=POPTR_0001s00210;Name=POPTR_0001s00210.1;PACid=18238552;Parent=POPTR_0001s00210
scaffold_1      phytozome6      5'-UTR  19769   19827   .       +       .       Parent=POPTR_0001s00210;PACid=18238552
scaffold_1      phytozome6      CDS     19828   20136   .       +       0       Parent=POPTR_0001s00210;PACid=18238552
scaffold_1      phytozome6      CDS     22190   22516   .       +       0       Parent=POPTR_0001s00210;PACid=18238552
scaffold_1      phytozome6      3'-UTR  22517   22804   .       +       .       Parent=POPTR_0001s00210;PACid=18238552
scaffold_1      phytozome6      gene    74076   75893   .       +       .       ID=POPTR_0001s00220;Name=POPTR_0001s00220
scaffold_1      phytozome6      mRNA    74076   75893   .       +       .       ID=POPTR_0001s00220;Name=POPTR_0001s00220.1;PACid=18237390;Parent=POPTR_0001s00220
scaffold_1      phytozome6      CDS     74076   74235   .       +       0       Parent=POPTR_0001s00220;PACid=18237390
scaffold_1      phytozome6      CDS     74359   74634   .       +       2       Parent=POPTR_0001s00220;PACid=18237390
scaffold_1      phytozome6      CDS     75259   75893   .       +       2       Parent=POPTR_0001s00220;PACid=18237390
scaffold_1      phytozome6      gene    80191   81289   .       -       .       ID=POPTR_0001s00230;Name=POPTR_0001s00230
scaffold_1      phytozome6      mRNA    80191   81289   .       -       .       ID=POPTR_0001s00230;Name=POPTR_0001s00230.1;PACid=18235601;Parent=POPTR_0001s00230
scaffold_1      phytozome6      CDS     81161   81289   .       -       0       Parent=POPTR_0001s00230;PACid=18235601
scaffold_1      phytozome6      CDS     80191   80385   .       -       0       Parent=POPTR_0001s00230;PACid=18235601
#

Your output is missing the .1 suffix. I need to replace the PAC ids with POPTR_0001s00200.1, which is the Name= value.

Hi rdcwayx, unfortunately I have a special case. My input is

sscaffold_1     phytozome6      gene    12632   13612   .       +       .       ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1      phytozome6      mRNA    12632   13612   .       +       .       ID=PAC:18235173;Name=POPTR_0001s00200.2;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1      phytozome6      5'-UTR  12632   12638   .       +       .       Parent=PAC:18235173;PACid=18235173
scaffold_1      phytozome6      CDS     12639   12650   .       +       0       Parent=PAC:18235173;PACid=18235173
scaffold_1      phytozome6      CDS     12768   12891   .       +       0       Parent=PAC:18235173;PACid=18235173
scaffold_1      phytozome6      CDS     13117   13226   .       +       2       Parent=PAC:18235173;PACid=18235173
scaffold_1      phytozome6      CDS     13310   13384   .       +       0       Parent=PAC:18235173;PACid=18235173

and I want to get this output:

sscaffold_1     phytozome6      gene    12632   13612   .       +       .       ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1      phytozome6      mRNA    12632   13612   .       +       .       ID=POPTR_0001s00200.2;Name=POPTR_0001s00200.2;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1      phytozome6      5'-UTR  12632   12638   .       +       .       Parent=POPTR_0001s00200.2;PACid=18235173
scaffold_1      phytozome6      CDS     12639   12650   .       +       0       Parent=POPTR_0001s00200.2;PACid=18235173
scaffold_1      phytozome6      CDS     12768   12891   .       +       0       Parent=POPTR_0001s00200.2;PACid=18235173
scaffold_1      phytozome6      CDS     13117   13226   .       +       2       Parent=POPTR_0001s00200.2;PACid=18235173
scaffold_1      phytozome6      CDS     13310   13384   .       +       0       Parent=POPTR_0001s00200.2;PACid=18235173
scaffold_1      phytozome6      3'-UTR  13385   13612   .       +       .       Parent=POPTR_0001s00200.2;PACid=18235173

Can I get the Name= value and replace the ID= and Parent= values with it? The suffix can go from POPTR_0001s00200.2 up to POPTR_0001s00200.12.

while IFS='=;' read a b c d e; do if [[ "$b" != "$d" ]]; then [[ "$f" ==  1 ]] && v=$d; f=0; b=$v; else f=1; fi; [[ ! -z $e ]] &&  d="$d;$e"; echo "$a=$b;$c=$d"; done <tst
# cat tst
sscaffold_1     phytozome6      gene    12632   13612   .       +       .       ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1      phytozome6      mRNA    12632   13612   .       +       .       ID=PAC:18235173;Name=POPTR_0001s00200.1;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1      phytozome6      5'-UTR  12632   12638   .       +       .       Parent=PAC:18235173;PACid=18235173
scaffold_1      phytozome6      CDS     12639   12650   .       +       0       Parent=PAC:18235173;PACid=18235173
scaffold_1      phytozome6      CDS     12768   12891   .       +       0       Parent=PAC:18235173;PACid=18235173
scaffold_1      phytozome6      CDS     13117   13226   .       +       2       Parent=PAC:18235173;PACid=18235173
scaffold_1      phytozome6      CDS     13310   13384   .       +       0       Parent=PAC:18235173;PACid=18235173
scaffold_1      phytozome6      3'-UTR  13385   13612   .       +       .       Parent=PAC:18235173;PACid=18235173
scaffold_1      phytozome6      gene    19769   22804   .       +       .       ID=POPTR_0001s00210;Name=POPTR_0001s00210
scaffold_1      phytozome6      mRNA    19769   22804   .       +       .       ID=PAC:18238552;Name=POPTR_0001s00210.1;PACid=18238552;Parent=POPTR_0001s00210
scaffold_1      phytozome6      5'-UTR  19769   19827   .       +       .       Parent=PAC:18238552;PACid=18238552
scaffold_1      phytozome6      CDS     19828   20136   .       +       0       Parent=PAC:18238552;PACid=18238552
scaffold_1      phytozome6      CDS     22190   22516   .       +       0       Parent=PAC:18238552;PACid=18238552
scaffold_1      phytozome6      3'-UTR  22517   22804   .       +       .       Parent=PAC:18238552;PACid=18238552
scaffold_1      phytozome6      gene    74076   75893   .       +       .       ID=POPTR_0001s00220;Name=POPTR_0001s00220
scaffold_1      phytozome6      mRNA    74076   75893   .       +       .       ID=PAC:18237390;Name=POPTR_0001s00220.1;PACid=18237390;Parent=POPTR_0001s00220
scaffold_1      phytozome6      CDS     74076   74235   .       +       0       Parent=PAC:18237390;PACid=18237390
scaffold_1      phytozome6      CDS     74359   74634   .       +       2       Parent=PAC:18237390;PACid=18237390
scaffold_1      phytozome6      CDS     75259   75893   .       +       2       Parent=PAC:18237390;PACid=18237390
scaffold_1      phytozome6      gene    80191   81289   .       -       .       ID=POPTR_0001s00230;Name=POPTR_0001s00230
scaffold_1      phytozome6      mRNA    80191   81289   .       -       .       ID=PAC:18235601;Name=POPTR_0001s00230.1;PACid=18235601;Parent=POPTR_0001s00230
scaffold_1      phytozome6      CDS     81161   81289   .       -       0       Parent=PAC:18235601;PACid=18235601
scaffold_1      phytozome6      CDS     80191   80385   .       -       0       Parent=PAC:18235601;PACid=18235601
#
# while IFS='=;' read a b c d e; do if [[ "$b" != "$d" ]]; then [[ "$f" == 1 ]] && v=$d; f=0; b=$v; else f=1; fi; [[ ! -z $e ]] && d="$d;$e"; echo "$a=$b;$c=$d"; done <tst
sscaffold_1     phytozome6      gene    12632   13612   .       +       .       ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1      phytozome6      mRNA    12632   13612   .       +       .       ID=POPTR_0001s00200.1;Name=POPTR_0001s00200.1;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1      phytozome6      5'-UTR  12632   12638   .       +       .       Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1      phytozome6      CDS     12639   12650   .       +       0       Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1      phytozome6      CDS     12768   12891   .       +       0       Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1      phytozome6      CDS     13117   13226   .       +       2       Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1      phytozome6      CDS     13310   13384   .       +       0       Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1      phytozome6      3'-UTR  13385   13612   .       +       .       Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1      phytozome6      gene    19769   22804   .       +       .       ID=POPTR_0001s00210;Name=POPTR_0001s00210
scaffold_1      phytozome6      mRNA    19769   22804   .       +       .       ID=POPTR_0001s00210.1;Name=POPTR_0001s00210.1;PACid=18238552;Parent=POPTR_0001s00210
scaffold_1      phytozome6      5'-UTR  19769   19827   .       +       .       Parent=POPTR_0001s00210.1;PACid=18238552
scaffold_1      phytozome6      CDS     19828   20136   .       +       0       Parent=POPTR_0001s00210.1;PACid=18238552
scaffold_1      phytozome6      CDS     22190   22516   .       +       0       Parent=POPTR_0001s00210.1;PACid=18238552
scaffold_1      phytozome6      3'-UTR  22517   22804   .       +       .       Parent=POPTR_0001s00210.1;PACid=18238552
scaffold_1      phytozome6      gene    74076   75893   .       +       .       ID=POPTR_0001s00220;Name=POPTR_0001s00220
scaffold_1      phytozome6      mRNA    74076   75893   .       +       .       ID=POPTR_0001s00220.1;Name=POPTR_0001s00220.1;PACid=18237390;Parent=POPTR_0001s00220
scaffold_1      phytozome6      CDS     74076   74235   .       +       0       Parent=POPTR_0001s00220.1;PACid=18237390
scaffold_1      phytozome6      CDS     74359   74634   .       +       2       Parent=POPTR_0001s00220.1;PACid=18237390
scaffold_1      phytozome6      CDS     75259   75893   .       +       2       Parent=POPTR_0001s00220.1;PACid=18237390
scaffold_1      phytozome6      gene    80191   81289   .       -       .       ID=POPTR_0001s00230;Name=POPTR_0001s00230
scaffold_1      phytozome6      mRNA    80191   81289   .       -       .       ID=POPTR_0001s00230.1;Name=POPTR_0001s00230.1;PACid=18235601;Parent=POPTR_0001s00230
scaffold_1      phytozome6      CDS     81161   81289   .       -       0       Parent=POPTR_0001s00230.1;PACid=18235601
scaffold_1      phytozome6      CDS     80191   80385   .       -       0       Parent=POPTR_0001s00230.1;PACid=18235601
#

In a multi-line display (more readable):

while IFS='=;' read a b c d e
do if [[ "$b" != "$d" ]]
        then [[ "$f" == 1 ]] && v=$d; f=0; b=$v
        else f=1
fi
[[ ! -z $e ]] && d="$d;$e"
echo "$a=$b;$c=$d"
done <tst
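To see why the loop works, it helps to look at how `read` splits one line when `IFS` is `=;`: the first four variables each take one piece, and the last variable `e` receives the rest of the line with its delimiters intact. A small sketch on a shortened mRNA line (the leading columns are replaced by `x` for brevity):

```shell
# Sample mRNA attribute column (leading columns shortened to "x").
line='x ID=PAC:18235173;Name=POPTR_0001s00200.1;PACid=18235173;Parent=POPTR_0001s00200'

# IFS applies to this read only; a here-document feeds it the single line.
IFS='=;' read -r a b c d e <<EOF
$line
EOF

printf '%s\n' "a=[$a]" "b=[$b]" "c=[$c]" "d=[$d]" "e=[$e]"
```

Here `b` holds the PAC id and `d` the Name= value, which the loop saves in `v` on the mRNA line and reuses as `b` on the child lines that follow. On a gene line `b` equals `d`, which is what sets the flag `f`.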

Thank you, it works perfectly, and your code is faster. I tried the following code earlier:

awk '/mRNA/{split($(NF-2),a,"=");print;next}{sub(/t=.*/,"t="a[2],$1)}1 ' FS=\; OFS=\; input > tmp1
awk '{if(substr($3,1,4)=="mRNA")split ($9, a,";");{sub(a[1],a[2])} print ;}'  tmp1>tmp2
awk '{if(substr($9,1,4)=="Name"){sub(substr($9,1,4),"ID")} print;}' tmp2> output

All in one.

awk '/mRNA/{split($2,a,"=");sub(/=.*/,"="a[2],$1);print;next}/gene/{print;next}{sub(/=.*/,"="a[2],$1)}1 ' FS=\; OFS=\; infile
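A variant that may be easier to adapt, given here only as a sketch: it assumes the feature type is in column 3, that PAC ids are purely numeric, and that every mRNA line carries a Name= attribute. It substitutes on the whole line instead of on a field, so tabs or spaces between columns are left as they are, and it takes whatever suffix the Name= value has (.1, .2, ... .12). The function name `rename_pac_ids` and the file `demo_multi.gff` are made up for the example.

```shell
# rename_pac_ids is a hypothetical helper name used only for this sketch.
rename_pac_ids() {
    awk '
    # On each mRNA line, remember the value of its Name= attribute.
    $3 == "mRNA" && match($0, /Name=[^;]+/) {
        name = substr($0, RSTART + 5, RLENGTH - 5)
    }
    # Substituting on $0 rather than on a field keeps the original separators.
    {
        sub(/ID=PAC:[0-9]+/, "ID=" name)
        sub(/Parent=PAC:[0-9]+/, "Parent=" name)
        print
    }' "$@"
}

# Hypothetical sample with two genes; the first transcript uses a .12 suffix.
cat > demo_multi.gff <<'EOF'
scaffold_1 phytozome6 gene 12632 13612 . + . ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1 phytozome6 mRNA 12632 13612 . + . ID=PAC:18235173;Name=POPTR_0001s00200.12;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1 phytozome6 CDS 12639 12650 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 gene 19769 22804 . + . ID=POPTR_0001s00210;Name=POPTR_0001s00210
scaffold_1 phytozome6 mRNA 19769 22804 . + . ID=PAC:18238552;Name=POPTR_0001s00210.1;PACid=18238552;Parent=POPTR_0001s00210
scaffold_1 phytozome6 CDS 19828 20136 . + 0 Parent=PAC:18238552;PACid=18238552
EOF

rename_pac_ids demo_multi.gff
```

Gene lines pass through unchanged because they contain neither `ID=PAC:` nor `Parent=PAC:`.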