shen
1
I have a file that contains gene lines, something like this:
Transcript Name GO
POPTR_0016s06290.1 98654
POPTR_2158s00200.1 11324
POPTR_0004s22390.1 12897
POPTR_0001s11490.1
POPTR_0016s13950.1 14532
POPTR_0015s05840.1 13455
POPTR_0013s06470.1 12344
POPTR_0013s06470.1 13248
POPTR_0013s06470.1 14565
POPTR_0013s06470.1 16817
I want the following output:
Transcript Name GO
POPTR_0016s06290.1 98654
POPTR_2158s00200.1 11324
POPTR_0004s22390.1 12897
POPTR_0001s11490.1
POPTR_0016s13950.1 14532
POPTR_0015s05840.1 13455
POPTR_0013s06470.1 12344||13248||13248||14565||16817
Please help me get this output.
nawk 'FNR==1{h=$0;next} {a[$1]=($1 in a)?a[$1] "||" $2:$2}END{print h;for(i in a) print i,a[i]}' OFS='\t' myFile
Try:
awk 'NR==1{h=$0;next}{a[$1]=a[$1]"||"$2}END{print h;for (i in a){sub("^\\|\\|","",a[i]);print i,a[i]}}' file
shen
4
Thank you for your help, but I got this output:
Transcript Name GO
POPTR_0016s06290.1 ||98654
POPTR_2158s00200.1 ||11324
POPTR_0004s22390.1 ||12897
POPTR_0001s11490.1 ||
POPTR_0016s13950.1 ||14532
POPTR_0015s05840.1 ||13455
POPTR_0013s06470.1 ||12344||13248||13248||14565||16817
I need this output:
Transcript Name GO
POPTR_0016s06290.1 98654
POPTR_2158s00200.1 11324
POPTR_0004s22390.1 12897
POPTR_0001s11490.1
POPTR_0016s13950.1 14532
POPTR_0015s05840.1 13455
POPTR_0013s06470.1 12344||13248||13248||14565||16817
Strange. Given your sample input, I got:
Transcript Name GO
POPTR_0004s22390.1 12897
POPTR_0015s05840.1 13455
POPTR_0001s11490.1
POPTR_0016s06290.1 98654
POPTR_0013s06470.1 12344||13248||14565||16817
POPTR_2158s00200.1 11324
POPTR_0016s13950.1 14532
shen
6
Perfect, thank you. It works fine.
---------- Post updated at 02:33 PM ---------- Previous update was at 02:30 PM ----------
Strange, I checked it again; here is the result:
~/newdata/phytozome_perlfiles# nawk 'FNR==1{h=$0;next} {a[$1]=($1 in a)?a[$1] "||" $2:$2}END{print h;for(i in a) print i,a[i]}' OFS='\t' tmp
Transcript Name GO
POPTR_0013s06470.1 ||12344||13248||14565||16817
POPTR_0016s06290.1 ||98654
POPTR_0004s22390.1 ||12897
POPTR_2158s00200.1 ||11324
POPTR_0001s11490.1 ||
POPTR_0015s05840.1 ||13455
POPTR_0016s13950.1 ||14532
Try this awk script...
nawk '{if(NR>1){r[$1]=r[$1]?r[$1]"||"$2:$2}else s=$0}END{print s;for(i in r)print i"\t"r[i]}' file
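If the input order matters, note that `for (i in r)` walks the array in an unspecified order, which is why the outputs above come back shuffled. Below is a minimal sketch of an order-preserving variant, not something posted in the thread: the function name `merge_go` and the array names `go` and `order` are made up for illustration. It checks `$1 in go` in a pattern before anything is assigned, which sidesteps the membership test inside the assignment; that test may be what produced the leading `||` on some awk versions, although the thread does not confirm the cause.

```shell
# merge_go [FILE...]: hypothetical helper, not from the thread.
# Joins the GO values of repeated transcripts with "||",
# keeping transcripts in first-seen order.
merge_go() {
  awk '
    NR == 1 { print; next }          # pass the header line through unchanged
    NF == 0 { next }                 # skip blank lines
    !($1 in go) {                    # first time this transcript is seen
      order[++n] = $1                # remember its position
      go[$1] = $2                    # $2 is empty if there is no GO value
      next
    }
    $2 != "" {                       # later occurrence with a GO value
      go[$1] = (go[$1] == "" ? $2 : go[$1] "||" $2)
    }
    END {
      for (k = 1; k <= n; k++)
        print order[k] "\t" go[order[k]]
    }
  ' "$@"
}
```

Usage would be `merge_go myFile`, or pipe the data in on standard input. Transcripts without a GO value are printed with an empty second column.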