Using AWK to format output based on key field

I have a file containing gene lines, something like this:

Transcript Name          GO
POPTR_0016s06290.1      98654
POPTR_2158s00200.1      11324
POPTR_0004s22390.1      12897
POPTR_0001s11490.1
POPTR_0016s13950.1      14532
POPTR_0015s05840.1      13455
POPTR_0013s06470.1      12344
POPTR_0013s06470.1      13248
POPTR_0013s06470.1      14565
POPTR_0013s06470.1      16817

I want the following output:

Transcript Name          GO
POPTR_0016s06290.1      98654
POPTR_2158s00200.1      11324
POPTR_0004s22390.1      12897
POPTR_0001s11490.1
POPTR_0016s13950.1      14532
POPTR_0015s05840.1      13455
POPTR_0013s06470.1      12344||13248||14565||16817

Please help me get this output.

nawk 'FNR==1{h=$0;next} {a[$1]=($1 in a)?a[$1] "||" $2:$2}END{print h;for(i in a) print i,a[i]}' OFS='\t' myFile

Try:

awk 'NR==1{h=$0;next}{a[$1]=a[$1]"||"$2}END{print h;for (i in a){sub("^\\|\\|","",a[i]);print i,a[i]}}' file

Thank you for your help, but I got this output:

Transcript Name          GO
POPTR_0016s06290.1      ||98654
POPTR_2158s00200.1      ||11324
POPTR_0004s22390.1      ||12897
POPTR_0001s11490.1      || 
POPTR_0016s13950.1      ||14532
POPTR_0015s05840.1      ||13455
POPTR_0013s06470.1      ||12344||13248||13248||14565||16817

I need this output:

Transcript Name          GO
POPTR_0016s06290.1      98654
POPTR_2158s00200.1      11324
POPTR_0004s22390.1      12897
POPTR_0001s11490.1       
POPTR_0016s13950.1      14532
POPTR_0015s05840.1      13455
POPTR_0013s06470.1      12344||13248||14565||16817

Strange. Given your sample input, I got:

Transcript Name          GO
POPTR_0004s22390.1      12897
POPTR_0015s05840.1      13455
POPTR_0001s11490.1
POPTR_0016s06290.1      98654
POPTR_0013s06470.1      12344||13248||14565||16817
POPTR_2158s00200.1      11324
POPTR_0016s13950.1      14532

Perfect, thank you. It works fine.

---------- Post updated at 02:33 PM ---------- Previous update was at 02:30 PM ----------

Strange, I checked it again and here is the result:

~/newdata/phytozome_perlfiles# nawk 'FNR==1{h=$0;next} {a[$1]=($1 in a)?a[$1] "||" $2:$2}END{print h;for(i in a) print i,a[i]}' OFS='\t' tmp
Transcript Name          GO
POPTR_0013s06470.1      ||12344||13248||14565||16817
POPTR_0016s06290.1      ||98654
POPTR_0004s22390.1      ||12897
POPTR_2158s00200.1      ||11324
POPTR_0001s11490.1      ||
POPTR_0015s05840.1      ||13455
POPTR_0016s13950.1      ||14532

Try this awk script...

nawk '{if(NR>1){r[$1]=r[$1]?r[$1]"||"$2:$2}else s=$0}END{print s;for(i in r)print i"\t"r[i]}' file
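
A possible variant, offered only as a sketch: the leading "||" reported above may come from the awk implementation creating a[$1] before it evaluates ($1 in a) on the right-hand side of the same assignment, so the test is always true. That explanation is an assumption about the nawk in use, not something confirmed in this thread. Doing the membership test in its own rule avoids the question entirely, and a second array (called order here, a name made up for this example) keeps the transcripts in input order, which "for (i in a)" does not guarantee. The sample data is a shortened copy of the input from the question, and tab-separated output is assumed.

```shell
# Build a small tab-separated sample (shortened from the question's input).
tmp=$(mktemp)
printf 'Transcript Name\tGO\n' > "$tmp"
printf '%s\t%s\n' \
  POPTR_0016s06290.1 98654 \
  POPTR_0001s11490.1 '' \
  POPTR_0013s06470.1 12344 \
  POPTR_0013s06470.1 13248 >> "$tmp"

result=$(awk '
NR == 1 { print; next }          # pass the header line through unchanged
!($1 in go) {                    # first time this transcript is seen
    order[++n] = $1              # remember the input order of the keys
    go[$1] = $2                  # start the list without a separator
    next
}
{ go[$1] = go[$1] "||" $2 }      # later occurrences: append with "||"
END {
    for (k = 1; k <= n; k++)     # print in the order the keys first appeared
        print order[k] "\t" go[order[k]]
}' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

With this sample, the last line printed should be POPTR_0013s06470.1 followed by a tab and 12344||13248, and the transcript with no GO value keeps an empty second column. Replace "$tmp" with your own file name to run it on real data.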