Alyaa
January 8, 2014, 1:42am
Hi
I have a data frame with repeated names in column 1 and different descriptors in column 2. I want to merge (concatenate) the strings that share the same entry in column 1 into a single row, using any separator.
Example input:
Cvel_1 KOG0155
Cvel_1 KOG0306
Cvel_1 KOG3259
Cvel_1 KOG0931
Cvel_1 KOG3638
Cvel_1 KOG0956
Desired output:
Cvel_1 KOG0155, KOG0306, KOG3259, KOG0931, KOG3638, KOG0956
Thanks a lot
Alyaa
pamu
January 8, 2014, 1:47am
Try:
$ cat file
Cvel_1 KOG0155
Cvel_1 KOG0306
Cvel_1 KOG3259
Cvel_1 KOG0931
Cvel_1 KOG3638
Cvel_1 KOG0956
$ awk '{A[$1]=A[$1]?A[$1] ", " $NF :$1 " "$NF}END{for (i in A)print A[i]}' file
Cvel_1 KOG0155, KOG0306, KOG3259, KOG0931, KOG3638, KOG0956
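A minimal sketch, not from the thread: `for (i in A)` visits keys in an unspecified order, so when the file holds several different names the merged lines may come out in any order. The variant below (the function name `join_by_key` is hypothetical) records the order in which each name first appears and, like the one-liner above, joins the last field of each line with ", ".

```shell
#!/bin/sh
# Hypothetical helper: join_by_key [file...]
# Reads "key value" lines and prints one line per key, with that key's
# values (the last field, $NF) joined by ", ", keys in first-seen order.
join_by_key() {
  awk '
    # First occurrence of a key: remember its position and start the list.
    !($1 in vals) { order[++n] = $1; vals[$1] = $NF; next }
    # Later occurrences: append the value with the separator.
    { vals[$1] = vals[$1] ", " $NF }
    # Print keys in the order they were first seen.
    END { for (k = 1; k <= n; k++) print order[k], vals[order[k]] }
  ' "$@"
}

# Usage: reads stdin when no file is given.
printf 'Cvel_1 KOG0155\nCvel_2 KOG0001\nCvel_1 KOG0306\n' | join_by_key
# Cvel_1 KOG0155, KOG0306
# Cvel_2 KOG0001
```

Keeping a separate `order` array costs one extra entry per distinct name, which is negligible, and makes the output deterministic across awk implementations.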
Alyaa
January 8, 2014, 3:33am
Thank you very much, Pamu; it works just fine.
However, when I try the same command on something like this:
"Cvel_1" " Transcription factor CA150 "
"Cvel_1" " WD40-repeat-containing subunit of the 18S rRNA processing complex "
"Cvel_1" " Peptidyl-prolyl cis-trans isomerase "
"Cvel_1" " Predicted guanine nucleotide exchange factor, contains Sec7 domain "
The output is:
"Cvel_1" ";";";"
Your help and prompt response are much appreciated.
Thank you
pamu
January 8, 2014, 3:58am
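The descriptions contain spaces, so splitting on whitespace breaks them into many fields. Try setting the field separator to a tab:
awk -F "\t" '{A[$1]=A[$1]?A[$1] ", " $NF :$1 " "$NF}END{for (i in A)print A[i]}' file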
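A small check of why the tab separator matters, assuming (as the working fix implies) that the two columns are separated by a single tab; the sample below is two of the lines from the question rebuilt that way. With the default separator awk splits on every run of blanks, so `$NF` is only the closing quote of each description. With `-F '\t'` each line has exactly two fields and `$NF` is the whole description.

```shell
#!/bin/sh
# Two-line sample, columns separated by one tab (assumed file layout).
sample=$(printf '"Cvel_1"\t" Transcription factor CA150 "\n"Cvel_1"\t" Peptidyl-prolyl cis-trans isomerase "\n')

# Default splitting (runs of spaces/tabs): 6 fields, and $NF is just the quote.
printf '%s\n' "$sample" | awk '{ print NF, $NF }'

# Tab splitting: 2 fields, and $NF is the full quoted description.
printf '%s\n' "$sample" | awk -F '\t' '{ print NF, $NF }'

# The merge from the thread, with the tab separator.
printf '%s\n' "$sample" |
  awk -F '\t' '{ A[$1] = A[$1] ? A[$1] ", " $NF : $1 " " $NF } END { for (i in A) print A[i] }'
```

If the real file used spaces rather than tabs between the columns, `-F "\t"` would leave each whole line in `$1`, so it is worth confirming the separator first (for example with `cat -A file` on GNU systems, where tabs show as `^I`).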
Alyaa
January 8, 2014, 3:59am
Thank you VERY much.
It works perfectly.