Find duplicates in column 1 and merge their lines (awk?)

Hi,

I have a file (sorted with sort) with 8 tab-delimited columns. Some values in the first column are duplicated, and I need to merge all lines that share the same first field into a single line.

My input file:

comp100002	aaa	bbb	ccc	ddd	eee	fff	ggg
comp100003	aba	aba	aba	aba	aba	aba	aba
comp100003	fff	fff	fff	fff	fff	fff	fff
comp100004	xxx	xyz	xyz	xxx	xyz	xxx	xyz

My desired output file:

comp100002	aaa	bbb	ccc	ddd	eee	fff	ggg
comp100003	aba	aba	aba	aba	aba	aba	aba	fff	fff	fff	fff	fff	fff	fff
comp100004	xxx	xyz	xyz	xxx	xyz	xxx	xyz

Thanks for any advice.

try:

awk '
!(a[$1]) {a[$1]=$0}
a[$1] {w=$1; $1=""; a[w]=a[w] $0}
END {for (i in a) print a[i]}
' FS="\t" OFS="\t" infile

Thanks a lot, it prints the desired results. However, if an identifier occurs only once in field 1, the whole line is appended twice. It's easy to get rid of these 8 additional columns, but since I am learning, could you please explain which part of the code is responsible for this?

try:

awk 'p!=$1{if(p)print s; p=s=$1} {sub(p,x); s=s $0} END{if(p)print s}' FS='\t' file
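The one-liner above depends on the input being sorted, so that lines with the same key are adjacent. Here is a more verbose sketch of the same streaming idea, for illustration only. The function name merge_sorted and the variable names prev and row are made up for this example, and it removes the key by position with substr rather than with sub(p,x), which would treat the key as a regular expression.

```shell
# merge_sorted [FILE...]: join consecutive lines that share field 1.
# Hypothetical helper name, not from the thread; input must already be sorted.
merge_sorted() {
    awk -F '\t' '
    $1 != prev {                    # first field changed: a new group starts
        if (NR > 1) print row       # flush the previous group, if any
        prev = $1
        row = $0                    # start the row with the whole line
        next
    }
    {                               # same key as the previous line
        row = row substr($0, length($1) + 1)   # append "\tfield2\t..." without the key
    }
    END { if (NR > 0) print row }   # flush the last group
    ' "$@"
}
```

It could be called as merge_sorted file > merged. Only one output row is held in memory at a time, which is the main reason to prefer this approach for large sorted files.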

Fixed, try:

awk '
!(a[$1]) {a[$1]=$0; next}
a[$1] {w=$1; $1=""; a[w]=a[w] $0}
END {for (i in a) print a[i]}
' FS="\t" OFS="\t" infile
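As for which part was responsible: in the first version, a line with a new key is stored by the first rule, and then the pattern a[$1] of the second rule is immediately true for that same line, so its fields get appended a second time. The next statement in the fixed version skips the remaining rules for that line. Below is a commented sketch of the same logic. The function name merge_by_key is hypothetical, the test $1 in a is used in place of the truthiness of a[$1], and the final sort is an addition that is not in the thread.

```shell
# merge_by_key [FILE...]: merge all lines that share field 1; input need not be sorted.
# Hypothetical helper name, not from the thread.
merge_by_key() {
    awk '
    BEGIN { FS = OFS = "\t" }
    !($1 in a) {          # first time this key is seen
        a[$1] = $0        # store the whole line, key included
        next              # skip the rule below; without this the line is appended twice
    }
    {                     # key seen before
        key = $1
        $1 = ""           # blank the key; $0 is rebuilt as "\tfield2\t..."
        a[key] = a[key] $0
    }
    END {
        for (k in a) print a[k]   # traversal order is unspecified in awk
    }
    ' "$@" | sort                 # assumption: sorted output is wanted
}
```

It could be called as merge_by_key infile > merged. Because for (k in a) visits keys in no guaranteed order, the output is piped through sort here; drop that if the order does not matter.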

Thanks, guys. I checked with diff, and the results of both scripts are now identical.