Hi,
I have a file with 1M records
ABC 200 400 2.4 5.6
ABC 410 299 12 1.5
XYZ 4 5 6 7
MNO 22 40 30 70
MNO 47 55 80 150
What I want: wherever the first column is duplicated, keep a single row that has the maximum value of each column.
output
ABC 410 400 12 5.6
XYZ 4 5 6 7
MNO 47 55 80 150
How can I do this in awk/Unix?
Thanks,
Yoda
January 21, 2014, 5:38pm
2
If it is OK that the order is not preserved:
awk '
!( $1 in A ) {
        A[$1] = $0
        next
}
( $1 in A ) {
        # Re-split the stored line and keep the larger value per column
        n = split ( A[$1], R )
        for ( i = 2; i <= n; i++ )
        {
                R[i] = ( R[i] > $i ) ? R[i] : $i
                s = s ? s FS R[i] : R[i]
        }
        A[$1] = $1 FS s
        s = ""
}
END {
        for ( k in A )
                print A[k]
}
' file
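For a quick sanity check, here is a self-contained sketch of the same idea (running maxima keyed on column 1) against the sample rows from the question; the file path is hypothetical, and since `for (k in ...)` traverses in an unspecified order, the result is piped through `sort` for display:

```shell
# Recreate the sample rows from the question (hypothetical path).
cat > /tmp/maxrows.txt <<'EOF'
ABC 200 400 2.4 5.6
ABC 410 299 12 1.5
XYZ 4 5 6 7
MNO 22 40 30 70
MNO 47 55 80 150
EOF

# For each key in column 1, keep a running numeric maximum per column,
# then print one line per key in the END block. Array traversal order
# is unspecified, so the output is sorted.
awk '
{
    nf[$1] = NF
    for (i = 2; i <= NF; i++)
        if (!(($1, i) in max) || $i + 0 > max[$1, i] + 0)
            max[$1, i] = $i
}
END {
    for (k in nf) {
        line = k
        for (i = 2; i <= nf[k]; i++)
            line = line OFS max[k, i]
        print line
    }
}' /tmp/maxrows.txt | sort | tee /tmp/maxrows.out
```

The `$i + 0` forces a numeric comparison, so `12` correctly beats `2.4` instead of losing a string comparison.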
Thanks,
The order does not matter. I tried it with my data and it did not work. Here is my original data:
Yoda
January 21, 2014, 5:53pm
4
Looks like your original data is tab separated. Apply the change below in the code and retry:
awk -F'\t' '
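A minimal sketch of why the separator matters, using an inlined tab-separated sample (hypothetical, since the real file is not shown): awk's default FS splits on runs of spaces *and* tabs, so `-F'\t'` only changes the result when a field itself contains spaces.

```shell
# Tab-separated rows where the key field contains a space
# (hypothetical sample data).
printf 'gene A\t200\t400\ngene A\t410\t299\n' > /tmp/tabdemo.txt

# Default FS splits on spaces AND tabs: 4 fields per line.
awk '{ print NF }' /tmp/tabdemo.txt

# -F'\t' splits on tabs only: 3 fields, and the key stays intact.
awk -F'\t' '{ print NF }' /tmp/tabdemo.txt
```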
I tried that. It gives the following error:
TN_rpkm: line 919: linc-TMEM183B-1: command not found
TN_rpkm: line 920: linc-PRELP: command not found
Yoda
January 21, 2014, 5:58pm
6
The code that I posted has just 21 lines.
But the error you posted reports issues at lines 919 and 920!
Sorry, I messed up something at my end. I checked a few rows and it worked perfectly fine.
Thanks,