Hi,
I have a file with 1M records
ABC 200 400 2.4 5.6
ABC 410 299 12 1.5
XYZ 4 5 6 7
MNO 22 40 30 70
MNO 47 55 80 150
What I want: wherever the first column is duplicated, keep a single row that has the maximum value of each column.
output
ABC 410 400 12 5.6
XYZ 4 5 6 7
MNO 47 55 80 150
How can I do this in awk/Unix?
Thanks,
Yoda
January 21, 2014, 5:38pm
2
If it is OK that the order is not preserved:
awk '
!( $1 in A ) {
        A[$1] = $0
        next
}
( $1 in A ) {
        # Re-split the stored line and keep the larger value per column
        n = split ( A[$1], R )
        for ( i = 2; i <= n; i++ )
        {
                R[i] = ( R[i] > $i ) ? R[i] : $i
                s = s ? s FS R[i] : R[i]
        }
        A[$1] = $1 FS s
        s = ""
}
END {
        for ( k in A )
                print A[k]
}
' file
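For a quick sanity check, here is a self-contained sketch of the same idea (running maxima keyed on column 1) against the sample rows from the question; the file path is hypothetical, and since `for (k in ...)` traverses in an unspecified order, the result is piped through `sort` for display:

```shell
# Recreate the sample rows from the question (hypothetical path).
cat > /tmp/maxrows.txt <<'EOF'
ABC 200 400 2.4 5.6
ABC 410 299 12 1.5
XYZ 4 5 6 7
MNO 22 40 30 70
MNO 47 55 80 150
EOF

# For each key in column 1, keep a running numeric maximum per column,
# then print one line per key in the END block. Array traversal order
# is unspecified, so the output is sorted.
awk '
{
    nf[$1] = NF
    for (i = 2; i <= NF; i++)
        if (!(($1, i) in max) || $i + 0 > max[$1, i] + 0)
            max[$1, i] = $i
}
END {
    for (k in nf) {
        line = k
        for (i = 2; i <= nf[k]; i++)
            line = line OFS max[k, i]
        print line
    }
}' /tmp/maxrows.txt | sort | tee /tmp/maxrows.out
```

The `$i + 0` forces a numeric comparison, so `12` correctly beats `2.4` instead of losing a string comparison.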
Thanks,
The order does not matter. I tried it with my data and it did not work. Here is my original data:
Yoda
January 21, 2014, 5:53pm
4
Looks like your original data is tab separated. Apply the change below in the code and retry:
awk -F'\t' '
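A minimal sketch of why the separator matters, using an inlined tab-separated sample (hypothetical, since the real file is not shown): awk's default FS splits on runs of spaces *and* tabs, so `-F'\t'` only changes the result when a field itself contains spaces.

```shell
# Tab-separated rows where the key field contains a space
# (hypothetical sample data).
printf 'gene A\t200\t400\ngene A\t410\t299\n' > /tmp/tabdemo.txt

# Default FS splits on spaces AND tabs: 4 fields per line.
awk '{ print NF }' /tmp/tabdemo.txt

# -F'\t' splits on tabs only: 3 fields, and the key stays intact.
awk -F'\t' '{ print NF }' /tmp/tabdemo.txt
```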
I tried that. It gives the following error:
TN_rpkm: line 919: linc-TMEM183B-1: command not found
TN_rpkm: line 920: linc-PRELP: command not found
Yoda
January 21, 2014, 5:58pm
6
The code that I posted has just 21 lines.
But the error you posted reports issues at lines 919 and 920!
Sorry, I messed up something at my end. I checked a few rows and it worked perfectly fine.
Thanks,