Finding Maximum value in a column

Hello,

I am trying to get a script to work which will find the maximum value of the fourth column and assign that value to all rows where the first three columns match.

For example:

1111 2222 AAAA 0.3
3333 4444 BBBB 0.7
1111 2222 AAAA 0.9
1111 2222 AAAA 0.5
3333 4444 BBBB 0.4

should give:

1111 2222 AAAA 0.9
3333 4444 BBBB 0.7
1111 2222 AAAA 0.9
1111 2222 AAAA 0.9
3333 4444 BBBB 0.7

or even better if it returns unique instances:

1111 2222 AAAA 0.9
3333 4444 BBBB 0.7

Thanks in advance!

Hi,

try:

awk '{if ($4 > a[$1" "$2" "$3])a[$1" "$2" "$3]=$4}END{for (i in a) print i, a}' file

Output:

3333 4444 BBBB 0.7
1111 2222 AAAA 0.9

HTH Chris

1 Like

Works with the provided example (only if 3 first fields are fixed width)

$ cat infile
1111 2222 AAAA 0.3
3333 4444 BBBB 0.7
1111 2222 AAAA 0.9
1111 2222 AAAA 0.5
3333 4444 BBBB 0.4
$ sort -rn infile | uniq -w14
3333 4444 BBBB 0.7
1111 2222 AAAA 0.9

Thanks Christoph, I think that your method will work for me. But there is a slight problem, unlike the sample input that I have given in the above example, the columns are actually tab separated. Also the third column can have spaces in some cases. The script fails in those cases. Can you please suggest a fix?

Please give examples of the cases where the script failes, so i can test the script.

Please see this:
There are 4 tab-delimited columns. 3rd column may have spaces in the name.

2655	96	IA	0.8179
2655	96	eugene	0.8144
2655	96	CPU	0.4275
2655	96	RMA	0.3407
2655	96	P Proc Tran CPU	0.3377
2655	96	ASUS K	0.2846
2655	96	MSI	0.1921
2655	96	LGVu	0.029
2655	96	IA	0.8373
2655	96	eugene	0.8338
2655	96	CPU	0.4469
2655	96	RMA	0.31
2655	96	P Proc Tran CPU	0.3571

Result should look like this:

2655	96	IA	0.8373
2655	96	eugene	0.8338
2655	96	CPU	0.4469
2655	96	RMA	0.3407
2655	96	P Proc Tran CPU	0.3571
2655	96	ASUS K	0.2846
2655	96	MSI	0.1921
2655	96	LGVu	0.029

bash

#!/bin/bash
IFS="	"	# is a Tab
sort -r infile | while read A B C D
do	[ "$L" != "$A$B$C" ] && { L=$A$B$C; echo -e "$A\t$B\t$C\t$D"; }
done

if you need a sorted output

#!/bin/bash
IFS="	"	# is a Tab
sort -r infile | ( while read A B C D
do	[ "$L" != "$A$B$C" ] && { L=$A$B$C; echo -e "$A\t$B\t$C\t$D"; }
done ) | sort

awk (thanks to Christoph Spohr, i just added -F'\t' and replaced spaces with tabs)

awk -F'\t' '{if ($4 > a[$1"\t"$2"\t"$3])a[$1"\t"$2"\t"$3]=$4}END{for (i in a) print i"\t"a}' file
1 Like
awk -F "\t" '{a[$1 FS $2 FS $3]=(a[$1 FS $2 FS $3]>$NF)?a[$1 FS $2 FS $3]:$NF}
                  END {for (i in a) print i FS a}' urfile
1 Like

thanks guys for your help