Hello,
I am trying to get a script to work which will find the maximum value of the fourth column and assign that value to all rows where the first three columns match.
For example:
1111 2222 AAAA 0.3
3333 4444 BBBB 0.7
1111 2222 AAAA 0.9
1111 2222 AAAA 0.5
3333 4444 BBBB 0.4
should give:
1111 2222 AAAA 0.9
3333 4444 BBBB 0.7
1111 2222 AAAA 0.9
1111 2222 AAAA 0.9
3333 4444 BBBB 0.7
or even better if it returns unique instances:
1111 2222 AAAA 0.9
3333 4444 BBBB 0.7
Thanks in advance!
Hi,
try:
awk '{if ($4 > a[$1" "$2" "$3])a[$1" "$2" "$3]=$4}END{for (i in a) print i, a}' file
Output:
3333 4444 BBBB 0.7
1111 2222 AAAA 0.9
HTH Chris
1 Like
frans
July 25, 2010, 11:20am
3
Works with the provided example (only if 3 first fields are fixed width)
$ cat infile
1111 2222 AAAA 0.3
3333 4444 BBBB 0.7
1111 2222 AAAA 0.9
1111 2222 AAAA 0.5
3333 4444 BBBB 0.4
$ sort -rn infile | uniq -w14
3333 4444 BBBB 0.7
1111 2222 AAAA 0.9
Thanks Christoph, I think that your method will work for me. But there is a slight problem, unlike the sample input that I have given in the above example, the columns are actually tab separated. Also the third column can have spaces in some cases. The script fails in those cases. Can you please suggest a fix?
Please give examples of the cases where the script failes, so i can test the script.
Please see this:
There are 4 tab-delimited columns. 3rd column may have spaces in the name.
2655 96 IA 0.8179
2655 96 eugene 0.8144
2655 96 CPU 0.4275
2655 96 RMA 0.3407
2655 96 P Proc Tran CPU 0.3377
2655 96 ASUS K 0.2846
2655 96 MSI 0.1921
2655 96 LGVu 0.029
2655 96 IA 0.8373
2655 96 eugene 0.8338
2655 96 CPU 0.4469
2655 96 RMA 0.31
2655 96 P Proc Tran CPU 0.3571
Result should look like this:
2655 96 IA 0.8373
2655 96 eugene 0.8338
2655 96 CPU 0.4469
2655 96 RMA 0.3407
2655 96 P Proc Tran CPU 0.3571
2655 96 ASUS K 0.2846
2655 96 MSI 0.1921
2655 96 LGVu 0.029
frans
July 25, 2010, 12:37pm
7
bash
#!/bin/bash
IFS=" " # is a Tab
sort -r infile | while read A B C D
do [ "$L" != "$A$B$C" ] && { L=$A$B$C; echo -e "$A\t$B\t$C\t$D"; }
done
if you need a sorted output
#!/bin/bash
IFS=" " # is a Tab
sort -r infile | ( while read A B C D
do [ "$L" != "$A$B$C" ] && { L=$A$B$C; echo -e "$A\t$B\t$C\t$D"; }
done ) | sort
awk (thanks to Christoph Spohr, i just added -F'\t' and replaced spaces with tabs)
awk -F'\t' '{if ($4 > a[$1"\t"$2"\t"$3])a[$1"\t"$2"\t"$3]=$4}END{for (i in a) print i"\t"a}' file
1 Like
awk -F "\t" '{a[$1 FS $2 FS $3]=(a[$1 FS $2 FS $3]>$NF)?a[$1 FS $2 FS $3]:$NF}
END {for (i in a) print i FS a}' urfile
1 Like
thanks guys for your help