Dear unix wizards,
I'd be very grateful for your help with the following.
I have a hypothetical file (file.txt) with three columns:
111 4 0.01
112 3 0.02
113 2 0.03
114 1 0.04
115 1 0.06
116 2 0.02
117 3 0.01
118 4 0.05
Column 2 consists of pairs of integers from 1 to 4 (each number occurs exactly twice). I want to:
- find the two lines with matching values for column 2
- then of the two, pick out the line that has the greatest value for column 3
- then finally print column 1 of that line.
I want to do this with a looped bash script because, in reality, the values in column 2 run from 1 to about 10,000.
I have tried:
#!/bin/bash
for i in {1..4}
do
cat file.txt | awk '{print $2}' | grep -w "$i" | sort -k 3 | head -2 | tail -1 | awk '{print $1}'
done
I hoped it would give me output that looks like:
115
113
112
118
However, I'm getting nothing at all, and have clearly gravely misunderstood something here.
Please help!
Many thanks.
---------- Post updated at 12:53 PM ---------- Previous update was at 12:43 PM ----------
I'd omitted "$" before "i".
Silly mistake - sorry.
It sort of works now, but because I used awk '{print $2}' to isolate that column before searching it, the final value that is printed is not from column 1 of the original file but from column 2 (as that is the only remaining column).
Is there a way around this?
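One possible way around this, as a minimal sketch rather than a definitive fix: let awk do the matching on column 2 and pass the whole line through, so columns 1 and 3 are still there for the later steps. The function name pick_max_loop and its two arguments (file name, highest column 2 value) are made up for illustration. It assumes whitespace-separated columns as in the sample, and forces the C locale so that sort reads "0.04" with a decimal point.

```shell
#!/bin/bash
# Sketch only: pick_max_loop is a hypothetical helper, not from the original post.
# Usage: pick_max_loop FILE MAX_KEY
pick_max_loop() {
    file=$1
    max=$2
    i=1
    while [ "$i" -le "$max" ]; do
        # Keep whole lines whose column 2 equals $i, sort them numerically
        # on column 3 (ascending), take the last line (largest column 3),
        # then print column 1 of that line.
        awk -v key="$i" '$2 == key' "$file" |
            LC_ALL=C sort -k3,3n |
            tail -n 1 |
            awk '{print $1}'
        i=$((i + 1))
    done
}
```

Calling pick_max_loop file.txt 4 on the sample data should print 115, 113, 112 and 118, one per line. Note that this still reads the file once per value, so with about 10,000 values it re-reads the file about 10,000 times.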
Also, is there a more elegant way to script this?
Thanks.
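As for a more elegant way, one option is to skip the loop and read the file once, letting awk remember the best line seen so far for each column 2 value. This is only a sketch under the same assumptions as the sample (whitespace-separated columns, numeric column 3); the name pick_max_awk is hypothetical. If two lines tie on column 3, this version keeps the first one it saw.

```shell
#!/bin/bash
# Sketch only: pick_max_awk is a hypothetical helper, not from the original post.
# Usage: pick_max_awk FILE
pick_max_awk() {
    awk '
        # For each column 2 value, store the largest column 3 seen so far
        # and the column 1 that came with it.
        !($2 in best) || $3 + 0 > best[$2] { best[$2] = $3 + 0; id[$2] = $1 }
        # awk does not guarantee the order of "for (k in ...)", so print the
        # key next to the winner and let sort put the keys in numeric order.
        END { for (k in best) print k, id[k] }
    ' "$1" | sort -n -k1,1 | awk '{print $2}'
}
```

Calling pick_max_awk file.txt on the sample data should give 115, 113, 112 and 118, in order of the column 2 values 1 to 4. Because the file is read only once, the run time should not grow with the number of distinct values the way the loop does.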