Help with simple bash script involving a loop

aberg · May 26, 2016, 1:53pm

Dear unix wizards,

I'd be very grateful for your help with the following.

I have a hypothetical file (file.txt) with three columns:

Column 2 consists of pairs of integers from 1 to 4 (each number occurs exactly twice). I want to:

find the two lines with matching values for column 2
then of the two, pick out the line that has the greatest value for column 3
then finally print column 1 of that line.
I want to use a looped bash script to do this, as in reality, the values in column 2 go from 1 to about 10,000.

I have tried:

#!/bin/bash
for i in {1..4}
do
cat file.txt | awk '{print $2}' | grep -w "$i" | sort -k 3 | head -2 | tail -1 | awk '{print $1}'
done

In the hope that it would give me an output that looks like:

However, I'm getting nothing at all, and have clearly gravely misunderstood something here.

Please help!

Many thanks.

---------- Post updated at 12:53 PM ---------- Previous update was at 12:43 PM ----------

I'd omitted "$" before "i".
Silly mistake - sorry.

It sort of works now, but because I used awk '{print $2}' to search in that column, the final value that is printed is not from the original column 1 in the file, but from column 2 (as that is the only remaining column).

Is there a way around this?
And also, a more elegant way to script this?

Thanks.
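
A minimal sketch of one way around the lost column, assuming whitespace-separated fields: filter on column 2 inside awk so that whole lines are kept, then sort the surviving lines on column 3. The sample data and the names tmp, key and result are hypothetical, since the contents of file.txt are not shown in the post.

```shell
# Hypothetical sample data standing in for file.txt (name, group, value).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
a 1 10
b 2 5
c 1 7
d 2 9
e 3 3
f 4 8
g 3 6
h 4 2
EOF

result=$(
for i in 1 2 3 4
do
    # Keep whole lines whose column 2 equals $i, so column 1 survives the filter.
    # Then sort numerically on column 3 and take the last (largest) line.
    awk -v key="$i" '$2 == key' "$tmp" | sort -k3,3n | tail -n 1 | awk '{print $1}'
done
)
printf '%s\n' "$result"
rm -f "$tmp"
```

This rereads the file once per value of column 2, so with about 10,000 values a single-pass awk command is likely to be much faster.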

RavinderSingh13 · May 26, 2016, 2:07pm

Hello aberg,

If the order in which the 1st field is printed does not matter, the following may help.
Please let me know how it goes.

awk 'FNR==NR{A[$2]=A[$2]>$3?A[$2]:$3;next} ($2 in A){if($3==A[$2]){print $1;delete A[$2]}}'  Input_file  Input_file

Thanks,
R. Singh
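
A commented sketch of the two-pass idea above, run on hypothetical sample data (the original input is not shown in the thread). It is a variant rather than the exact command posted: the array is renamed to max and the key is tested with "in" before comparing. The names tmp and result are illustrative only.

```shell
# Hypothetical sample data (name, group, value).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
a 1 10
b 2 5
c 1 7
d 2 9
e 3 3
f 4 8
g 3 6
h 4 2
EOF

# The same file is passed twice; FNR == NR is true only while the first copy is read.
result=$(awk '
    # Pass 1: remember the largest column 3 seen for each column 2 value.
    FNR == NR { if (!($2 in max) || $3 > max[$2]) max[$2] = $3; next }
    # Pass 2: print column 1 of the line holding that maximum, then drop the key
    # so a tie cannot print twice.
    ($2 in max) && $3 == max[$2] { print $1; delete max[$2] }
' "$tmp" "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

The names come out in the order the winning lines appear in the file, not sorted by column 2, which is the ordering caveat mentioned above.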

rdrtx1 · May 26, 2016, 2:08pm

sort -n -k2 -k3 infile | awk '!a[$2]++ && l {print l} {l=$1}; END {print l}'
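
A sketch of how the sort-then-awk approach works, on hypothetical sample data (the original input is not shown in the thread). After a numeric sort on column 2 and then column 3, the last line of each group carries the largest column 3, so awk only has to print the name from the line just before each change of group. This variant spells out the sort keys and tracks the previous key explicitly; prev_key, prev_name, tmp and result are illustrative names.

```shell
# Hypothetical sample data (name, group, value).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
a 1 10
b 2 5
c 1 7
d 2 9
e 3 3
f 4 8
g 3 6
h 4 2
EOF

result=$(sort -k2,2n -k3,3n "$tmp" | awk '
    # Column 2 changed: the previous line was the largest of its group.
    NR > 1 && $2 != prev_key { print prev_name }
    # Remember the current line for the next comparison.
    { prev_key = $2; prev_name = $1 }
    # The last group has no following line, so flush it here.
    END { if (NR) print prev_name }
')
printf '%s\n' "$result"
rm -f "$tmp"
```

Here the output is ordered by column 2, at the cost of sorting the whole file first.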

Aia · May 26, 2016, 2:20pm

awk '{if(p[$2]<$3){p[$2]=$3;c[$2]=$1;}}END{for(i in c){print c[i]}}' example.data

Output:

aberg · May 26, 2016, 2:21pm

Thanks RavinderSingh13 and rdrtx1. Much appreciated.

MadeInGermany · May 27, 2016, 4:22am

This one prints at the first opportunity in the main loop, i.e. it does not need an explicit loop in the END section.

awk '($2 in A3) {print ($3>A3[$2] ? $1 : A1[$2]); next} {A1[$2]=$1; A3[$2]=$3}' file.txt

Also I have the habit of checking for existence first ( $2 in A3 ) so I don't need to think about uninitialized fields, negative numbers, etc.
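
A small sketch of why the existence check matters, using hypothetical data in which one pair has only negative values in column 3. The first command is the one above; the second is an illustrative comparison against an unset array element, which awk treats as 0 in a numeric comparison. The names tmp, with_check and without_check are made up for this example.

```shell
# Hypothetical sample data; group 1 has only negative values.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
a 1 -10
b 2 5
c 1 -7
d 2 9
EOF

# Existence check first: the first line of a pair is stored as-is,
# and only the second line is compared against it.
with_check=$(awk '
    ($2 in A3) { print ($3 > A3[$2] ? $1 : A1[$2]); next }
    { A1[$2] = $1; A3[$2] = $3 }
' "$tmp")

# No existence check: an unset p[$2] compares as 0,
# so a pair whose values are all negative is never recorded.
without_check=$(awk '
    p[$2] < $3 { p[$2] = $3; c[$2] = $1 }
    END { for (i in c) print c[i] }
' "$tmp")

printf 'with check:\n%s\n' "$with_check"
printf 'without check:\n%s\n' "$without_check"
rm -f "$tmp"
```

With the check, both pairs produce a name (c and d); without it, only d is printed because group 1 never beats the implicit 0.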