Finding the minimum out of selected lines

Hello,

I have a file, which looks like:

I want to print the row containing "PRO" in the second column that has the minimum value in the fifth column among all "PRO" rows, and likewise for every other string present in the second column.

I am using:

 
filename=list
exec<$filename
while read line
do
awk '{print $2,"\t"$5,"\t"$1,"\t"$3,"\t"$4}' $line | sort | uniq | awk '{if ($1 != prev_1 && $2 != prev_2){print}; prev_1=$1; prev_2=$2}' > $line"20m"
done

I am getting results, but I don't understand this command.
Also, if a string such as "SER" occurs only once in the second column, it is not printed in the output file, whereas I want every string printed with its minimum fifth-column value.
Can anyone please suggest a fix, or help me understand the command?

Try:

awk '!a[$2]{a[$2]=$0;m[$2]=$5}$5<m[$2]{a[$2]=$0;m[$2]=$5}END{for (i in a) print a[i]}' file
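
For readability, the one-liner above can be spelled out as a small function. This is only a sketch: sample.txt and its contents are made up, since the real file was not posted, and min_per_key is a name chosen for illustration. It assumes whitespace-separated columns with a numeric fifth column.

```shell
# Hypothetical sample data; the real file was not shown in the thread.
# Columns: id, name, then three numeric fields (the fifth is compared).
cat > sample.txt <<'EOF'
1 PRO 1.0 2.0 7.5
2 PRO 1.0 2.0 3.2
3 SER 1.0 2.0 9.1
4 GLY 1.0 2.0 4.4
5 GLY 1.0 2.0 10.0
EOF

# min_per_key FILE: for each value of column 2, print the row whose
# column 5 is numerically smallest (the first such row wins on ties).
min_per_key() {
  awk '
    !($2 in min) || $5 + 0 < min[$2] {   # first row for this name, or a new minimum
      min[$2] = $5 + 0                   # +0 forces a numeric comparison
      row[$2] = $0
    }
    END { for (k in row) print row[k] }  # for-in order is unspecified in awk
  ' "$1"
}

# Sort the result by name only to get a stable order for display.
min_per_key sample.txt | sort -k2,2
```

With this made-up input, the rows kept are the GLY row with 4.4, the PRO row with 3.2, and the single SER row, so a name that occurs only once is still printed.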

Well, use shell or awk, not both.

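# Command substitution works in any POSIX shell; "| read key5" only keeps the value in ksh or zsh.
key5=$(sed '
  /^[0-9]* PRO /!d
  s/.* //
 ' your_file | sort -n | head -n 1)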
 
grep "^[0-9]* PRO .* $key5$" your_file
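
The sed/grep pair above handles one name ("PRO") at a time. If every name is wanted in one pass, one possible sketch is to let sort do the ordering and have awk keep the first row per name. The sample data and the function name first_min_rows are assumptions for illustration, as is the layout of five whitespace-separated columns.

```shell
# Hypothetical sample data; the real file was not shown in the thread.
cat > sample.txt <<'EOF'
1 PRO 1.0 2.0 7.5
2 PRO 1.0 2.0 3.2
3 SER 1.0 2.0 9.1
4 GLY 1.0 2.0 4.4
5 GLY 1.0 2.0 10.0
EOF

# first_min_rows FILE: sort by column 2 as text, then by column 5 as a
# number, and keep only the first row seen for each column-2 value.
# LC_ALL=C keeps "." as the decimal point regardless of the user's locale.
first_min_rows() {
  LC_ALL=C sort -k2,2 -k5,5n "$1" | awk '!seen[$2]++'
}

first_min_rows sample.txt
```

The n on the second key matters: as plain text, 10.0 would sort before 4.4 and the wrong GLY row would be kept.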

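The name of a file that lists file names goes into a variable, and that list is made stdin for the rest of the script, so each line is read into the variable line. awk is called on the file named on that line to rearrange the fields, separated by tabs. The output is sorted left to right as text, not numerically, so 10 may sort before 9, and is then made unique (sort does that better itself with -u). A second awk compares the first two fields with the two saved from the previous line and prints a line only when both differ, which in sorted input can only be the first line for each name. A name whose first value happens to equal the value on the line before it is skipped entirely, which may be why your lone "SER" goes missing. The output goes to a file with the same name plus the suffix 20m.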
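
To see how the second awk can drop a name, here is a small sketch. The file reordered.txt is made up and stands in for the reordered, sorted lines (name first, then the value); the function names both_differ and first_of_group are chosen for illustration. Whether this is what happens with the real data is a guess, since the real file was not posted.

```shell
# Hypothetical input, already in "name value" order and sorted as text.
cat > reordered.txt <<'EOF'
GLY 4.4
PRO 3.2
PRO 7.5
SER 7.5
EOF

# The test used in the original script: print only when BOTH fields
# differ from the previous line.
both_differ() {
  awk '{ if ($1 != prev_1 && $2 != prev_2) print; prev_1 = $1; prev_2 = $2 }' "$1"
}

# A variant that looks at the name only: print the first line of each group.
first_of_group() {
  awk '$1 != prev_1 { print } { prev_1 = $1 }' "$1"
}

both_differ reordered.txt      # SER is missing: its 7.5 equals the value on the previous line
first_of_group reordered.txt   # GLY 4.4, PRO 3.2 and SER 7.5 are all printed
```

Note that first_of_group only yields the minimum if the lines were sorted numerically on the value beforehand; with a plain text sort the first line of a group is the lexically smallest, not the numerically smallest.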
