awk - mixed for and if to select particular lines in a data file

naska · July 5, 2013, 12:20pm

Hi all,

I am new to AWK and I am trying to solve a problem that is probably easy for an expert. Suppose I have the following data file input.txt:

For each group of lines that share the same value in the first column, I want to select the line with the smallest value in the second column. That is, I would like the AWK script to produce the following file output.txt:

20 23 54
20.5 33 11
21 22 21

I have already tried to find an answer on many forums, but without success. Can you help me?

Don_Cragun · July 5, 2013, 1:00pm

The following produces the output you requested but the order of the output is unspecified:

awk '
NF < 2 {next
}
!($1 in m) || m[$1] > $2 {
        m[$1] = $2
        o[$1] = $0
}
END {   for(i in o) print o[i]
}' input.txt

If you are using a Solaris system, use /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk instead of /usr/bin/awk or /bin/awk.

Your sample has all of the input with a given 1st column value grouped together, but your statement of requirements didn't say anything about this. The code above accepts input in any order.

If your input always has all lines with the same 1st column value on adjacent input lines, this script can be rewritten to produce output when the 1st column value changes. This would take fewer resources for large input files and would produce output in the same order as the input.
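If the input is not grouped but the output order still matters, one option is to remember the order in which each 1st column value is first seen and print in that order at the end. The following is only a sketch under that assumption, not a drop-in replacement: the wrapper name min_in_input_order and the arrays k and n are names made up for this example, and the 2nd field is compared numerically by adding 0 to it.

```shell
# Hypothetical helper: print, for each 1st-field value, the line with the
# smallest 2nd field, in the order the 1st-field values first appear.
min_in_input_order() {
    awk '
    NF < 2 { next }                       # skip blank and one-field lines
    !($1 in m) {                          # first time this key is seen
            k[++n] = $1 ""                # remember its position (as a string)
            m[$1] = $2 + 0                # current minimum, forced numeric
            o[$1] = $0                    # line holding that minimum
            next
    }
    $2 + 0 < m[$1] {                      # smaller 2nd field for a known key
            m[$1] = $2 + 0
            o[$1] = $0
    }
    END {   for (i = 1; i <= n; i++) print o[k[i]]
    }' "$@"
}
```

Called as min_in_input_order input.txt, this still keeps one line per distinct 1st column value in memory, so it does not have the resource advantage of the grouped-input approach.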

bartus11 · July 5, 2013, 1:03pm

Try:

sort file | awk '!a[$1]++'

Don_Cragun · July 5, 2013, 1:50pm

This provides an easy way to group 1st field values together, but it also produces an empty output line that the OP doesn't seem to want. Because a plain sort compares lines as strings, it will also only work correctly if all 2nd field values in each group have the same number of digits before the decimal point (if a decimal point occurs in any 2nd field value within a group) and have no leading plus-signs (+) unless all non-negative values in a group have a leading plus-sign. The digit-count problem can be fixed by sorting numerically on the 1st field and then on the 2nd. Adding only the -n option to sort is not enough: without key definitions, sort -n compares just the leading number on each line and breaks ties with a plain string comparison of the whole line. Getting rid of the blank line is also easy (if it matters):

sort -k1,1n -k2,2n input.txt | awk 'NF > 1 && !a[$1]++'

Scrutinizer · July 5, 2013, 2:58pm

Another approach, assuming grouped values in the first column, but maintaining order of input file:

awk '!NF{next} p!=$1{if(s)print s; m=$2; s=$0} $2<m{m=$2; s=$0}{p=$1} END{print s}' file