Issue using awk to print min values for unique ids

I am using the following script to search for and print minimum values for each individual Fields (3-14) for each unique id (Field 1). But when the field contains a "-99.99" ( I am ignoring "-99.99") and when the minimum value is the first line of a new id (Field 1), the output does not print Field 2, it leave a blank space. Any ideas on how to correct this?

The below snipit is looking for the min value in field 12 only.

awk ' /-99.99/ {next} {if (a[$1] == "") a[$1] = $12; if (a[$1] > $12) {a[$1]=$12" " $2} } END { for (i in a) { print i, a} }' 

Input file sample

01001   012017  10.64   3.96    4.45    3.48    9.58    10.76   4.44    5.03    2.78    1.45    1.26    3.24
01001   012018  3.94    6.18    4.47    4.79    11.07   3.01    4.27    6.37    6.48    4.52    6.16    9.95
01001   012019  6.77    3.79    3.96    -99.99  -99.99  -99.99  -99.99  -99.99  -99.99  -99.99  -99.99  -99.99
01003   011895  6.46    3.53    6.69    4.63    3.54    11.71   7.64    8.02    2.45    3.21    1.35    4.05
01003   011896  3.80    5.88    5.63    2.45    2.83    6.98    5.24    5.60    2.90    5.38    4.09    2.45

Current Output

01001   1.45 
01003   3.21  011895  

Expected Output

01001   1.45  012017 
01003   3.21  011895  

a bit verbose, but..... sometyhing along these lines.
awk -f ncw.awk myFile where ncw.awk is:

{
  for(i=3;i<=NF;i++) {
     key[$1]
     if ( $i >0 && (!(($1, i) in f1) || f1[key[$1],i] > $i)) {
       f1[$1,i] = $i
       if (!($1,i) in f2) f2[$1,i] = $2
     }
  }
  nf=i
}
END {
  for( k in key)
    for(i=3; i<= nf; i++) {
      printf("%s%s%s", (i==3)?k OFS f2[k,i]:"", OFS, (i==nf)?f1[k,i] ORS:f1[k,i])
    }
}

results in (given your sample input file):

01001 012017 10.64 3.96 4.45 3.48 9.58 10.76 4.44 5.03 2.78 1.45 1.26 3.24
01003 011895 6.46 3.53 6.69 4.63 3.54 11.71 7.64 8.02 2.45 3.21 1.35 4.05
2 Likes

I get other data

awk '
!($1 in t)      {t[$1] = $3}
!/-99.99/       {for(i = 3; i <= NF; i++) if (t[$1] > $i) {
                 t[$1] = $i; k[$1] = $2}
                 }
END             { for (i in t) print i, t, k
                 }' file

--- Post updated at 19:46 ---

01001 1.26 012017
01003 1.35 011895

--- Post updated at 19:52 ---

fix

awk '
!/-99.99/       {if (! t[$1]) t[$1] = $3
                 for(i = 3; i <= NF; i++) if (t[$1] > $i) {
                 t[$1] = $i; k[$1] = $2}
                }
END             { for (i in t) print i, t, k
                }' file

--- Post updated at 19:59 ---

Not until the end understood the task :confused:

1 Like

Hello ncwxpanther,

Could you please try following too, it considers that your Input_file is sorted by 1st field(as per shown samples),if not then we could add like sort -k1 Input_file | awk .... too before my following code.

awk '
!prev{
  prev_first=$1
  prev_sec=$2
}
prev!=$1 && prev{
  print prev_first,min,prev_sec
  prev_first=$1
  prev_sec=$2
  min=""
}
{
  for(i=3;i<=NF;i++){
      if($i!=-99.99){
          min=min<$i?min?min:$i:$i
      }
  }
  prev=$1
}
END{
  if(min){
      print prev_first,min,prev_sec
  }
}'   Input_file

Output will be as follows.

01001 1.26 012017
01003 1.35 011895

Thanks,
R. Singh

fix

awk '
!/-99.99/       {if (! t[$1]) t[$1] = $3
                 for(i = 3; i <= NF; i++) if (t[$1] > $i) {
                 t[$1] = $i; k[$1] = $2}
                }
END             { for (i in t) print i, t, k
                }' file

--- Post updated at 19:59 ---

Not until the end understood the task :confused:
[/quote]

Thanks nezabudka

Is the field of interest determined by counting from the last field in the file?

 for(i = 3; i <= NF; i++)

So would this be the 3rd field from the end? The output seems correct, but I am putting in variables for each field from the 3rd to the 14th. So in the below snipit im considering 10.64 as the 3rd field and 3.24 as the 14th field.

01001   012017  10.64   3.96    4.45    3.48    9.58    10.76   4.44    5.03    2.78    1.45    1.26    3.24

In other words, what would be the correct syntax for find the min value of the 3rd field?

ncwxpanther ,
in my post #2 , the suggested implementation calculates ALL the min values for ALL the fields starting at 3.

Try:

awk '
  !($1 in M) {
    M[$1]=$3
  }
  {
    for(i=3; i<=NF; i++) 
      if($i>-99.99 && $i<=M[$1]) {
        M[$1]=$i
        V[$1]=$2
      }
  }
  END {
    for(i in M) print i, M, V
  }
' file 
01001 1.26 012017
01003 1.35 011895

I see. To be more specific I need the script to look at each field separately for each unique ID. So the min value for just Field 3 or just Field 4.

Thanks for the help, but this prints out the first line for each unique ID. Not necessarily the min value.

Input

01001   011895  7.03    2.96    8.36    3.53    3.96    5.40    3.92    3.36    0.73    2.03    1.44    3.66
01001   011896  5.86    5.42    5.54    3.98    3.77    6.24    4.38    2.57    0.82    1.66    2.89    1.94
01001   011897  3.27    6.63    10.94   4.35    0.81    1.57    3.96    5.02    0.87    0.75    1.84    4.38
01001   011898  2.33    2.07    2.60    4.56    0.54    3.13    5.80    6.02    1.51    3.21    6.66    3.91
01001   011899  5.80    6.94    3.35    2.22    2.93    2.31    6.80    2.90    0.63    3.02    1.98    5.25
.....
01003   011895  6.46    3.53    6.69    4.63    3.54    11.71   7.64    8.02    2.45    3.21    1.35    4.05
01003   011896  3.80    5.88    5.63    2.45    2.83    6.98    5.24    5.60    2.90    5.38    4.09    2.45
01003   011897  3.95    6.62    8.19    4.43    1.09    2.44    7.06    14.69   2.08    1.29    2.45    3.91
01003   011898  2.82    4.25    2.04    3.64    0.73    5.87    6.72    11.96   9.90    1.46    7.88    5.09
01003   011899  5.64    4.47    3.37    0.97    1.16    5.90    7.81    6.53    0.31    2.23    3.75    4.79
.....

Output

01001 011895 7.03 2.96 8.36 3.53 3.96 5.40 3.92 3.36 0.73 2.03 1.44 3.66
01003 011895 6.46 3.53 6.69 4.63 3.54 11.71 7.64 8.02 2.45 3.21 1.35 4.05

sorry - my bad. try this version:

{
  for(i=3;i<=NF;i++) {
     key[$1]
     if ( $i >0 && (!(($1, i) in f1) || f1[$1,i] > $i)) {
       f1[$1,i] = $i
       if (!($1,i) in f2) f2[$1,i] = $2
     }
  }
  nf=i
}
END {
  for( k in key)
    for(i=3; i<= nf; i++) {
      printf("%s%s%s", (i==3)?k OFS f2[k,i]:"", OFS, (i==nf)?f1[k,i] ORS:f1[k,i])
    }
}

These appear to be the same. Did I miss something?

Adaptation of post #7 for the new requirements:

awk -v c=4 '
  !($1 in M) {
    M[$1]=$c
  }
  {
    if($c>-99.99 && $c<=M[$1]) {
      M[$1]=$c
      V[$1]=$2
    }
  }
  END {
    for(i in M) print i, M, V
  }
' file

c is the column number

1 Like

Thanks Scrutinizer. The output is as desired.

it's not the same:

{
  for(i=3;i<=NF;i++) {
     key[$1]
     if ( $i >0 && (!(($1, i) in f1) || f1[$1,i] > $i)) {
       f1[$1,i] = $i
       if (!($1,i) in f2) f2[$1,i] = $2
     }
  }
  nf=i
}
END {
  for( k in key)
    for(i=3; i<= nf; i++) {
      printf("%s%s%s", (i==3)?k OFS f2[k,i]:"", OFS, (i==nf)?f1[k,i] ORS:f1[k,i])
    }
}

given your sample file:

01001   011895  7.03    2.96    8.36    3.53    3.96    5.40    3.92    3.36    0.73    2.03    1.44    3.66
01001   011896  5.86    5.42    5.54    3.98    3.77    6.24    4.38    2.57    0.82    1.66    2.89    1.94
01001   011897  3.27    6.63    10.94   4.35    0.81    1.57    3.96    5.02    0.87    0.75    1.84    4.38
01001   011898  2.33    2.07    2.60    4.56    0.54    3.13    5.80    6.02    1.51    3.21    6.66    3.91
01001   011899  5.80    6.94    3.35    2.22    2.93    2.31    6.80    2.90    0.63    3.02    1.98    5.25
01003   011895  6.46    3.53    6.69    4.63    3.54    11.71   7.64    8.02    2.45    3.21    1.35    4.05
01003   011896  3.80    5.88    5.63    2.45    2.83    6.98    5.24    5.60    2.90    5.38    4.09    2.45
01003   011897  3.95    6.62    8.19    4.43    1.09    2.44    7.06    14.69   2.08    1.29    2.45    3.91
01003   011898  2.82    4.25    2.04    3.64    0.73    5.87    6.72    11.96   9.90    1.46    7.88    5.09
01003   011899  5.64    4.47    3.37    0.97    1.16    5.90    7.81    6.53    0.31    2.23    3.75    4.79

produces:

01001 011895 2.33 2.07 2.60 2.22 0.54 1.57 3.92 2.57 0.63 0.75 1.44 1.94
01003 011895 2.82 3.53 2.04 0.97 0.73 2.44 5.24 5.60 0.31 1.29 1.35 2.45