Issue using awk to print min values for unique ids

ncwxpanther · April 23, 2019, 11:33am

I am using the following script to search for and print minimum values for each individual Fields (3-14) for each unique id (Field 1). But when the field contains a "-99.99" ( I am ignoring "-99.99") and when the minimum value is the first line of a new id (Field 1), the output does not print Field 2, it leave a blank space. Any ideas on how to correct this?

The below snipit is looking for the min value in field 12 only.

awk ' /-99.99/ {next} {if (a[$1] == "") a[$1] = $12; if (a[$1] > $12) {a[$1]=$12" " $2} } END { for (i in a) { print i, a} }'

Input file sample

01001   012017  10.64   3.96    4.45    3.48    9.58    10.76   4.44    5.03    2.78    1.45    1.26    3.24
01001   012018  3.94    6.18    4.47    4.79    11.07   3.01    4.27    6.37    6.48    4.52    6.16    9.95
01001   012019  6.77    3.79    3.96    -99.99  -99.99  -99.99  -99.99  -99.99  -99.99  -99.99  -99.99  -99.99
01003   011895  6.46    3.53    6.69    4.63    3.54    11.71   7.64    8.02    2.45    3.21    1.35    4.05
01003   011896  3.80    5.88    5.63    2.45    2.83    6.98    5.24    5.60    2.90    5.38    4.09    2.45

Current Output

01001   1.45 
01003   3.21  011895

Expected Output

01001   1.45  012017 
01003   3.21  011895

vgersh99 · April 23, 2019, 12:36pm

a bit verbose, but..... sometyhing along these lines.
awk -f ncw.awk myFile where ncw.awk is:

{
  for(i=3;i<=NF;i++) {
     key[$1]
     if ( $i >0 && (!(($1, i) in f1) || f1[key[$1],i] > $i)) {
       f1[$1,i] = $i
       if (!($1,i) in f2) f2[$1,i] = $2
     }
  }
  nf=i
}
END {
  for( k in key)
    for(i=3; i<= nf; i++) {
      printf("%s%s%s", (i==3)?k OFS f2[k,i]:"", OFS, (i==nf)?f1[k,i] ORS:f1[k,i])
    }
}

results in (given your sample input file):

01001 012017 10.64 3.96 4.45 3.48 9.58 10.76 4.44 5.03 2.78 1.45 1.26 3.24
01003 011895 6.46 3.53 6.69 4.63 3.54 11.71 7.64 8.02 2.45 3.21 1.35 4.05

nezabudka · April 23, 2019, 12:59pm

I get other data

awk '
!($1 in t)      {t[$1] = $3}
!/-99.99/       {for(i = 3; i <= NF; i++) if (t[$1] > $i) {
                 t[$1] = $i; k[$1] = $2}
                 }
END             { for (i in t) print i, t, k
                 }' file

--- Post updated at 19:46 ---

01001 1.26 012017
01003 1.35 011895

--- Post updated at 19:52 ---

fix

awk '
!/-99.99/       {if (! t[$1]) t[$1] = $3
                 for(i = 3; i <= NF; i++) if (t[$1] > $i) {
                 t[$1] = $i; k[$1] = $2}
                }
END             { for (i in t) print i, t, k
                }' file

--- Post updated at 19:59 ---

Not until the end understood the task

RavinderSingh13 · April 23, 2019, 2:14pm

Hello ncwxpanther,

Could you please try following too, it considers that your Input_file is sorted by 1st field(as per shown samples),if not then we could add like sort -k1 Input_file | awk .... too before my following code.

awk '
!prev{
  prev_first=$1
  prev_sec=$2
}
prev!=$1 && prev{
  print prev_first,min,prev_sec
  prev_first=$1
  prev_sec=$2
  min=""
}
{
  for(i=3;i<=NF;i++){
      if($i!=-99.99){
          min=min<$i?min?min:$i:$i
      }
  }
  prev=$1
}
END{
  if(min){
      print prev_first,min,prev_sec
  }
}'   Input_file

Output will be as follows.

01001 1.26 012017
01003 1.35 011895

Thanks,
R. Singh

ncwxpanther · April 23, 2019, 2:56pm

fix

awk '
!/-99.99/       {if (! t[$1]) t[$1] = $3
                 for(i = 3; i <= NF; i++) if (t[$1] > $i) {
                 t[$1] = $i; k[$1] = $2}
                }
END             { for (i in t) print i, t, k
                }' file

--- Post updated at 19:59 ---

Not until the end understood the task
[/quote]

Thanks nezabudka

Is the field of interest determined by counting from the last field in the file?

 for(i = 3; i <= NF; i++)

So would this be the 3rd field from the end? The output seems correct, but I am putting in variables for each field from the 3rd to the 14th. So in the below snipit im considering 10.64 as the 3rd field and 3.24 as the 14th field.

01001   012017  10.64   3.96    4.45    3.48    9.58    10.76   4.44    5.03    2.78    1.45    1.26    3.24

In other words, what would be the correct syntax for find the min value of the 3rd field?

vgersh99 · April 23, 2019, 3:00pm

ncwxpanther ,
in my post #2 , the suggested implementation calculates ALL the min values for ALL the fields starting at 3.

Scrutinizer · April 23, 2019, 3:01pm

Try:

awk '
  !($1 in M) {
    M[$1]=$3
  }
  {
    for(i=3; i<=NF; i++) 
      if($i>-99.99 && $i<=M[$1]) {
        M[$1]=$i
        V[$1]=$2
      }
  }
  END {
    for(i in M) print i, M, V
  }
' file

01001 1.26 012017
01003 1.35 011895

ncwxpanther · April 23, 2019, 4:25pm

I see. To be more specific I need the script to look at each field separately for each unique ID. So the min value for just Field 3 or just Field 4.

ncwxpanther · April 24, 2019, 10:31am

vgersh99:

a bit verbose, but..... sometyhing along these lines.
awk -f ncw.awk myFile where ncw.awk is:

{
  for(i=3;i<=NF;i++) {
   key[$1]
   if ( $i >0 && (!(($1, i) in f1) || f1[key[$1],i] > $i)) {
   f1[$1,i] = $i
   if (!($1,i) in f2) f2[$1,i] = $2
   }
  }
  nf=i
}
END {
  for( k in key)
   for(i=3; i<= nf; i++) {
   printf("%s%s%s", (i==3)?k OFS f2[k,i]:"", OFS, (i==nf)?f1[k,i] ORS:f1[k,i])
   }
}

results in (given your sample input file):

01001 012017 10.64 3.96 4.45 3.48 9.58 10.76 4.44 5.03 2.78 1.45 1.26 3.24
01003 011895 6.46 3.53 6.69 4.63 3.54 11.71 7.64 8.02 2.45 3.21 1.35 4.05

Thanks for the help, but this prints out the first line for each unique ID. Not necessarily the min value.

Input

01001   011895  7.03    2.96    8.36    3.53    3.96    5.40    3.92    3.36    0.73    2.03    1.44    3.66
01001   011896  5.86    5.42    5.54    3.98    3.77    6.24    4.38    2.57    0.82    1.66    2.89    1.94
01001   011897  3.27    6.63    10.94   4.35    0.81    1.57    3.96    5.02    0.87    0.75    1.84    4.38
01001   011898  2.33    2.07    2.60    4.56    0.54    3.13    5.80    6.02    1.51    3.21    6.66    3.91
01001   011899  5.80    6.94    3.35    2.22    2.93    2.31    6.80    2.90    0.63    3.02    1.98    5.25
.....
01003   011895  6.46    3.53    6.69    4.63    3.54    11.71   7.64    8.02    2.45    3.21    1.35    4.05
01003   011896  3.80    5.88    5.63    2.45    2.83    6.98    5.24    5.60    2.90    5.38    4.09    2.45
01003   011897  3.95    6.62    8.19    4.43    1.09    2.44    7.06    14.69   2.08    1.29    2.45    3.91
01003   011898  2.82    4.25    2.04    3.64    0.73    5.87    6.72    11.96   9.90    1.46    7.88    5.09
01003   011899  5.64    4.47    3.37    0.97    1.16    5.90    7.81    6.53    0.31    2.23    3.75    4.79
.....

Output

01001 011895 7.03 2.96 8.36 3.53 3.96 5.40 3.92 3.36 0.73 2.03 1.44 3.66
01003 011895 6.46 3.53 6.69 4.63 3.54 11.71 7.64 8.02 2.45 3.21 1.35 4.05

vgersh99 · April 24, 2019, 11:06am

sorry - my bad. try this version:

{
  for(i=3;i<=NF;i++) {
     key[$1]
     if ( $i >0 && (!(($1, i) in f1) || f1[$1,i] > $i)) {
       f1[$1,i] = $i
       if (!($1,i) in f2) f2[$1,i] = $2
     }
  }
  nf=i
}
END {
  for( k in key)
    for(i=3; i<= nf; i++) {
      printf("%s%s%s", (i==3)?k OFS f2[k,i]:"", OFS, (i==nf)?f1[k,i] ORS:f1[k,i])
    }
}

ncwxpanther · April 24, 2019, 11:15am

These appear to be the same. Did I miss something?

Scrutinizer · April 24, 2019, 11:24am

Adaptation of post #7 for the new requirements:

awk -v c=4 '
  !($1 in M) {
    M[$1]=$c
  }
  {
    if($c>-99.99 && $c<=M[$1]) {
      M[$1]=$c
      V[$1]=$2
    }
  }
  END {
    for(i in M) print i, M, V
  }
' file

c is the column number

ncwxpanther · April 24, 2019, 11:31am

Thanks Scrutinizer. The output is as desired.

vgersh99 · April 24, 2019, 11:31am

it's not the same:

{
  for(i=3;i<=NF;i++) {
     key[$1]
     if ( $i >0 && (!(($1, i) in f1) || f1[$1,i] > $i)) {
       f1[$1,i] = $i
       if (!($1,i) in f2) f2[$1,i] = $2
     }
  }
  nf=i
}
END {
  for( k in key)
    for(i=3; i<= nf; i++) {
      printf("%s%s%s", (i==3)?k OFS f2[k,i]:"", OFS, (i==nf)?f1[k,i] ORS:f1[k,i])
    }
}

given your sample file:

01001   011895  7.03    2.96    8.36    3.53    3.96    5.40    3.92    3.36    0.73    2.03    1.44    3.66
01001   011896  5.86    5.42    5.54    3.98    3.77    6.24    4.38    2.57    0.82    1.66    2.89    1.94
01001   011897  3.27    6.63    10.94   4.35    0.81    1.57    3.96    5.02    0.87    0.75    1.84    4.38
01001   011898  2.33    2.07    2.60    4.56    0.54    3.13    5.80    6.02    1.51    3.21    6.66    3.91
01001   011899  5.80    6.94    3.35    2.22    2.93    2.31    6.80    2.90    0.63    3.02    1.98    5.25
01003   011895  6.46    3.53    6.69    4.63    3.54    11.71   7.64    8.02    2.45    3.21    1.35    4.05
01003   011896  3.80    5.88    5.63    2.45    2.83    6.98    5.24    5.60    2.90    5.38    4.09    2.45
01003   011897  3.95    6.62    8.19    4.43    1.09    2.44    7.06    14.69   2.08    1.29    2.45    3.91
01003   011898  2.82    4.25    2.04    3.64    0.73    5.87    6.72    11.96   9.90    1.46    7.88    5.09
01003   011899  5.64    4.47    3.37    0.97    1.16    5.90    7.81    6.53    0.31    2.23    3.75    4.79

produces:

01001 011895 2.33 2.07 2.60 2.22 0.54 1.57 3.92 2.57 0.63 0.75 1.44 1.94
01003 011895 2.82 3.53 2.04 0.97 0.73 2.44 5.24 5.60 0.31 1.29 1.35 2.45