I am using the following script to search for and print minimum values for each individual Fields (3-14) for each unique id (Field 1). But when the field contains a "-99.99" ( I am ignoring "-99.99") and when the minimum value is the first line of a new id (Field 1), the output does not print Field 2, it leave a blank space. Any ideas on how to correct this?
The below snipit is looking for the min value in field 12 only.
awk ' /-99.99/ {next} {if (a[$1] == "") a[$1] = $12; if (a[$1] > $12) {a[$1]=$12" " $2} } END { for (i in a) { print i, a} }'
Input file sample
01001 012017 10.64 3.96 4.45 3.48 9.58 10.76 4.44 5.03 2.78 1.45 1.26 3.24
01001 012018 3.94 6.18 4.47 4.79 11.07 3.01 4.27 6.37 6.48 4.52 6.16 9.95
01001 012019 6.77 3.79 3.96 -99.99 -99.99 -99.99 -99.99 -99.99 -99.99 -99.99 -99.99 -99.99
01003 011895 6.46 3.53 6.69 4.63 3.54 11.71 7.64 8.02 2.45 3.21 1.35 4.05
01003 011896 3.80 5.88 5.63 2.45 2.83 6.98 5.24 5.60 2.90 5.38 4.09 2.45
Current Output
01001 1.45
01003 3.21 011895
Expected Output
01001 1.45 012017
01003 3.21 011895
a bit verbose, but..... sometyhing along these lines.
awk -f ncw.awk myFile
where ncw.awk
is:
{
for(i=3;i<=NF;i++) {
key[$1]
if ( $i >0 && (!(($1, i) in f1) || f1[key[$1],i] > $i)) {
f1[$1,i] = $i
if (!($1,i) in f2) f2[$1,i] = $2
}
}
nf=i
}
END {
for( k in key)
for(i=3; i<= nf; i++) {
printf("%s%s%s", (i==3)?k OFS f2[k,i]:"", OFS, (i==nf)?f1[k,i] ORS:f1[k,i])
}
}
results in (given your sample input file):
01001 012017 10.64 3.96 4.45 3.48 9.58 10.76 4.44 5.03 2.78 1.45 1.26 3.24
01003 011895 6.46 3.53 6.69 4.63 3.54 11.71 7.64 8.02 2.45 3.21 1.35 4.05
2 Likes
I get other data
awk '
!($1 in t) {t[$1] = $3}
!/-99.99/ {for(i = 3; i <= NF; i++) if (t[$1] > $i) {
t[$1] = $i; k[$1] = $2}
}
END { for (i in t) print i, t, k
}' file
--- Post updated at 19:46 ---
01001 1.26 012017
01003 1.35 011895
--- Post updated at 19:52 ---
fix
awk '
!/-99.99/ {if (! t[$1]) t[$1] = $3
for(i = 3; i <= NF; i++) if (t[$1] > $i) {
t[$1] = $i; k[$1] = $2}
}
END { for (i in t) print i, t, k
}' file
--- Post updated at 19:59 ---
Not until the end understood the task
1 Like
Hello ncwxpanther,
Could you please try following too, it considers that your Input_file is sorted by 1st field(as per shown samples),if not then we could add like sort -k1 Input_file | awk ....
too before my following code.
awk '
!prev{
prev_first=$1
prev_sec=$2
}
prev!=$1 && prev{
print prev_first,min,prev_sec
prev_first=$1
prev_sec=$2
min=""
}
{
for(i=3;i<=NF;i++){
if($i!=-99.99){
min=min<$i?min?min:$i:$i
}
}
prev=$1
}
END{
if(min){
print prev_first,min,prev_sec
}
}' Input_file
Output will be as follows.
01001 1.26 012017
01003 1.35 011895
Thanks,
R. Singh
fix
awk '
!/-99.99/ {if (! t[$1]) t[$1] = $3
for(i = 3; i <= NF; i++) if (t[$1] > $i) {
t[$1] = $i; k[$1] = $2}
}
END { for (i in t) print i, t, k
}' file
--- Post updated at 19:59 ---
Not until the end understood the task
[/quote]
Thanks nezabudka
Is the field of interest determined by counting from the last field in the file?
for(i = 3; i <= NF; i++)
So would this be the 3rd field from the end? The output seems correct, but I am putting in variables for each field from the 3rd to the 14th. So in the below snipit im considering 10.64 as the 3rd field and 3.24 as the 14th field.
01001 012017 10.64 3.96 4.45 3.48 9.58 10.76 4.44 5.03 2.78 1.45 1.26 3.24
In other words, what would be the correct syntax for find the min value of the 3rd field?
ncwxpanther ,
in my post #2
, the suggested implementation calculates ALL the min values for ALL the fields starting at 3.
Try:
awk '
!($1 in M) {
M[$1]=$3
}
{
for(i=3; i<=NF; i++)
if($i>-99.99 && $i<=M[$1]) {
M[$1]=$i
V[$1]=$2
}
}
END {
for(i in M) print i, M, V
}
' file
01001 1.26 012017
01003 1.35 011895
I see. To be more specific I need the script to look at each field separately for each unique ID. So the min value for just Field 3 or just Field 4.
vgersh99:
a bit verbose, but..... sometyhing along these lines.
awk -f ncw.awk myFile
where ncw.awk
is:
{
for(i=3;i<=NF;i++) {
key[$1]
if ( $i >0 && (!(($1, i) in f1) || f1[key[$1],i] > $i)) {
f1[$1,i] = $i
if (!($1,i) in f2) f2[$1,i] = $2
}
}
nf=i
}
END {
for( k in key)
for(i=3; i<= nf; i++) {
printf("%s%s%s", (i==3)?k OFS f2[k,i]:"", OFS, (i==nf)?f1[k,i] ORS:f1[k,i])
}
}
results in (given your sample input file):
01001 012017 10.64 3.96 4.45 3.48 9.58 10.76 4.44 5.03 2.78 1.45 1.26 3.24
01003 011895 6.46 3.53 6.69 4.63 3.54 11.71 7.64 8.02 2.45 3.21 1.35 4.05
Thanks for the help, but this prints out the first line for each unique ID. Not necessarily the min value.
Input
01001 011895 7.03 2.96 8.36 3.53 3.96 5.40 3.92 3.36 0.73 2.03 1.44 3.66
01001 011896 5.86 5.42 5.54 3.98 3.77 6.24 4.38 2.57 0.82 1.66 2.89 1.94
01001 011897 3.27 6.63 10.94 4.35 0.81 1.57 3.96 5.02 0.87 0.75 1.84 4.38
01001 011898 2.33 2.07 2.60 4.56 0.54 3.13 5.80 6.02 1.51 3.21 6.66 3.91
01001 011899 5.80 6.94 3.35 2.22 2.93 2.31 6.80 2.90 0.63 3.02 1.98 5.25
.....
01003 011895 6.46 3.53 6.69 4.63 3.54 11.71 7.64 8.02 2.45 3.21 1.35 4.05
01003 011896 3.80 5.88 5.63 2.45 2.83 6.98 5.24 5.60 2.90 5.38 4.09 2.45
01003 011897 3.95 6.62 8.19 4.43 1.09 2.44 7.06 14.69 2.08 1.29 2.45 3.91
01003 011898 2.82 4.25 2.04 3.64 0.73 5.87 6.72 11.96 9.90 1.46 7.88 5.09
01003 011899 5.64 4.47 3.37 0.97 1.16 5.90 7.81 6.53 0.31 2.23 3.75 4.79
.....
Output
01001 011895 7.03 2.96 8.36 3.53 3.96 5.40 3.92 3.36 0.73 2.03 1.44 3.66
01003 011895 6.46 3.53 6.69 4.63 3.54 11.71 7.64 8.02 2.45 3.21 1.35 4.05
sorry - my bad. try this version:
{
for(i=3;i<=NF;i++) {
key[$1]
if ( $i >0 && (!(($1, i) in f1) || f1[$1,i] > $i)) {
f1[$1,i] = $i
if (!($1,i) in f2) f2[$1,i] = $2
}
}
nf=i
}
END {
for( k in key)
for(i=3; i<= nf; i++) {
printf("%s%s%s", (i==3)?k OFS f2[k,i]:"", OFS, (i==nf)?f1[k,i] ORS:f1[k,i])
}
}
These appear to be the same. Did I miss something?
Adaptation of post #7 for the new requirements:
awk -v c=4 '
!($1 in M) {
M[$1]=$c
}
{
if($c>-99.99 && $c<=M[$1]) {
M[$1]=$c
V[$1]=$2
}
}
END {
for(i in M) print i, M, V
}
' file
c is the column number
1 Like
Thanks Scrutinizer. The output is as desired.
it's not the same:
{
for(i=3;i<=NF;i++) {
key[$1]
if ( $i >0 && (!(($1, i) in f1) || f1[$1,i] > $i)) {
f1[$1,i] = $i
if (!($1,i) in f2) f2[$1,i] = $2
}
}
nf=i
}
END {
for( k in key)
for(i=3; i<= nf; i++) {
printf("%s%s%s", (i==3)?k OFS f2[k,i]:"", OFS, (i==nf)?f1[k,i] ORS:f1[k,i])
}
}
given your sample file:
01001 011895 7.03 2.96 8.36 3.53 3.96 5.40 3.92 3.36 0.73 2.03 1.44 3.66
01001 011896 5.86 5.42 5.54 3.98 3.77 6.24 4.38 2.57 0.82 1.66 2.89 1.94
01001 011897 3.27 6.63 10.94 4.35 0.81 1.57 3.96 5.02 0.87 0.75 1.84 4.38
01001 011898 2.33 2.07 2.60 4.56 0.54 3.13 5.80 6.02 1.51 3.21 6.66 3.91
01001 011899 5.80 6.94 3.35 2.22 2.93 2.31 6.80 2.90 0.63 3.02 1.98 5.25
01003 011895 6.46 3.53 6.69 4.63 3.54 11.71 7.64 8.02 2.45 3.21 1.35 4.05
01003 011896 3.80 5.88 5.63 2.45 2.83 6.98 5.24 5.60 2.90 5.38 4.09 2.45
01003 011897 3.95 6.62 8.19 4.43 1.09 2.44 7.06 14.69 2.08 1.29 2.45 3.91
01003 011898 2.82 4.25 2.04 3.64 0.73 5.87 6.72 11.96 9.90 1.46 7.88 5.09
01003 011899 5.64 4.47 3.37 0.97 1.16 5.90 7.81 6.53 0.31 2.23 3.75 4.79
produces:
01001 011895 2.33 2.07 2.60 2.22 0.54 1.57 3.92 2.57 0.63 0.75 1.44 1.94
01003 011895 2.82 3.53 2.04 0.97 0.73 2.44 5.24 5.60 0.31 1.29 1.35 2.45