jiam912
February 21, 2020, 2:15pm
1
Objective, create a table of range of bins to count the values for each range in column 2.
The purpose is to count at beginning all values in column 2 which contends value =0 or < 0.01
Then create the range of binnes counting values in column 2 > 0.01
The range of bins need to be extended to the MAX value in my case 20, even if these values are not found in the column.
Input file
121 0.98
121 0.98
121 1.47
121 1.47
121 0.00
121 0.00
121 0.00
121 0.10
121 2
121 3
121 7
121 0
121 0
121 10
121 12
121 15
Desired output file
0 5 31.3%
0.01 - 2.00 6 37.5%
2.01 - 4.00 1 6.3%
4.01 - 6.00 0 0.0%
6.01 - 8.00 1 6.3%
8.01 - 10.00 0 0.0%
10.01 - 12.00 2 12.5%
12.01 - 14.00 0 0.0%
14.01 - 16.00 1 6.3%
16.01 - 18.00 0 0.0%
18.01 - 20.00 0 0.0%
> 20 0 0.0%
----------------------------
Total: 16
Code Use
This code works well, but dont count values with exactly 0, and dont force the range of bins to the maximun value.
awk -v"MIN=0" -v"MAX=20" '
BEGIN {delta = (delta=="")?2:delta
MXBCK = (MAX / delta) + 1
}
{bucketNr = int(($2+delta) / delta)
if (bucketNr > MXBCK) bucketNr = MXBCK
cnt[bucketNr] ++
TOT ++ $2
numBuckets = (numBuckets > bucketNr ? numBuckets : bucketNr)
}
END {
for (bucketNr=1; bucketNr<numBuckets; bucketNr++) {end = beg + delta
printf " %05.2f - %05.2f %6d %6.2f%%\n", beg, end, cnt[bucketNr], cnt[bucketNr] / TOT *100
beg = end
}
printf " > %5.2f %6d %6.2f%%\n", MAX, cnt[MXBCK], cnt[MXBCK] / TOT *100
print "----------------------------"
print " Total:", TOT
}
' file
Thanks in advance
Neo
February 22, 2020, 4:15am
3
Hi, What does this word mean (you used it many times):
What do you mean by this word, exactly?
Do you mean "columns" (in an ASCII table)?
Thanks.
1 Like
RudiC
February 22, 2020, 5:57am
4
Just run the for loop up to MXBCK
in lieu of numBuckets
(whose final value you ignore, BTW, running the loop with <
instead of <=
)
Where did your request for a MIN
value disappear that you posted in your recent thread?
1 Like
jiam912
February 22, 2020, 1:13pm
5
I get the maximun like this
NBIR=20 #NUMBER OF RANGES ON BINS
awk -v"MIN=0" -v"MAX=20" -v"NRB=$NBIR" '
BEGIN {delta = (delta=="")?1:delta
MXBCK = (MAX / delta) + 1
}
{bucketNr = int(($8+delta) / delta)
if (bucketNr > MXBCK) bucketNr = MXBCK
cnt[bucketNr] ++
TOT ++ $2
numBuckets = NRB +1
Thanks a lot
jiam912
February 25, 2020, 12:50am
6
Hi RudiC,
Kindly can you help with the desired output. still I am trying to set the min at beginning.
Please check how i modify the code to get the desired output , but still i got troubles here :
00.01 - 02.00 5 31.25%
I want to start the range of bins on 0.01 not exactly in 0.
the code I modify.
NBIR=10 #NUMBER OF RANGES ON BINS
awk -v"MIN=0" -v"MAX=20" -v"NRB=$NBIR" '
BEGIN {delta = (delta=="")?2:delta
MXBCK = (MAX / delta) + 1
MIBCK = (MIN / delta) + 1
}
{bucketNr = (($2+delta)/delta)
if (bucketNr > MXBCK) bucketNr = MXBCK
cnt[bucketNr] ++
TOT ++ $2
numBuckets = NRB +1
}
END {
printf " < %5.2f %6d %6.2f%%\n", MIN, cnt[MIBCK], cnt[MIBCK] / TOT *100
for (bucketNr=1; bucketNr<=numBuckets; bucketNr++) {end = beg + delta
printf " %05.2f - %05.2f %6d %6.2f%%\n", beg+0.01, end, cnt[bucketNr], cnt[bucketNr] / TOT *100
beg = end
}
printf " > %5.2f %6d %6.2f%%\n", MAX, cnt[MXBCK], cnt[MXBCK] / TOT *100
print "-----------------------------"
print " Total:", TOT
print dashes
}
' file
output needed
< 0 5 31.3%
0.01 - 2.00 6 37.5%
2.01 - 4.00 1 6.3%
4.01 - 6.00 0 0.0%
6.01 - 8.00 1 6.3%
8.01 - 10.00 0 0.0%
10.01 - 12.00 2 12.5%
12.01 - 14.00 0 0.0%
14.01 - 16.00 1 6.3%
16.01 - 18.00 0 0.0%
18.01 - 20.00 0 0.0%
> 20 0 0.0%
----------------------------
Total: 16
RudiC
February 25, 2020, 10:24am
7
What if a value falls into the gap between e.g. 0 and 0.01? This is not covered here. Make sure none of your values fall into those gaps. Try this
$ awk -v"MIN=0" -v"MAX=20" '
BEGIN {delta = (delta=="")?2:delta
MNBCK = (MIN / delta)
MXBCK = (MAX / delta) + 1
}
{if ($2 <= MIN) bucketNr = MNBCK
else if ($2 > MAX) bucketNr = MXBCK
else bucketNr = int(($2+delta) / delta)
cnt[bucketNr]++
TOT++
}
END {printf " <= %5.2f %6d %6.2f%%\n", MIN, cnt[MNBCK], cnt[MNBCK] / TOT *100
beg = MIN
for (bucketNr=MNBCK+1; bucketNr<MXBCK; bucketNr++) {end = beg + delta
printf " %5.2f - %5.2f %6d %6.2f%%\n", beg+.01, end, cnt[bucketNr], cnt[bucketNr] / TOT *100
beg = end
}
printf " > %5.2f %6d %6.2f%%\n", MAX, cnt[MXBCK], cnt[MXBCK] / TOT *100
print "----------------------------"
print " Total:", TOT
}
' file
It may need some polishing, and it may not be the optimal approach. YMMV.
1 Like
jiam912
February 25, 2020, 11:26pm
8
Hi RudiC,,,
Tks a lot for the update it works fine...