jiam912
February 13, 2020, 3:13pm
1
I would like to create bins to get histogram with totals and percentage, e.g. starting from 0.
If possible to set the minimum and maximum value in the bins ( in my case value min=0 and max=20 )
Input file
8 5
10 1
11 4
12 4
12 4
13 5
16 7
18 9
16 9
17 7
18 5
19 5
20 1
21 7
output desired
0 0 0.0%
0 - 2 0 0.0%
2 - 4 0 0.0%
4 - 6 0 0.0%
6 - 8 0 0.0%
8 - 10 5 6.8%
10 - 12 5 6.8%
12 - 14 13 17.8%
14 - 16 0 0.0%
16 - 18 23 31.5%
18 - 20 19 26.0%
> 20 8 11.0%
---------------------
Total: 73
I use this code, it works perfectly but the percentage is missed.
awk 'BEGIN { delta = (delta == "" ? 2 : delta) }
{
bucketNr = int(($0+delta) / delta)
cnt[bucketNr]++
numBuckets = (numBuckets > bucketNr ? numBuckets : bucketNr)
}
END {
for (bucketNr=1; bucketNr<=numBuckets; bucketNr++) {
end = beg + delta
printf "%0.1f %0.1f %d\n", beg, end, cnt[bucketNr]
beg = end
}
}' file
Thanks in advance
how's the percentage supposed to be calculated?
percentage of the bin out of the total sum?
then 8-10 5 8.2
is supposed to be 8-10 5 3.5
for your example...
Please elaborate.
RudiC
February 13, 2020, 4:14pm
3
How far would this small adaption to your code get you:
awk '
BEGIN {delta = (delta=="")?2:delta
}
{bucketNr = int(($1+delta) / delta)
cnt[bucketNr] += $2
TOT += $2
numBuckets = (numBuckets > bucketNr ? numBuckets : bucketNr)
}
END {for (bucketNr=1; bucketNr<=numBuckets; bucketNr++) {end = beg + delta
printf "%2.0f - %2.0f\t%d\t%4.1f%%\n", beg, end, cnt[bucketNr], cnt[bucketNr] / TOT *100
beg = end
}
}
' file
0 - 2 0 0.0%
2 - 4 0 0.0%
4 - 6 0 0.0%
6 - 8 0 0.0%
8 - 10 5 6.8%
10 - 12 5 6.8%
12 - 14 13 17.8%
14 - 16 0 0.0%
16 - 18 23 31.5%
18 - 20 19 26.0%
20 - 22 8 11.0%
1 Like
jiam912
February 13, 2020, 4:16pm
4
Dear vgersh99 , the percentage is calculated based in the total sum .
RudiC
February 13, 2020, 4:27pm
5
For your "max bin" question, try
awk -v"MAX=20" '
BEGIN {delta = (delta=="")?2:delta
MXBCK = (MAX / delta) + 1
}
{bucketNr = int(($1+delta) / delta)
if (bucketNr > MXBCK) bucketNr = MXBCK
cnt[bucketNr] += $2
TOT += $2
numBuckets = (numBuckets > bucketNr ? numBuckets : bucketNr)
}
END {for (bucketNr=1; bucketNr<numBuckets; bucketNr++) {end = beg + delta
printf "%2.0f - %2.0f\t%d\t%4.1f%%\n", beg, end, cnt[bucketNr], cnt[bucketNr] / TOT *100
beg = end
}
printf " > %2.0f\t%d\t%4.1f%%\n", MAX, cnt[MXBCK], cnt[MXBCK] / TOT *100
print "---------------------"
print "Total:", TOT
}
' file
0 - 2 0 0.0%
2 - 4 0 0.0%
4 - 6 0 0.0%
6 - 8 0 0.0%
8 - 10 5 6.8%
10 - 12 5 6.8%
12 - 14 13 17.8%
14 - 16 0 0.0%
16 - 18 23 31.5%
18 - 20 19 26.0%
> 20 8 11.0%
---------------------
Total: 73
1 Like
rdrtx1
February 13, 2020, 4:32pm
6
same as above but total included:
awk 'BEGIN { delta = (delta == "" ? 2 : delta) ; max=20 }
{
if ($1 > max) {
maxf++;
maxc+=$2;
} else {
bucketNr = int(($1+delta) / delta)
cnt[bucketNr]++
cntc[bucketNr]+=$2;
numBuckets = (numBuckets > bucketNr ? numBuckets : bucketNr)
}
total+=$2;
}
END {
for (bucketNr=1; bucketNr<=numBuckets; bucketNr++) {
end = beg + delta
printf "%2d-%2d %3d %.2f%\n", beg, end, cntc[bucketNr], (cntc[bucketNr]*100.0)/total;
beg = end
}
if (maxf) printf ">%2d %2d %.2f%\n", max, maxc, (maxc*100.0)/total;
print "-------------";
print "Total: " total;
}' file
The code posted above needs some work. Lucky that $0+delta
works somehow. The code posted does not output the columns desired shown, etc.
1 Like
RudiC/rdrtx1,
the binning, the percentage and the totals are somewhat different from the OP's desired output in post #1 .
Hence post #2 with the ask to elaborate.
2 Likes
jiam912
February 13, 2020, 8:35pm
8
Dear RudiC and rdrtx1
Codes are amazing, thanks a lot
Please a small request more.
It is possible to set the minimum too. As example minimum = 0.
The the results will show something like this.
0 0 0.0%
0 - 2 0 0.0%
2 - 4 0 0.0%
4 - 6 0 0.0%
6 - 8 0 0.0%
8 - 10 5 6.8%
10 - 12 5 6.8%
12 - 14 13 17.8%
14 - 16 0 0.0%
16 - 18 23 31.5%
18 - 20 19 26.0%
> 20 8 11.0%
--------------------
Total: 73
jiam912
February 13, 2020, 11:51pm
9
Dear rdrtx1 , I have updated my post with correct values and requirements.. Thanks a lot for the code.
--- Post updated at 12:51 AM ---
Dear vgersh99, I have updated my output file as it was wrong values. sorry for the confusion..
RudiC
February 14, 2020, 3:49am
10
I think your small request more can easily be satisfied along the lines shown by rdrtx1 and me...
1 Like
jiam912
February 14, 2020, 11:35am
11
Dear RudiC.. Thanks a lot code is great.. I will post a new one similar