Generate points for cdf

Hi,

I am trying to make a cdf curve from the following data.

34.3
19.2
20.7
28.3
32.5
21
37.6

The output should be something like this:

0         0
19.2    0.14
20.7    0.28
21       0.42
28.3    0.56
32.5    0.70
34.3    0.84
37.6    0.98

I plan to feed these points to gnuplot to plot them as a graph. I tried plotting the same data with gnuplot's smooth cumulative option, but it does not produce the exact graph that I want.

Your help is much appreciated.

Thanks.

 $ sort -n file | awk 'BEGIN{print 0"\t"0;a=0.14}{ printf "%s\t%s\n", $1,a;a+=0.14}'

0       0
19.2    0.14
20.7    0.28
21      0.42
28.3    0.56
32.5    0.7
34.3    0.84
37.6    0.98

Hi,

Thanks for the response. I ended up writing an entire script from scratch to do this dynamically. I think that the code that you have posted needs input for the 1/number_of_entries value each time.

Here is my solution:

#!/bin/bash

# Number of data points (one per line in the sorted input).
count=$(wc -l < sorted1.txt)

# Step between consecutive CDF values, 1/count.
holding=$(echo "1/$count" | bc -l)
var=$holding

echo 0  > cdf1.txt
echo 0  >> cdf1.txt

# Append the running total once per data point.
for ((i = 1; i <= count; i++))
do
	echo "$var" >> cdf1.txt
	var=$(echo "$var+$holding" | bc -l)
done

You might want to consider this alternative:

sort -n file | awk '
{	d[++c] = $0
}
END {	inc = 1 / c
	for(i = 0; i <= c; i++)
		printf("%.1f\t%.2f\n", d[i], i * inc)
}'

which produces the output:

0.0	0.00
19.2	0.14
20.7	0.29
21.0	0.43
28.3	0.57
32.5	0.71
34.3	0.86
37.6	1.00

from your sample input.

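I know this isn't the output you said you wanted, but it seems to be better suited to what you're trying to do: the step is computed as exactly 1/7 instead of a rounded 0.14, so the last point reaches 1.00 as a CDF should.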
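If you want something reusable, here is a rough sketch of the same idea wrapped in a shell function that reads from stdin. The function name cdf_points is just something I made up for illustration. It assumes one number per line, skips blank lines, and keeps the values exactly as they appear in the input instead of forcing one decimal place.

```shell
#!/bin/sh

# cdf_points: hypothetical helper name, not from the posts above.
# Reads one number per line on stdin and prints "value<TAB>fraction",
# preceded by a "0<TAB>0" origin row.
cdf_points() {
    sort -n | awk '
    NF { d[++c] = $1 }          # keep each non-blank value, already sorted
    END {
        if (c == 0) exit        # no data: print nothing
        print "0\t0"
        for (i = 1; i <= c; i++)
            printf "%s\t%.2f\n", d[i], i / c   # step is exactly 1/c
    }'
}

# Sample data from the original question.
printf '%s\n' 34.3 19.2 20.7 28.3 32.5 21 37.6 | cdf_points
```

Redirect the output to a file (say cdf.dat) and something like plot 'cdf.dat' using 1:2 with steps in gnuplot should draw it as a staircase, though I have not checked that against the graph you are after.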
