Input file:
21.08
21.06
20.98
20.65
18.52
16.34
13.58
12.2
10.66
10.22
9.8
8.6
7.4
3.9
3.5
Desired output file:
8.6
12.2
20.65
I want to calculate the 25th percentile/first quartile (Q1), the 50th percentile/second quartile (Q2)/median, and the 75th percentile/third quartile (Q3) from the input file.
It is also acceptable to calculate Q1, Q2 and Q3 separately.
Based on my understanding of quartiles, 8.6 is Q1, 12.2 is Q2 and 20.65 is Q3.
The way I calculate Q1, Q2 and Q3 is as follows:
Sort the figures from smallest to largest.
The median is the (15 + 1) ÷ 2 = 8th value = 12.2.
I take the 8th value as the center and split the data into a lower and an upper part.
In the lower part, the lower quartile (Q1) is the (7 + 1) ÷ 2 = 4th value = 8.6.
In the upper part, the upper quartile (Q3) is the 3 × (7 + 1) ÷ 2 = 12th value = 20.65.
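As a rough sketch of the steps above, the three row positions can be worked out with shell arithmetic. This assumes an odd number of values whose lower and upper parts are also odd in size (as with these 15 rows); the function name quartile_positions is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the 1-based positions of Q1, Q2 and Q3 in a
# sorted list of n values, following the manual method described above.
# Assumes n is odd and (n - 1) / 2 is odd as well (for example n = 15).
quartile_positions() {
    local n=$1
    local q2=$(( (n + 1) / 2 ))       # median position
    local half=$(( q2 - 1 ))          # size of the lower (and upper) part
    local q1=$(( (half + 1) / 2 ))    # middle of the lower part
    local q3=$(( q2 + q1 ))           # middle of the upper part
    echo "$q1 $q2 $q3"
}

quartile_positions 15    # prints: 4 8 12
```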
Thanks in advance.
You need to provide the formula. How do you get 8.6?
Hi rdcwayx,
I just edited my question to include how I calculate the median and the first and third quartiles manually.
Thanks for your advice.
In upper part, the upper quartile is the 3 × (7 + 1) ÷ 2 = 4th value = 20.65.
should be
In upper part, the upper quartile is the 3 × (7 + 1) ÷ 2 = 12th value = 20.65.
Right?
Hi itkamaraj,
Thanks for the reminder.
You're right.
Do you have any idea how to solve this problem with a command?
You can write a shell script.
I don't know whether it is achievable in a one-line command.
Try this
user@tioman> (/home/user) $ cat test.sh
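#!/bin/bash
# Usage: ./test.sh datafile   (one number per line)
filename="tmp.txt"
# Sort numerically, ascending, into a scratch file
sort -n "$1" > "$filename"
# Row count; reading from stdin keeps wc from printing the file name
rows=$(wc -l < "$filename")
# Row positions of the quartiles (integer arithmetic)
q2=$(( (rows + 1) / 2 ))
q1=$(( q2 / 2 ))
q3=$(( 3 * q1 ))
echo "Q1= $(head -n "$q1" "$filename" | tail -n 1)"
echo "Q2= $(head -n "$q2" "$filename" | tail -n 1)"
echo "Q3= $(head -n "$q3" "$filename" | tail -n 1)"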
user@tioman> (/home/user) $ ./test.sh test.txt
Q1= 12.2
Q2= 8.6
Q3= 20.65
user@tioman> (/home/user) $
Is this what you are looking for?
awk 'NR%4==0' filename
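Note that this one-liner simply prints every 4th line, so it relies on the file already being sorted and on it having exactly 15 rows. A minimal sketch that sorts ascending first is shown below; the function name every_fourth and the temporary file are only for illustration.

```shell
#!/bin/bash
# Sample data from the question, written to a temporary file.
data=$(mktemp)
printf '%s\n' 21.08 21.06 20.98 20.65 18.52 16.34 13.58 12.2 \
              10.66 10.22 9.8 8.6 7.4 3.9 3.5 > "$data"

# Hypothetical helper: sort ascending, then keep rows 4, 8, 12, ...
# This matches Q1, Q2, Q3 only because the file has exactly 15 rows.
every_fourth() {
    sort -n "$1" | awk 'NR % 4 == 0'
}

every_fourth "$data"
rm -f "$data"
```

With a different number of rows the output will contain more or fewer than three lines, so this is not a general quartile calculation.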
Hi kumaran_5555,
Thanks for your script.
Shouldn't it be Q1= 8.6, Q2= 12.2, Q3= 20.65 instead of Q1= 12.2, Q2= 8.6, Q3= 20.65?
Thanks for verification.
I have just edited the script above. Can you check it now?
awk 'NR==FNR{Q1=int((NR+1)/4);Q2=int((NR+1)/2);Q3=int((NR+1)*3/4);next}
FNR==Q1||FNR==Q2||FNR==Q3 ' infile <(sort -n infile)
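The one-liner above reads infile once to count the rows and then reads the sorted copy through process substitution, which needs bash or a similar shell. A possible single-pass variant with the same position formulas is sketched below; the wrapper name quartiles is an assumption, and the positions are exact for the 15-row example only.

```shell
#!/bin/bash
# Hypothetical wrapper: print Q1, Q2, Q3 of a one-number-per-line file.
# Uses the same int((N+1)/4), int((N+1)/2), int((N+1)*3/4) positions.
quartiles() {
    sort -n "$1" | awk '
        { v[NR] = $1 }                      # remember every sorted value
        END {
            print v[int((NR + 1) / 4)]      # Q1 position
            print v[int((NR + 1) / 2)]      # Q2 position (median)
            print v[int((NR + 1) * 3 / 4)]  # Q3 position
        }'
}
```

Usage would be quartiles infile, which prints the three values one per line.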