Help with calculating the median, first quartile, second quartile and third quartile

Input file:

21.08
21.06
20.98
20.65
18.52
16.34
13.58
12.2
10.66
10.22
9.8
8.6
7.4
3.9
3.5

Desired output file:

8.6
12.2
20.65

I want to calculate the 25th percentile/first quartile (Q1), the 50th percentile/second quartile (Q2)/median, and the 75th percentile/third quartile (Q3) from the input file.
It is also acceptable to calculate Q1, Q2 and Q3 separately.
Based on my understanding of quartiles, 8.6 is Q1, 12.2 is Q2 and 20.65 is Q3.

The way I calculate Q1, Q2 and Q3 is based on the following steps:
Sort the figures from smallest to largest.
The median is the (15 + 1) ÷ 2 = 8th value = 12.2;

I take the 8th value as the center and split the data into a lower and an upper part.

In the lower part, the lower quartile (Q1) is the (7 + 1) ÷ 2 = 4th value = 8.6;
In the upper part, the upper quartile is the 3 × (7 + 1) ÷ 2 = 12th value = 20.65.
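
The manual steps above can be sketched in shell roughly as follows. This is only an illustration, not a general quartile routine: the function name `quartiles` is made up for this example, and it assumes one number per line and a line count like 15 (of the form 4k+3), so that every position is a whole number. For other counts the positions are simply truncated, which may not match the quartile convention you want.

```shell
#!/bin/sh
# quartiles FILE: print Q1, Q2 and Q3, one per line, by position.
# Hypothetical helper; assumes one number per line and 4k+3 lines.
quartiles() {
    # LC_ALL=C so that "." is treated as the decimal point when sorting
    LC_ALL=C sort -n "$1" | awk '
        { v[NR] = $1 }                 # keep the sorted values, 1-indexed
        END {
            q2 = int((NR + 1) / 2)     # median position: (15 + 1) / 2 = 8
            half = q2 - 1              # values on each side of the median: 7
            q1 = int((half + 1) / 2)   # (7 + 1) / 2 = 4
            q3 = q2 + q1               # 8 + 4 = 12, same as 3 * (7 + 1) / 2
            print v[q1]
            print v[q2]
            print v[q3]
        }'
}
```

Calling `quartiles test.txt` on the input file from this post should print 8.6, 12.2 and 20.65 on separate lines.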

Thanks in advance.

You need to provide the formula. How do you get 8.6?

Hi rdcwayx,

I have just edited my question to include how I calculate the median, first and third quartile manually.
Thanks for your advice.

In the upper part, the upper quartile is the 3 × (7 + 1) ÷ 2 = 4th value = 20.65.

should be

In the upper part, the upper quartile is the 3 × (7 + 1) ÷ 2 = 12th value = 20.65.

right?

Hi itkamaraj,
Thanks for the reminder :)
You're right.
Do you have any idea how to solve this problem with a command?

You can write a shell script.

I don't know whether it is achievable in a one-line command.

Try this

user@tioman> (/home/user) $ cat test.sh
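#!/bin/bash
# Usage: ./test.sh FILE   (one number per line)

filename="tmp.txt"

# Sort numerically, smallest first
sort -n "$1" > "$filename"

# Count the lines; reading from stdin keeps the file name out of wc's output
rows=$(wc -l < "$filename")
q2=$(( (rows + 1) / 2 ))   # position of the median
q1=$(( q2 / 2 ))           # position of the first quartile
q3=$(( 3 * q1 ))           # position of the third quartile

# Print the line at each position
echo "Q1=   $(head -n "$q1" "$filename" | tail -n 1)"
echo "Q2= $(head -n "$q2" "$filename" | tail -n 1)"
echo "Q3= $(head -n "$q3" "$filename" | tail -n 1)"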

user@tioman> (/home/user) $ ./test.sh test.txt
Q1=   12.2
Q2= 8.6
Q3= 20.65
user@tioman> (/home/user) $

Is this what you are looking for?

awk 'NR%4==0' filename
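
Note that the one-liner above prints every 4th line of the file as it stands, so it only gives the quartiles when the file is already sorted and has 15 lines (positions 4, 8 and 12). A small sketch that sorts ascending first, under the same 15-line assumption; the name `every_fourth` is made up for this example:

```shell
#!/bin/sh
# every_fourth FILE: sort ascending, then print lines 4, 8, 12, ...
# Hypothetical helper; matches Q1, Q2, Q3 only for a 15-line file.
every_fourth() {
    # LC_ALL=C so that "." is treated as the decimal point when sorting
    LC_ALL=C sort -n "$1" | awk 'NR % 4 == 0'
}
```

With a longer file this prints more than three lines, so it is not a general solution.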

Hi kumaran_5555,

Thanks for your script.
I thought it should be Q1= 8.6, Q2=12.2, Q3=20.65 instead of Q1= 12.2, Q2=8.6, Q3=20.65?
Thanks for verification.

I have just edited the script. Can you check it now?

awk 'NR==FNR{Q1=int((NR+1)/4);Q2=int((NR+1)/2);Q3=int((NR+1)*3/4);next}
     FNR==Q1||FNR==Q2||FNR==Q3 ' infile <(sort -n infile)