Split a file in more files based on score content

Dear All,
I have the following file tabulated:

ID	distanceTSS	score
8434	571269	10
10122	393912	9
7652	6	10
4863	1451	9
8419	39	2
9363	564	21
9333	7714	22
9638	8334	9
1638	1231	11
10701	918	1000
6587	32056	111

What I would like to do is the following, create 100 new files based on content of the second column,
The first file should contain all the lines with a distance between 0 and 1000,
the second between 1000 and 2000, and so on untile 99000 and 10000,

Finally for each new file I would like to calculate the median of the third column(score)

Is there a rapid way to do so? I tried witha perl script but it seems really slow,

thanks for your help,
Paolo

Could this help you ?

 awk '{ print > int($2/1000)+1"_distance.txt"}'  filename
1 Like

Thanks,
That's really help me!
Another issue concerning the output file:
let's imagine I have the following output for 1_distance:

ID	distanceTSS	score
8434	8	10
8434	999	10
7652	6	9
7652	451	9

As you can notice there are ID present more than once, I would like to retain only one ID, for example:

ID	distanceTSS	score
8434	8	10
7652	6	9

And finally for each new file created (distance_*) I would like to calculate the median of the third column,

Thanks again,
Paolo

Which of the duplicate values do you want to retain? The first? The smallest?
Does the output order matter, esp. reg. median calculation?

Hi,
Is the same because they have the same score

Hi Pravin,

Is it possible for you to explain the awk commadn which u have provided...