Hi Friends,
I have files with columns like this. This sample input below is partial.
Please check below for main file link. Each file will have only two rows.
Gene 0.4% 0.7% 1.1% 1.4% 1.8% 2.2% 2.5% 2.9% 3.3% 3.6% 4.0% 4.3% 4.7% 5.1% 5.4% 5.8% 6.2% 6.5% 6.9% 7.2% 7.6% 8.0% 8.3% 8.7% 9.1% 9.4% 9.8% 10.1% 10.5% 10.9% 11.2% 11.6% 12.0% 12.3% 12.7% 13.0% 13.4% 13.8% 14.1% 14.5% 14.9% 15.2% 15.6% 15.9% 16.3% 16.7% 17.0% 17.4% 17.8% 18.1% 18.5% 18.8% 19.2% 19.6% 19.9% 20.3% 20.7% 21.0% 21.4% 21.7% 22.1% 22.5% 22.8% 23.2% 23.6% 23.9% 24.3% 24.6% 25.0% 25.4% 25.7% 26.1% 26.4% 26.8% 27.2% 27.5% 27.9% 28.3% 28.6% 29.0% 29.3% 29.7% 30.1% 30.4% 30.8% 31.2% 31.5% 31.9% 32.2% 32.6% 33.0% 33.3% 33.7% 34.1% 34.4% 34.8% 35.1% 35.5% 35.9% 36.2% 36.6% 37.0% 37.3% 37.7% 38.0% 38.4% 38.8% 39.1% 39.5% 39.9% 40.2% 40.6% 40.9% 41.3% 41.7% 42.0% 42.4% 42.8% 43.1% 43.5% 43.8% 44.2% 44.6% 44.9% 45.3% 45.7% 46.0% 46.4% 46.7% 47.1% 47.5% 47.8% 48.2% 48.6% 48.9% 49.3% 49.6% 50.0% 50.4% 50.7% 51.1% 51.4% 51.8% 52.2% 52.5% 52.9% 53.3% 53.6% 54.0% 54.3% 54.7% 55.1% 55.4% 55.8% 56.2% 56.5% 56.9% 57.2% 57.6% 58.0% 58.3% 58.7% 59.1% 59.4% 59.8% 60.1% 60.5% 60.9% 61.2% 61.6% 62.0% 62.3% 62.7% 63.0% 63.4% 63.8% 64.1% 64.5% 64.9% 65.2% 65.6% 65.9% 66.3% 66.7% 67.0% 67.4% 67.8% 68.1% 68.5% 68.8% 69.2% 69.6% 69.9% 70.3% 70.7% 71.0% 71.4% 71.7% 72.1% 72.5% 72.8% 73.2% 73.6% 73.9% 74.3% 74.6% 75.0% 75.4% 75.7% 76.1% 76.4% 76.8% 77.2% 77.5% 77.9% 78.3% 78.6% 79.0% 79.3% 79.7% 80.1% 80.4% 80.8% 81.2% 81.5% 81.9% 82.2% 82.6% 83.0% 83.3% 83.7% 84.1% 84.4% 84.8% 85.1% 85.5% 85.9% 86.2% 86.6% 87.0% 87.3% 87.7% 88.0% 88.4% 88.8% 89.1% 89.5% 89.9% 90.2% 90.6% 90.9% 91.3% 91.7% 92.0% 92.4% 92.8% 93.1% 93.5% 93.8% 94.2% 94.6% 94.9% 95.3% 95.7% 96.0% 96.4% 96.7% 97.1% 97.5% 97.8% 98.2% 98.6% 98.9% 99.3% 99.6% 100.0% 0.4% 0.7% 1.1% 1.4% 1.8% 2.2% 2.5% 2.9% 3.3% 3.6% 4.0% 4.3% 4.7% 5.1% 5.4% 5.8% 6.2% 6.5% 6.9% 7.2% 7.6% 8.0% 8.3% 8.7% 9.1% 9.4% 9.8% 10.1% 10.5% 10.9% 11.2% 11.6% 12.0% 12.3% 12.7% 13.0% 13.4% 13.8% 14.1% 14.5% 14.9% 15.2% 15.6% 15.9% 16.3% 16.7% 17.0% 17.4% 17.8% 18.1% 18.5% 18.8% 19.2% 19.6% 19.9% 20.3% 20.7% 21.0% 21.4% 21.7% 22.1% 22.5% 22.8% 23.2% 23.6% 23.9% 24.3% 24.6% 25.0% 25.4% 25.7% 26.1% 26.4% 26.8% 27.2% 27.5% 27.9% 28.3% 28.6% 29.0% 29.3% 29.7% 30.1% 30.4% 30.8% 31.2% 31.5% 31.9% 32.2% 32.6% 33.0% 33.3% 33.7% 34.1% 34.4% 34.8% 35.1% 35.5% 35.9% 36.2% 36.6% 37.0% 37.3% 37.7% 38.0% 38.4% 38.8% 39.1% 39.5% 39.9% 40.2% 40.6% 40.9% 41.3% 41.7% 42.0% 42.4% 42.8% 43.1% 43.5% 43.8% 44.2% 44.6% 44.9% 45.3% 45.7% 46.0% 46.4% 46.7% 47.1% 47.5% 47.8% 48.2% 48.6% 48.9% 49.3% 49.6% 50.0% 50.4% 50.7% 51.1% 51.4% 51.8% 52.2% 52.5% 52.9% 53.3% 53.6% 54.0% 54.3% 54.7% 55.1% 55.4% 55.8% 56.2% 56.5% 56.9% 57.2% 57.6% 58.0% 58.3% 58.7% 59.1% 59.4% 59.8% 60.1% 60.5% 60.9% 61.2% 61.6% 62.0% 62.3% 62.7% 63.0% 63.4% 63.8% 64.1% 64.5% 64.9% 65.2% 65.6% 65.9% 66.3% 66.7% 67.0% 67.4% 67.8% 68.1% 68.5% 68.8% 69.2% 69.6% 69.9% 70.3% 70.7% 71.0% 71.4% 71.7% 72.1% 72.5% 72.8% 73.2% 73.6% 73.9% 74.3% 74.6% 75.0% 75.4% 75.7% 76.1% 76.4% 76.8% 77.2% 77.5% 77.9% 78.3% 78.6% 79.0% 79.3% 79.7% 80.1% 80.4% 80.8% 81.2% 81.5% 81.9% 82.2% 82.6% 83.0% 83.3% 83.7% 84.1% 84.4% 84.8% 85.1% 85.5% 85.9% 86.2% 86.6% 87.0% 87.3% 87.7% 88.0% 88.4% 88.8% 89.1% 89.5% 89.9% 90.2% 90.6% 90.9% 91.3% 91.7% 92.0% 92.4% 92.8% 93.1% 93.5% 93.8% 94.2% 94.6% 94.9% 95.3% 95.7% 96.0% 96.4% 96.7% 97.1% 97.5% 97.8% 98.2% 98.6% 98.9% 99.3% 99.6% 100.0%
Basically, here is what I need to be done.
a. Start from second column which is 0.4% here.
b. Go until you hit "10" in the header name. If the header name is exactly 10.0%, then include that column too. If not, only include until the column before it. In this example, since we have 10.1% (29th column), we will be including columns starting from 0.4%(second) until 9.8% which is the 28th column. If the 29th column was to be 10.0%, then it would have been included too.
c. Average the values for these respective columns in the second row (data is not presented here - please click this link for total dataset - https://goo.gl/W8jND7\). In this example, starting from 0.4%(second column) till 9.8%(28th column).
d. In the output, print first column which is "Gene", and this average value with column header being
Gene Average_10%
e. Then start from 10.1% (29th column) and check until you hit "20" in the header name. Repeat steps b through d. And print output as
Gene Average_10% Average_20%
Repeat this until you have
Gene Average_10% Average_20% Average_30% Average_40% Average_50% Average_60% Average_70% Average_80% Average_90% Average_100%
f. After you hit 100%, it means one dataset is done.
g. If you observe my column header carefully here, there is another 0.4%-100% columns after the first 100%. I will be having 13 of these 0.4%-100%s in the input file at the above link.
i. I have multiple files, the headers can be
1% 2% 3%....100%
1.5% 2.5% 3.5%....100%
It varies from file to file. But the logic of averaging(if you hit "10", "20", etc) is always the same. And the number of samples 13 is also same which means each file will have 100%s for 13 times.
P.S: A Bonus of 1000 bits will be awarded to the effectively working solution.