I have the following file:
299899 chrX_299716_300082 196 78.2903 299991 chrX_299982_300000 18.2538 Tajd:0.745591 FayWu:-0.245701 T2:1.45
299899 chrX_299716_300082 196 78.2903 299991 chrX_299982_300000 18.2538 Tajd:0.745591 FayWu:-0.245701 T2:0.283
311027 chrX_310892_311162 300 91.6452 311022 chrX_311013_311031 14.9526 Tajd:0.640409 FayWu:-0.278087 T2:0.283
311027 chrX_310892_311162 300 91.6452 311022 chrX_311013_311031 14.9526 Tajd:0.640409 FayWu:-0.278087 T2:-0.324
388608 chrX_388393_388823 562 50.619 388603 chrX_388594_388612 18.4584 Tajd:0.342217 FayWu:-0.742664 T2:-0.421
688781 chrX_688561_689002 552 -0 688817 chrX_688808_688826 10.6874 Tajd:0.302043 FayWu:-1.079566 T2:0.803
688781 chrX_688561_689002 552 -0 688817 chrX_688808_688826 10.6874 Tajd:0.302043 FayWu:-1.079566 T2:-1.233
1220600 chrX_1220404_1220797 510 -0 1220617 chrX_1220608_1220626 16.7085 Tajd:0.391032 FayWu:-0.421912 T2:1.093
There are a lot of lines that are identical except for the last field (T2:#). I'm looking for a way to combine these lines so that the T2 values are averaged. For this excerpt I would like to get something like:
299899 chrX_299716_300082 196 78.2903 299991 chrX_299982_300000 18.2538 Tajd:0.745591 FayWu:-0.245701 T2:0.8665
311027 chrX_310892_311162 300 91.6452 311022 chrX_311013_311031 14.9526 Tajd:0.640409 FayWu:-0.278087 T2:-0.0205
388608 chrX_388393_388823 562 50.619 388603 chrX_388594_388612 18.4584 Tajd:0.342217 FayWu:-0.742664 T2:-0.421
688781 chrX_688561_689002 552 -0 688817 chrX_688808_688826 10.6874 Tajd:0.302043 FayWu:-1.079566 T2:-0.215
1220600 chrX_1220404_1220797 510 -0 1220617 chrX_1220608_1220626 16.7085 Tajd:0.391032 FayWu:-0.421912 T2:1.093
The file is sorted, so all matching lines are consecutive. The closest I have gotten is:
more input.file | awk '{split($10,a,":");avt2[$1]+=a[2];c[$1]++}END{for(i in avt2) print $0,avt2/c}' > output.file
but this produces nothing useful: in the END block $0 still holds only the last input line, and avt2/c tries to divide the arrays themselves instead of indexing them with i, which awk rejects as using an array in a scalar context.
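Since the file is sorted, a single streaming pass over consecutive groups avoids the END-block problem entirely. Here is a minimal sketch, assuming (as in the excerpt) that fields 1-9 identify a group and field 10 is always `T2:<number>`; `average_t2` is just an illustrative wrapper name:

```shell
#!/bin/sh
# Average the T2 field over consecutive lines whose first nine fields match.
# Assumes the input is sorted, so duplicate lines are always adjacent.
average_t2() {
  awk '{
    key = $1
    for (i = 2; i <= 9; i++) key = key OFS $i   # fields 1-9 identify a group
    split($10, a, ":")                          # a[2] holds the numeric T2 value
    if (key != prev) {
      if (NR > 1) print prev, "T2:" sum / n     # flush the finished group
      prev = key; sum = 0; n = 0
    }
    sum += a[2]; n++
  }
  END { if (NR > 0) print prev, "T2:" sum / n } # flush the final group
  ' "$1"
}
```

Usage would then be `average_t2 input.file > output.file`. Because groups are contiguous, this needs no arrays and prints each averaged line as soon as its group ends; awk's default output format (`OFMT`, `%.6g`) is enough precision for the values shown above.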
Thanks a lot for any help,
Jonas