Here is my data structure.
# id1 id2 len start end
# 9 16792 5475 4181 4232
# 11 16792 2317 1086 1137
# 11 32879 2317 8 60
# 11 32858 2317 10 52
# 11 30670 2317 17 63
# 14 12645 532 3 67
# 14 12645 532 158 222
# 14 11879 532 3 223
# 18 23847 644 64 285
# 18 30160 644 98 285
# 18 30160 644 345 477
# 18 30160 644 516 644
I want to compute the coverage of each id1 relative to its length (column len), taking the start and end values of all its entries into account. The problem is that entries for the same id1 can overlap, so simply summing the ranges of all entries would overestimate the coverage. Taking only the smallest start and the largest end does not work either, because there can be gaps between entries where part of the length is not covered.
My expected result should be something like this:
9 --- 51 / 5475 = 0.009
11 --- 106 / 2317 = 0.046
14 --- 220 / 532 = 0.414
18 --- 481 / 644 = 0.75
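
For illustration, here is a minimal Python sketch of one way to get these numbers: sort the (start, end) ranges of each id1, merge the ones that overlap, and sum the merged lengths. It assumes each range is measured as end - start, which is the convention that matches the figures for ids 11, 14 and 18; if the coordinates are meant to be inclusive, each merged range would need a +1. The names `rows`, `covered_length` and `coverage_by_id` are made up for this example.

```python
from collections import defaultdict

# Rows copied from the table above: (id1, id2, len, start, end)
rows = [
    (9, 16792, 5475, 4181, 4232),
    (11, 16792, 2317, 1086, 1137),
    (11, 32879, 2317, 8, 60),
    (11, 32858, 2317, 10, 52),
    (11, 30670, 2317, 17, 63),
    (14, 12645, 532, 3, 67),
    (14, 12645, 532, 158, 222),
    (14, 11879, 532, 3, 223),
    (18, 23847, 644, 64, 285),
    (18, 30160, 644, 98, 285),
    (18, 30160, 644, 345, 477),
    (18, 30160, 644, 516, 644),
]


def covered_length(intervals):
    """Length of the union of (start, end) ranges, each measured as end - start."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this range: close the merged range built so far.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping (or touching) range: extend the current merged range.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def coverage_by_id(rows):
    """Map each id1 to (covered length, total length)."""
    intervals = defaultdict(list)
    lengths = {}
    for id1, _id2, length, start, end in rows:
        intervals[id1].append((start, end))
        lengths[id1] = length  # assumes len is constant within an id1
    return {id1: (covered_length(iv), lengths[id1]) for id1, iv in intervals.items()}


result = coverage_by_id(rows)
for id1, (covered, length) in result.items():
    print(f"{id1} --- {covered} / {length} = {covered / length:.3f}")
```

Sorting first means each range only has to be compared with the merged range currently being built, so overlaps are counted once and gaps are left out.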