Hi dear users,
I need to compare numeric columns in two files. These files have the following structure.
K.txt (4 columns)
A001 chr21 9805831 9846011
A002 chr21 9806202 9846263
A003 chr21 9887188 9988593
A003 chr21 9887188 9988593
A004 chr21 9895249 9988593
......
......
In K.txt, columns 3 and 4 are the start and end positions of an interval for the gene named in column 1.
S.txt (4 columns)
chr21 9411326 9411327 rs75025155
chr21 9411409 9411410 rs71235072
chr21 9805830 9805831 rs78200054
chr21 9887190 9887191 rs71235073
chr21 9895220 9895221 rs78302045
chr21 9988593 9988594 rs71220654
......
......
In S.txt, columns 2 and 3 are also the start and end positions of an interval (but these intervals are shorter than the ones in K.txt). S.txt is also a larger file than K.txt.
These are the possible outcomes (that is, the ways an S.txt interval can intersect a K.txt interval), where S$n and K$n mean column n of S.txt and K.txt:
S$3 < K$3 (don't print to output)
S$2 <= K$3 AND S$3 >= K$3 (print to output)
S$2 >= K$3 AND S$3 <= K$4 (print to output)
S$2 <= K$4 AND S$3 >= K$4 (print to output)
S$2 > K$4 (don't print to output)
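Put differently, the three "print" cases together amount to a single overlap test. Here is a minimal Python sketch of that test. The function name `overlaps` is only illustrative, and it assumes that touching at a boundary counts as a match (as the rs78200054 / A001 pair in the example suggests) and that each interval has start <= end.

```python
def overlaps(s_start, s_end, k_start, k_end):
    """Return True when the S interval and the K interval share a position.

    Endpoints are treated as inclusive, which is an assumption.
    """
    return s_start <= k_end and s_end >= k_start


# Values taken from the sample rows above.
print(overlaps(9805830, 9805831, 9805831, 9846011))  # rs78200054 vs A001 -> True
print(overlaps(9411326, 9411327, 9805831, 9846011))  # rs75025155 vs A001 -> False
```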
The output should have 2 tab-separated columns: the first is column 4 from S.txt (S$4) and the second is column 1 from K.txt (K$1). If an S.txt interval matches more than one K.txt interval, as in the example below, the gene names should be separated by commas.
rs71235073 A003
rs78200054 A001,B001
rs78302045 A004
.....
.....
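To make the request concrete, here is a rough Python sketch of the whole comparison. It is only one possible approach, not a tuned solution. The names `read_genes` and `match_lines` are made up for illustration. It assumes that the chromosome columns must agree, that interval endpoints are inclusive, that a gene name listed twice in K.txt (like A003) should be reported once, and that rows without exactly four fields (such as the "......" lines) can be skipped. It compares every S.txt row against every K.txt row, so for large files a sorted sweep or an interval tree would probably be a better fit.

```python
def read_genes(k_lines):
    """Parse K.txt rows (name, chromosome, start, end); skip malformed rows."""
    genes = []
    for line in k_lines:
        fields = line.split()
        if len(fields) != 4 or not (fields[2].isdigit() and fields[3].isdigit()):
            continue
        genes.append((fields[0], fields[1], int(fields[2]), int(fields[3])))
    return genes


def match_lines(k_lines, s_lines):
    """Yield 'S$4<TAB>K$1[,K$1...]' for every S.txt row that overlaps a gene."""
    genes = read_genes(k_lines)
    for line in s_lines:
        fields = line.split()
        if len(fields) != 4 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        chrom, s_start, s_end, snp = fields[0], int(fields[1]), int(fields[2]), fields[3]
        hits = []
        for name, k_chrom, k_start, k_end in genes:
            # Inclusive overlap test on the same chromosome (assumption).
            if k_chrom == chrom and s_start <= k_end and s_end >= k_start:
                if name not in hits:  # report a repeated gene name only once
                    hits.append(name)
        if hits:
            yield snp + "\t" + ",".join(hits)


# Sample rows copied from the post; the real input would be the two files.
K_SAMPLE = [
    "A001 chr21 9805831 9846011",
    "A002 chr21 9806202 9846263",
    "A003 chr21 9887188 9988593",
    "A003 chr21 9887188 9988593",
    "A004 chr21 9895249 9988593",
]
S_SAMPLE = [
    "chr21 9411326 9411327 rs75025155",
    "chr21 9411409 9411410 rs71235072",
    "chr21 9805830 9805831 rs78200054",
    "chr21 9887190 9887191 rs71235073",
    "chr21 9895220 9895221 rs78302045",
    "chr21 9988593 9988594 rs71220654",
]

result = list(match_lines(K_SAMPLE, S_SAMPLE))
for row in result:
    print(row)
```

With the real data, the two lists could be replaced by open file objects, for example `open("K.txt")` and `open("S.txt")`, since the functions only iterate over lines. On the sample rows shown here, this sketch reports rs78302045 against A003 rather than A004, because 9895221 is below A004's start of 9895249. The A004 line in the example output may therefore depend on rows that are not shown.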
Any suggestions would be very welcome.
Thank you!