Hi, I have the following problem that is beyond what I can currently do with bash scripting.
In file 1, I have ~2,500,000 lines, each with three columns. Note that this file is not sorted.
3 19 LABEL_A
3 37 LABEL_B
2 12 LABEL_C
1 15 LABEL_D
In file 2, I have ~25,000 unique lines, each with four columns.
Note: LABEL_7 and LABEL_8 overlap slightly in their column 2 and column 3 values.
1 11 20 LABEL_1
1 18 30 LABEL_2
1 31 40 LABEL_3
2 11 20 LABEL_4
2 21 30 LABEL_5
2 31 40 LABEL_6
3 11 20 LABEL_7
3 15 30 LABEL_8
3 31 40 LABEL_9
4 11 20 LABEL_10
ETC
To run through what I would like to do, as an example:
LABEL_A (file 1) has a 3 in column 1 and a value of 19 in column 2.
I want to compare this to every line in file 2.
So, if a line in file 2 has a 3 in column 1, and 19 lies between that line's column 2 and column 3 values, I want the label on that line.
In this example, 19 lies between the column 2 and column 3 values (file 2) for both LABEL_7 and LABEL_8.
Desired output (the last number, 2, means that 2 labels in file 2 contain the value 19):
LABEL_A LABEL_7 LABEL_8 2
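A minimal sketch of the rule above for a single file 1 record, using a plain while loop. The function name match_one and its argument order are made up for illustration, and "between" is assumed to include both end values (the post does not say).

```bash
#!/usr/bin/env bash
# match_one is a hypothetical helper, not an existing command.
# usage: match_one KEY VALUE LABEL FILE2
# The body runs in a subshell ( ... ), so its variables stay local.
match_one() (
    key=$1 value=$2 label=$3 file2=$4
    hits="" count=0
    # Read file 2 line by line: key, range start, range end, label.
    while read -r k start end name; do
        [ -n "$name" ] || continue    # skip blank or incomplete lines
        # Assumption: "between" is inclusive at both ends.
        if [ "$k" = "$key" ] && [ "$value" -ge "$start" ] && [ "$value" -le "$end" ]; then
            hits="$hits $name"
            count=$((count + 1))
        fi
    done < "$file2"
    echo "$label$hits $count"
)

# Demo on the file 2 rows that have a 3 in column 1.
demo=$(mktemp)
printf '%s\n' '3 11 20 LABEL_7' '3 15 30 LABEL_8' '3 31 40 LABEL_9' > "$demo"
match_one 3 19 LABEL_A "$demo"    # prints: LABEL_A LABEL_7 LABEL_8 2
rm -f "$demo"
```

Calling this once per file 1 line would reread file 2 about 2,500,000 times, so it only illustrates the comparison and is unlikely to be practical at full size.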
Full output:
LABEL_A LABEL_7 LABEL_8 2
LABEL_B LABEL_9 1
LABEL_C LABEL_4 1
LABEL_D LABEL_1 1
I think the code for this will involve while loops and arrays, but I have no idea where to start. Any bash solutions would be great (as this is what I am currently learning), but any assistance at all would be very much appreciated.
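One possible starting point, offered as a rough sketch and not a definitive answer: a bash function that hands the work to awk, since a pure bash loop over ~2,500,000 lines against ~25,000 ranges is likely to be very slow. It loads file 2 into arrays grouped by column 1, then streams file 1 and tests each value only against ranges with the same column 1. The function name label_ranges and its argument order are hypothetical, whitespace-separated columns are assumed, and "between" is assumed to include both end values.

```bash
#!/usr/bin/env bash
# label_ranges is a hypothetical wrapper, not an existing command.
# usage: label_ranges FILE2 FILE1
label_ranges() {
    awk '
        # First file (file 2): store each range under its column-1 key.
        NR == FNR {
            if (NF < 4) next              # skip blank or incomplete lines
            n = ++count[$1]
            lo[$1, n] = $2
            hi[$1, n] = $3
            name[$1, n] = $4
            next
        }
        # Second file (file 1): compare the value with every range that
        # shares its key. Assumption: both range ends are inclusive.
        NF >= 3 {
            out = $3
            hits = 0
            for (i = 1; i <= count[$1]; i++) {
                if ($2 + 0 >= lo[$1, i] + 0 && $2 + 0 <= hi[$1, i] + 0) {
                    out = out " " name[$1, i]
                    hits++
                }
            }
            print out, hits
        }
    ' "$1" "$2"
}
```

It could be run as `label_ranges file2.txt file1.txt > output.txt`, where the file names are placeholders. Each file 1 line scans every range that shares its column 1 value, so if a single column 1 value has thousands of ranges, a sorted lookup would probably be needed instead. Neither input file has to be sorted, and output lines follow the order of file 1.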