Please know that I am very new to unix and trying to learn 'on the job'. I'm only manipulating large tab-delimited files (millions of rows), but I'm stuck and don't know how to proceed with the following. Hoping for some friendly advice
I have 2 tab-delimited files - with differing column & row numbers in each.
Please note: some of the 'names' in column 1 (in each file) are repeated a varying number of times. Each of the real files are millions of rows each.
File 1:
Sc1 10 20
Sc1 20 30
Sc1 30 40
Sc2 40 50
Sc2 50 60
Sc2 60 70
Sc3 70 80
Sc3 80 90
Sc3 90 100
File 2:
Sc1 22 23 100
Sc1 44 45 50
Sc2 60 61 25
Sc3 79 80 40
Sc3 100 101 40
Sc4 60 61 25
I would like to test the following:
IF the value in the first column & first row of File2 matches the value anywhere in the first column of File1, AND IF the second column & first row value of File2 is >= to the second column File1 but < the third column File1 (these would be on the corresponding row of File1 after the match established in the first 'IF'), THEN write the value from column 4 row 1 File2 to File3, ELSE write NA to File3.
After testing for every File2 column 1 values, File3 should look like the following:
100
NA
25
40
NA
NA
To help understand, I can achieve this within Excel by doing the following
Paste File 1 from cells A1:C9
Paste File 2 from cells D1:G5
In H1 write the following formula:
=IF(COUNTIFS($A$1:$A$9,D1, $B$1:$B$9,"<="&E1,$C$1:$C$9,">"&E1)>0,G1,"NA")
Copy formula down to H5.
But � the real world files exceed the size of Excel, so transferring this to Unix is my problem.
I have tried combining the above File1 and File2 data into one file and then using the following command:
awk '{if ($4=$1 && $5>=$2 && $5<$3) {print $7}}' file3.txt > file4.txt
But that only tests on a row level.
I feel like I need to use awk with NR==FNR but ensure where to start?
I am hoping I have made that clear, and that someone can kindly provide some assistance
Many thanks
Ted (Completely new to unix and learning everyday)