Hi
I have two files. INPUT.txt is a text file consisting of sentences, and SEARCH.txt consists of two or three columns separated by tabs. I need help writing a script that takes each consecutive set of five words from every sentence in INPUT.txt (words 1-5 are set one, words 6-10 are set two, words 11-15 are set three, and so on) and searches the second column of SEARCH.txt for it. An entry of five words in SEARCH.txt counts as a match if it shares at least four words with the set of five words from the input sentence. Among all matching entries, the script should return the five words whose corresponding value in the first column is the smallest (e.g. -2.927181 is smaller than -2.922845). The search is to be carried out for each set of five words. Once fewer than five words remain in the sentence, the search must stop.
Format of INPUT.txt file.
hai wafam cherol makha palli adubu madu ma yaakhidre haikhre tamlakle .
mahak aroiba yaahip tankhi hai machagi matamda saramba gatetu kaikhere mahakkisu aroiba yaahip tankhi hai haikhre .
Format of SEARCH.txt file.
-0.9725326 arna thamlamba nongchup santhong gani -0.014587925
-0.9777407 tainaba amanba yamna uningdraba -0.014587925
-0.9700631 aeroplane adu indira parktara ama -0.014587925
-1.2438936 mahakki aroiba yaahip tankhi hai -0.014587925
-0.97742474 aroiba yaahip tankhi hai hairi -0.014587925
-1.391722 hai wafam cherolna makha palli -0.6328273
-2.922845 hai wafam cherolduna makha palli -0.1190167
-2.915667 hai wafam cherolsina makha palli -0.5702463
-2.927181 hai wafam paochena makha palli -0.1963889
-2.925497 hai wafam khangnaduna -0.6328273
-2.855543 hai wafam ngasigi
-2.926619 hai wafam thamkharabani
-1.635051 hai wafam thamlamle -0.4567362
-1.078001 hai wafam thamlamli -0.8960688
-1.023442 adubu madu makhada yaakhidre haikhre -0.1234433
-1.432234 adubu madu makha yaakhidre haikhre -0.5432345
-1.1278934 changangei air fieldda hongdok pikhraga -0.014587925
-0.9567379 nupa machagi matamda saramba gatetu -0.014587925
-0.5984392 machagi matamda saramba gatetu kaire -0.014587925
-1.250842 leiriba aduda santri khara thamkhre -0.014587925
The expected format of OUTPUT.txt is given below.
hai wafam paochena makha palli adubu madu makha yaakhidre haikhre tamlakle.
mahakki aroiba yaahip tankhi hai nupa machagi matamda saramba gatetu mahakki aroiba yaahip tankhi hai haikhre
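To make the rule above concrete, here is a rough Python sketch of what I have in mind. It is not a tested solution, and it rests on a few assumptions of mine: words are compared regardless of their position in the set (which is what the expected output suggests), SEARCH.txt entries that do not have exactly five words are ignored, a set with no match is copied unchanged, and the leftover words at the end of a sentence are copied unchanged. The function names (`load_search`, `best_match`, `rewrite_sentence`, `process_files`) are just placeholders.

```python
from collections import Counter

SET_SIZE = 5      # words per set, as described above
MIN_MATCHES = 4   # minimum number of shared words for an entry to match


def load_search(lines):
    """Parse tab-separated SEARCH.txt lines into (score, words) pairs."""
    entries = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2:
            continue  # blank or malformed line
        try:
            score = float(cols[0])
        except ValueError:
            continue  # first column is not a number
        words = cols[1].split()
        if len(words) == SET_SIZE:  # assumption: only five-word entries count
            entries.append((score, words))
    return entries


def overlap(a, b):
    """Count the words shared by two word lists, ignoring position."""
    return sum((Counter(a) & Counter(b)).values())


def best_match(chunk, entries):
    """Return the matching entry with the smallest first-column value."""
    candidates = [e for e in entries if overlap(chunk, e[1]) >= MIN_MATCHES]
    if not candidates:
        return chunk  # assumption: keep the original words if nothing matches
    return min(candidates, key=lambda e: e[0])[1]


def rewrite_sentence(sentence, entries):
    """Replace each consecutive five-word set with its best match."""
    words = sentence.split()
    out = []
    i = 0
    while len(words) - i >= SET_SIZE:
        out.extend(best_match(words[i:i + SET_SIZE], entries))
        i += SET_SIZE
    out.extend(words[i:])  # fewer than five words left: stop searching
    return " ".join(out)


def process_files(input_path="INPUT.txt", search_path="SEARCH.txt",
                  output_path="OUTPUT.txt"):
    """Read both files and write one rewritten sentence per line."""
    with open(search_path, encoding="utf-8") as f:
        entries = load_search(f)
    with open(input_path, encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(rewrite_sentence(line, entries) + "\n")
```

Calling `process_files()` in the directory that holds both files would write OUTPUT.txt. The sketch scans every SEARCH.txt entry for every set, which should be fine for small files but may be slow for a large SEARCH.txt. It also leaves the spacing before the final "." exactly as it is in the input.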
Thanks in advance :).