How to extract elements using Awk

Hi,

I have this typical extraction problem in AWK.

I have 3 input files.

i) The first one is something like an oracle:
foo 12,23,24
bla 11,34
car 35

ii) The second file gives the score for each value listed in the second field of the first file. Its first column is the position (rank) of that score, and its third column is the value being scored.

1 0.345 24
2 0.231 1
3 0.220 24
4 0.1090 12

iii) The third file contains other information about the values in the second field of the first file.
Line Poll NotPoll
10 3 1
12 1 2
23 3 1
24 2 4

Initially, I use getline while processing file 1 to look up the position (1st field of the second file). I only want the smallest position. In this case, when it scans the first line of file1, it checks the positions that 12, 23 and 24 have in the second file.

From the second file, it picks only the smallest position (in this case 1, because of "24"). This part is done. However, I would now like to link the "24" to the third file, that is, match it against the first field of the third file, so that I can extract the Poll and NotPoll values (2 and 4 respectively).
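One way to keep hold of the value ("24") while finding the smallest position is to track both in plain variables during the scan. Sorting the positions afterwards with gawk's asort() renumbers the array from 1 and discards the original indices, so the value is lost. The following is only a sketch: the file name file2.txt, the variable names and the hard-coded list passed in `want` are assumptions made for illustration.

```sh
# Sample second file from this post (the file name is an assumption).
printf '%s\n' '1 0.345 24' '2 0.231 1' '3 0.220 24' '4 0.1090 12' > file2.txt

out=$(awk -v want="12,23,24" '
BEGIN {
  # Turn the comma-separated list into a lookup table.
  n = split(want, vals, ",")
  for (i = 1; i <= n; i++) wanted[vals[i]] = 1
}
# Keep the smallest position and the value found on that same row together.
($3 in wanted) && (pos == "" || $1 + 0 < pos) {
  pos = $1 + 0
  val = $3
}
END { if (pos != "") print pos, val }
' file2.txt)
echo "$out"
```

With the sample data this prints the position followed by the value that produced it, and that value can then be used as the key into the third file.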

With my code below, I am only able to print the smallest position but not the corresponding element. Please advise. I appreciate your help.

#!/usr/bin/awk -f 
{
repo = $1
split($2, search_vals, ",")
delete found

while (getline < (repo "/file2.txt")) {
	min=0;
	max=0;
	sum=0;
	scores[$1]=$2;

	for (k in search_vals) {
		if ($3 == search_vals[k]) {
	       		found[$3] = $1;
	       		flag=1;	
			break;
 		}
	}
}

asort(found)

pos=found[1]

print pos;

while (getline < (repo "/file2.txt"))  {
     if(pos==$1){
        lineno=$3;
	break;
     }
}

close(repo "/file2.txt")

print lineno;

if(flag){
score=scores[found[1]]

for(i in scores){
  if(scores[i]==score)
    newscores[i]=scores[i];
}
	min=found[1];	

   for(i in newscores){
      if(newscores[i]== score)
       {
         if(int(i)<int(min))
            min=i
         if(int(i)>int(max))
            max=i
       }
  }     

	gap=int(max)-int(min)+1;
	
	for(x=int(min); x<=int(max); x++){
   		sum+=x;
   		
	}
	if(gap==1){
	   c=min;

	}
	else{
           c=int(0.5+sum/gap);
    	}
    	
   split("", scores) 
   split("", newscores) 
} 

while (getline < (repo "/3rdfile.txt")) {
  
 
  last=$1;
}

m=c/last*100;

print repo,found[1]>"test.txt"

}

In the above code, I try to implement what I described above, but the line after the second while loop, which is print lineno, is not working. I added that extra while loop to read the second file again and extract the value itself, in addition to the position obtained earlier (that is, besides returning the position, it should return the corresponding 3rd field). However, it doesn't return anything.
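One likely reason the second loop returns nothing: the first `while (getline < file)` loop has already read the file to its end, and awk keeps that file open, so a second getline on the same name starts at end-of-file and the loop body never runs. Calling close() on that name before the second loop makes awk reopen the file from the top. A minimal sketch, using a hypothetical demo.txt instead of the original paths:

```sh
printf '%s\n' '1 0.345 24' '2 0.231 1' > demo.txt

out=$(awk 'BEGIN {
  f = "demo.txt"
  # First pass: read the file to the end. Reading into a variable
  # (line) leaves $0 and the fields of the current record untouched.
  while ((getline line < f) > 0) first++
  # Without close(), the file is still at end-of-file: nothing is read.
  while ((getline line < f) > 0) stale++
  close(f)
  # After close(), getline reopens the file and reads from the top.
  while ((getline line < f) > 0) second++
  close(f)
  # Print how many lines each pass managed to read.
  print first + 0, stale + 0, second + 0
}')
echo "$out"
```

Testing the result with `> 0` also avoids an endless loop when the file cannot be opened, because getline returns -1 in that case.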

Please advise. I appreciate it a lot.

Many thanks.

Could you post the desired result given your example data files?

Hi,

The desired result in test.txt would be:

foo, 1,2,4

In the above case, we don't consider whether the same score appears on other lines, so it only checks the minimum position. The 2,4 refers to the line 24 2 4 in the third file.

Basically, I'm executing ./awkscript file1.txt.

Please advise. Thanks.

Something like this?

awk 'NR == FNR {
  if (!(item && pos)) {
    item = $NF
    pos = $1
  } 
  if ($1 < pos) {
    pos = $1
    item = $NF
    next
    }
  }
NR > FNR && $1 == item {
  poll = $2 OFS $3
  next
  }
last { 
  n = split($2, t, ",")
  # restart the index for every line of file1
  for (i = 1; i <= n; i++)
    if (t[i] == item) {
      print $1, pos, poll
      exit
    }
}' OFS=, file2 file3 last=1 file1
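The command above keeps a single smallest position for the whole of file2. If the smallest position is wanted separately for each line of file1, a variant along these lines might work. It is a sketch under assumptions: the three files are named file1, file2 and file3 as in the command above, file3 starts with one header line, and the array names minpos and poll are made up for illustration.

```sh
# Sample data from the thread.
printf '%s\n' 'foo 12,23,24' 'bla 11,34' 'car 35' > file1
printf '%s\n' '1 0.345 24' '2 0.231 1' '3 0.220 24' '4 0.1090 12' > file2
printf '%s\n' 'Line Poll NotPoll' '10 3 1' '12 1 2' '23 3 1' '24 2 4' > file3

out=$(awk -v OFS=, '
# file2: smallest position for each value (column 3).
FILENAME == "file2" {
  if (!($3 in minpos) || $1 + 0 < minpos[$3]) minpos[$3] = $1 + 0
  next
}
# file3: Poll and NotPoll keyed by the Line column; skip the header.
FILENAME == "file3" {
  if (FNR > 1) poll[$1] = $2 OFS $3
  next
}
# file1: among the listed values, pick the one with the smallest position.
{
  n = split($2, t, ",")
  best = ""
  for (i = 1; i <= n; i++)
    if ((t[i] in minpos) && (best == "" || minpos[t[i]] < minpos[best]))
      best = t[i]
  if (best != "" && (best in poll)) print $1, minpos[best], poll[best]
}' file2 file3 file1)
echo "$out"
```

Lines of file1 whose values never appear in file2 (bla and car in the sample) print nothing in this sketch.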