awk-filter record by another file

I have file1

3049
3138
4672
22631
45324
112382
121240
125470
130289
186128
193996
194002
202776
228002
253221
273523
284601
284605
641858
701851
716844

and file2

3983    4981
5843    7501
9169    11160
12222   12776
14276   15016
17390   19207
20065   20781
21922   22746
23512   24480
25457   26044
27418   30078
30656   32185
33362   33610
34289   34639
36834   37322
38330   39691
40664   42940
45072   45596
48065   48874
49576   50022
53338   55938
58650   59420
60581   62711
63709   64716
65602   65925
67187   68425
73410   74783
75569   76438
78806   79312
79687   80358
80927   82090
82426   82869
85172   86095
87726   88358

The output file should be

4672
22631
45324

I need to keep a value from file1 only if it lies between $1 and $2 of some line in file2.
Could anyone help? awk may be a good way to filter them...

Please post the output required.

Does this meet your requirement?

awk 'NR==FNR{a[$1]=$2;next} {
 for(i in a)
 {
  if($0 >= i && $0 <= a[i])
  {
   print
   break
  }
 }
}' file2 file1

Yes, your code meets my requirements, thanks a lot...

awk 'NR==FNR{a[i++]=$1;next}{for(x=0;x<i;x++) if( a[x] >= $1 && a[x] <= $2) print a[x]}' file1 file2 >outfile


Are you sure about your code? The two numbers 3049 and 3138 should not be in outfile.

outfile

3049
3138
4672
22631
45324

Pretty sure that those 2 numbers don't turn up in the output.

Odd. I checked your code on gawk 3 and 4, and they do appear.

Then try this:

gawk 'NR==FNR{a[$1]=$2;next} {
 for(i in a)
 {
  if(int($0) >= int(i) && int($0) <= int(a[i]))
  {
   print
   break
  }
 }
}' file2 file1

Or this to force a numeric comparison...

gawk 'NR==FNR{a[$1]=$2;next} {
 for(i in a)
 {
  if(0+$0 >= 0+i && 0+$0 <= 0+a[i])
  {
   print
   break
  }
 }
}' file2 file1

The latter is safer, as it avoids truncating floats.
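
To illustrate the point about floats, here is a minimal sketch. The value 2.7 is a made-up example, not taken from the files above. int() drops the fractional part, while adding 0 only converts the string to a number.

```shell
# int() truncates toward zero; 0+ converts to a number without truncating.
out=$(awk 'BEGIN { print int("2.7"), 0 + "2.7" }')
echo "$out"
```

So if either file could ever contain non-integer values, the 0+ form keeps them intact for the comparison.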

Hi.

I agree with sdf, the extra 2 lines appear. The alternate statements:

if($0 >= i+0 && $0 <= a[i]) # SUCCEEDS!
if($0 >= int(i) && $0 <= a[i]) # SUCCEEDS!

will both work. Of the two, I think the i+0 is a bit tricky, especially for people who don't know awk well, so I would choose int(i) and add a comment explaining why it is required. Or perhaps add int() to all 4 operands, as elixir_sinari wrote, and omit any confusing explanation. Good point about floats, however. I suppose one could precondition all the data to truncate to integers, especially in this case ... cheers, drl
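
A small sketch of why coercing only i seems to be enough, assuming the usual POSIX awk comparison rules: a field that looks like a number is compared numerically against another number, but as text against a plain string, and the subscript delivered by for(i in a) is a string. The variable s below is a hypothetical stand-in for such a subscript.

```shell
# s is a plain string, like a subscript from for(i in a).
# First comparison: field vs string  -> text comparison.
# Second comparison: field vs number -> numeric comparison.
out=$(echo 3049 | awk '{ s = "12222"; print ($0 >= s), ($0 >= s + 0) }')
echo "$out"
```

As text, "3049" sorts after "12222" because "3" comes after "1", which would explain how 3049 and 3138 slipped through the first test of the range.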

The numeric comparisons work!
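
For anyone finding this thread later, here is a self-contained sketch of an alternative that sidesteps the subscript problem. It is not the code posted above: it stores the range bounds in two numerically indexed arrays (lo and hi, names chosen here for illustration) and converts every value with +0 as it is read. The sample data is a small subset of the two files from the first post, written to a temporary directory.

```shell
# Work in a throwaway directory with a subset of the posted data.
tmp=$(mktemp -d)
printf '%s\n' 3049 3138 4672 22631 45324 112382 > "$tmp/file1"
printf '%s\n' '3983 4981' '12222 12776' '21922 22746' '45072 45596' > "$tmp/file2"

result=$(awk '
# First file (file2): remember each range as numbers.
NR == FNR { lo[++n] = $1 + 0; hi[n] = $2 + 0; next }
# Second file (file1): print the value if any range contains it.
{
    v = $1 + 0
    for (k = 1; k <= n; k++)
        if (v >= lo[k] && v <= hi[k]) { print; break }
}' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With this subset it prints 4672, 22631 and 45324. Because of the break, each value is printed at most once, and the output keeps the order of file1.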