biomed
June 20, 2012, 4:55am
1
I have file1
3049
3138
4672
22631
45324
112382
121240
125470
130289
186128
193996
194002
202776
228002
253221
273523
284601
284605
641858
701851
716844
and file2
3983 4981
5843 7501
9169 11160
12222 12776
14276 15016
17390 19207
20065 20781
21922 22746
23512 24480
25457 26044
27418 30078
30656 32185
33362 33610
34289 34639
36834 37322
38330 39691
40664 42940
45072 45596
48065 48874
49576 50022
53338 55938
58650 59420
60581 62711
63709 64716
65602 65925
67187 68425
73410 74783
75569 76438
78806 79312
79687 80358
80927 82090
82426 82869
85172 86095
87726 88358
The output file should be
4672
22631
45324
I need to keep only the values in file1 that fall between $1 and $2 of some line in file2.
Could anyone help? awk may be a good way to filter them...
Please post the output required.
Does this meet your requirement?
awk 'NR==FNR{a[$1]=$2;next} {
    for(i in a)
    {
        if($0 >= i && $0 <= a[i])
        {
            print
            break
        }
    }
}' file2 file1
biomed
June 20, 2012, 5:41am
3
Yes, your code meets my requirements, thanks a lot...
sdf
June 20, 2012, 6:06am
4
awk 'NR==FNR{a[i++]=$1;next}{for(x=0;x<i;x++) if( a[x] >= $1 && a[x] <= $2) print a[x]}' file1 file2 >outfile
---------- Post updated at 12:06 PM ---------- Previous update was at 11:50 AM ----------
elixir_sinari:
Please post the output required.
Does this meet your requirement?
awk 'NR==FNR{a[$1]=$2;next} {
    for(i in a)
    {
        if($0 >= i && $0 <= a[i])
        {
            print
            break
        }
    }
}' file2 file1
Are you sure about your code? The two numbers 3049 and 3138 should not be in outfile.
outfile
3049
3138
4672
22631
45324
Pretty sure that those 2 numbers don't turn up in the output.
sdf
June 20, 2012, 6:15am
6
Odd, checked your code on gawk 3 and 4 and they appear.
Try then with
gawk 'NR==FNR{a[$1]=$2;next} {
    for(i in a)
    {
        if(int($0) >= int(i) && int($0) <= int(a[i]))
        {
            print
            break
        }
    }
}' file2 file1
Or this to force a numeric comparison...
gawk 'NR==FNR{a[$1]=$2;next} {
    for(i in a)
    {
        if(0+$0 >= 0+i && 0+$0 <= 0+a[i])
        {
            print
            break
        }
    }
}' file2 file1
The latter is safer, as it avoids truncating floats.
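For anyone wondering why the comparison needs forcing at all: as far as I understand it, awk array subscripts are strings, so the i delivered by for(i in a) is a string, and comparing a string against an input field is done character by character. The sketch below does not use the files from this thread; it only contrasts the two kinds of comparison on one pair of values. The function name compare_demo is made up for illustration.

```shell
# Hypothetical demo: string comparison versus forced numeric comparison.
compare_demo() {
    awk 'BEGIN {
        # Both operands are string constants, so awk compares them as text:
        # "3" sorts after "1", hence "3049" >= "12222" is true.
        s = ("3049" >= "12222")

        # Adding 0 converts each operand to a number first:
        # 3049 >= 12222 is false.
        n = (("3049" + 0) >= ("12222" + 0))

        print "string:", s
        print "numeric:", n
    }'
}
```

Calling compare_demo prints the result of each comparison, 1 for true and 0 for false. That would explain how 3049 can slip past a lower bound such as 12222 when the bound is used as an array index, although the thread suggests awk implementations differ here.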
drl
June 20, 2012, 6:50am
8
Hi.
I agree with sdf, the extra 2 lines appear. The alternate statements:
if($0 >= i+0 && $0 <= a[i]) # SUCCEEDS!
if($0 >= int(i) && $0 <= a[i]) # SUCCEEDS!
will both work. Of the two, I think the i+0 is a bit tricky, especially for people who don't know awk well, so I would choose int(i) and add a comment explaining why it is required. Or perhaps add the int() to all 4 as elixir_sinari wrote and omit any confusing explanation. Good point about floats, however. I suppose one could precondition all the data to truncate to integers, especially in this case ...
cheers, drl
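On the point about floats, a small sketch of what int() would do to fractional bounds. The values are invented for illustration (the files in this thread contain integers only), and truncation_demo is a made-up name.

```shell
# Hypothetical demo: int() truncates, adding 0 keeps the fractional part.
truncation_demo() {
    awk 'BEGIN {
        low = "3983.7"   # made-up lower bound of a range
        v   = "3983.2"   # made-up value, actually below the bound

        # int() drops the fraction on both sides: 3983 >= 3983 is true,
        # so the value would wrongly pass the lower-bound test.
        print "int:", (int(v) >= int(low))

        # Adding 0 converts without truncating: 3983.2 >= 3983.7 is false.
        print "plus0:", ((v + 0) >= (low + 0))
    }'
}
```

With integer-only data, as here, both forms should give the same answer; the difference only shows up once fractional values are involved.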
sdf
June 20, 2012, 6:57am
9
The numeric comparisons work!
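Pulling the thread together, here is one possible variant that sidesteps for(i in a) altogether: the bounds go into two arrays indexed by a counter and are converted to numbers when they are stored. This is only a sketch, not the code anyone posted above. The function name filter_in_ranges and the array names lo and hi are my own, and it assumes the ranges file is not empty (the NR==FNR test misbehaves otherwise).

```shell
# Hypothetical helper: print the lines of the values file whose first field
# lies inside at least one "low high" range listed in the ranges file.
# Usage: filter_in_ranges RANGES_FILE VALUES_FILE
filter_in_ranges() {
    awk '
        # First file: remember each range, forcing both bounds to numbers.
        NR == FNR { lo[++n] = $1 + 0; hi[n] = $2 + 0; next }

        # Second file: test the value against every stored range.
        {
            v = $1 + 0
            for (k = 1; k <= n; k++)
                if (v >= lo[k] && v <= hi[k]) {
                    print       # keep the line as it appeared in the input
                    break       # one matching range is enough
                }
        }
    ' "$1" "$2"
}
```

With the files from the first post it would be called as filter_in_ranges file2 file1 > outfile. The output keeps the order of file1, and a value that falls in several ranges is printed once.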