Hello,
I have 2 files and I want to compare them in a specific way.
file1:
A_1200_1250
A_1251_1300
B_1301_1350
B_1351_1400
B_1401_1450
C_1451_1500
and so on...
file2:
1210 1305 1260 1295
1400 1500 1450 1495
Now the script should take the number "1200" from A_1200_1250 in file1 and check whether it falls between column 1 and column 3, or between column 2 and column 4, of file2.
If it falls between column 1 and column 3 of file2, assign it positive.
If it falls between column 2 and column 4 of file2, assign it negative.
If neither of the above two conditions is satisfied, assign neutral.
Expected output:
A_1200_1250 neutral
A_1251_1300 positive
B_1301_1350 negative
B_1351_1400 neutral
B_1401_1450 positive
C_1451_1500 neutral
Any help or suggestion on this is greatly appreciated.
Thanks,
Your sample range is overlapping:
1210 1305 1260 1295
1400 1500 1450 1495
If you search for 1261, it is in the first range (1210-1305) and also in the second (1260-1295).
Hi,
You only have to look between column 1 and column 3, and between column 2 and column 4.
Since 1251 lies between column 1 and column 3 (1210-1260), we assign positive.
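To make the clarified rule concrete, here is a minimal sketch that checks one number against one row of file2. The function name classify and its argument order are made up for illustration, and it assumes that column 4 is never larger than column 2, as in the sample rows.

```shell
# classify NUMBER COL1 COL2 COL3 COL4   (hypothetical helper, not from the thread)
# Assumes column 1 <= column 3 and column 4 <= column 2, as in the sample data.
classify() {
    n=$1 c1=$2 c2=$3 c3=$4 c4=$5
    if [ "$n" -ge "$c1" ] && [ "$n" -le "$c3" ]; then
        # Inside column 1 .. column 3
        echo positive
    elif [ "$n" -ge "$c4" ] && [ "$n" -le "$c2" ]; then
        # Inside column 4 .. column 2
        echo negative
    else
        echo neutral
    fi
}

# First row of the sample file2: 1210 1305 1260 1295
classify 1251 1210 1305 1260 1295   # positive
classify 1301 1210 1305 1260 1295   # negative
classify 1200 1210 1305 1260 1295   # neutral
```

A number counts as neutral for the whole file only if every row of file2 gives neutral.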
Is file2 always just 2 lines?
No, it has 27000 lines, and file 1 has 111000 lines.
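#!/usr/bin/ksh
# Search_File is file1 (the names); Range_File is file2 (the four columns).
while read mLine; do
  # Extract the number between the underscores, e.g. 1200 from A_1200_1250.
  mNbr=$(echo "${mLine}" | sed 's/.*_\(.*\)_.*/\1/')
  mFound='N'
  # Columns are read so that the positive range is column 1 to column 3
  # and the negative range is column 4 to column 2.
  while read mFrom1 mTo2 mTo1 mFrom2; do
    if [[ ${mNbr} -ge ${mFrom1} && ${mNbr} -le ${mTo1} ]]; then
      mFound='Y'
      echo "${mLine} positive"
      break
    elif [[ ${mNbr} -ge ${mFrom2} && ${mNbr} -le ${mTo2} ]]; then
      mFound='Y'
      echo "${mLine} negative"
      break
    fi
  done < Range_File
  if [[ ${mFound} = 'N' ]]; then
    echo "${mLine} neutral"
  fi
done < Search_File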
That's a lot of reading. Try nawk with arrays:
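#!/usr/bin/nawk -f
# First input (file2): store each range as a key, with its sign as the value.
NR == FNR {
    range[$1, $3] = "positive"
    range[$4, $2] = "negative"
    next
}
# Second input (file1): split on "_" explicitly, because assigning FS here
# would only take effect from the following record.
{
    split($0, part, "_")
    num = part[2] + 0
    found = 0
    for (comb in range) {
        split(comb, key, SUBSEP)
        if (num >= key[1] + 0 && num <= key[2] + 0) {
            sign = range[comb]
            found = 1
            break
        }
    }
    if (found)
        printf("%s %s\n", $0, sign)
    else
        printf("%s neutral\n", $0)
}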
Note I use 'file2' as first input to establish the ranges.
[mute@sunny ~]$ ./range.sh file2 file1
A_1200_1250 neutral
A_1251_1300 positive
B_1301_1350 negative
B_1351_1400 neutral
B_1401_1450 positive
C_1451_1500 neutral
Thank you so much. Both programs run well on a small set. When I try to run them on 111000 rows, it takes more than 2 hours, and it is still running.
Any suggestions on how to speed it up?
Thanks,
Diya
The nawk version too? Hmm. It could probably be done in C.
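Before going to C, one thing that might help is to stop rescanning every range for every line. The sketch below labels each covered number once while reading file2, so each line of file1 then needs only a single array lookup. This is a different technique from the two scripts above, not a tuned version of them. The function name classify_fast is made up for illustration, and the sketch assumes whole-number bounds, column 1 <= column 3 and column 4 <= column 2, and ranges narrow enough (about 50 wide in the sample) that one array entry per covered number fits in memory. It also assumes positive should win when a number is covered by both kinds of range, which the thread does not specify.

```shell
# classify_fast RANGE_FILE NAME_FILE   (hypothetical helper name)
# Assumptions: integer bounds, column 1 <= column 3, column 4 <= column 2,
# and ranges narrow enough to store one label per covered number.
classify_fast() {
    awk '
    NR == FNR {
        # Range file: label every number from column 1 to column 3 as positive.
        for (i = $1 + 0; i <= $3 + 0; i++)
            label[i] = "positive"
        # Label column 4 to column 2 as negative unless the number is already labelled.
        for (i = $4 + 0; i <= $2 + 0; i++)
            if (!(i in label))
                label[i] = "negative"
        next
    }
    {
        # Name file: take the number between the underscores and look it up once.
        split($0, part, "_")
        n = part[2] + 0
        print $0, ((n in label) ? label[n] : "neutral")
    }' "$1" "$2"
}
```

It would be called as classify_fast file2 file1, with the range file first. If the real ranges are very wide, the table could grow too large, and sorting the ranges and searching them would be the safer route.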