For each line (NR = 1 to 10) of the input file, I would like to find all the points X ($1), Y ($2) in that file which lie within a radius of 50 of that line's point, and then average $3 over all the points inside that circle.
The problem:
My script below does the job, but it takes ages to go through the input file, which is very large.
set first = 1 # NR=1 ; first line of the input file
set last = 10 # NR = 10; Last line of the input file
set num = ${first}
while (${num} <= ${last})
set X0 = `cat S2 | awk -v line=${num} 'NR==line' | awk '{print $1}'` # $1,X, for each Record
set Y0 = `cat S2 | awk -v line=${num} 'NR==line' | awk '{print $2}'` # $2,Y, for each Record
set XS = `cat S2 | awk -v line=${num} 'NR==line' | awk '{print $3}'` # $3,Value that will be averaged out, for each Record
set R0 = `echo "50"` # Search radius
set AVG = `cat S2 | awk -v X=${X0} -v Y=${Y0} -v R=${R0} '{print $1,$2,$3,sqrt(($1-X)*($1-X) + ($2-Y)*($2-Y)) <=R}' | awk '{if($4==1){print $0}}' | awk '{ sum+=$3} END {print sum/NR}'` # Find all points in the input file that lie within R of (X0,Y0), then average their $3 values
echo "${X0} ${Y0} ${XS} ${AVG}" >> TMP #For each record print a new line with existing $1,$2,$3 and $AVG
@ num++ # end of loop and it goes to next NR
end
Without understanding what exactly you are doing to that file, I can see that you are creating 15 processes to run 15 commands on every single line, which will have a serious effect if done for many lines / a large file. On the other hand, for 10 lines only that should not be noticed at all.
I'd say all of the above could be done in one single (awk?) script, which would speed up the entire processing considerably.
Aside: the XS value will always be 0 as $4 does not exist in the input file (assuming "S2" IS the input file).
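A minimal sketch of what such a single awk script might look like, assuming S2 is the input file, that it fits in memory, and that each point counts as lying inside its own circle (as it does in the original pipeline). It is written as a POSIX sh function rather than csh, and the name radius_avg and its arguments are hypothetical, chosen only for illustration:

```shell
# radius_avg FILE [RADIUS] [LAST]   (hypothetical helper, not from the thread)
# For each of the first LAST lines of FILE, average $3 over every line
# whose (x, y) lies within RADIUS of that line, using one awk process.
radius_avg() {
    awk -v R="${2:-50}" -v last="${3:-10}" '
        # Load the whole file into memory once (assumes it fits).
        { x[NR] = $1; y[NR] = $2; v[NR] = $3 }
        END {
            if (last > NR) last = NR
            for (i = 1; i <= last; i++) {
                sum = 0; n = 0
                for (j = 1; j <= NR; j++) {
                    dx = x[j] - x[i]; dy = y[j] - y[i]
                    # Compare squared distances, so no sqrt() is needed.
                    if (dx*dx + dy*dy <= R*R) { sum += v[j]; n++ }
                }
                # n is at least 1: every point is at distance 0 from itself.
                print x[i], y[i], v[i], sum / n
            }
        }
    ' "$1"
}
```

Something like radius_avg S2 50 10 > TMP would then replace the whole while loop. The search is still quadratic in the number of lines when LAST covers the entire file, but the file is read only once and no extra processes are started per line.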
The catch was fantastic. You are right: considering the input file, that should be $3 and not $4.
Let me explain what I am trying to do......
For NR==1,
I am trying to search through the entire input file and find the points X ($1, NR > 1 to NR = last record), Y ($2, NR > 1 to NR = last record) which lie within a radius of 50 meters of X0 ($1 for NR=1), Y0 ($2 for NR=1). If any are found, print "1", then add up $3 for all those points and divide by the number of points found inside the circle of radius R=50.
For NR==2 ------
Repeating the same process above.
For NR== 3------etc etc till last line of the record.
So, basically: searching for the points which lie within a radius of 50 of each point in the file, and then averaging $3 over the points found inside that circle.
So - is S2 the input file? How many lines? Will the calculations be done for every line in the file (let's call that ALLINES) or just for the first num lines? Yielding ALLINES * ALLINES result lines as opposed to num * ALLINES result lines? Will all the results go to one single output file?
I'm not sure I interpreted your requirements correctly. In your sample file, no line's coordinates lie within a radius of 50 of any other's, so no meaningful test is possible for that set (it did some averaging for R = 1500).
Try