Shell or awk script to compute average of all the points within a circle

Hi,

I have a file that looks like this:

Input file:

1970113.00000 3460.00000 1.09516 
1970116.00000 3791.00000 1.06350 
1970120.00000 4120.00000 1.07588 
1970115.00000 4450.00000 1.09591 
1970116.00000 4780.00000 1.09965 
1970120.00000 5109.00000 1.06733 
1970122.00000 5440.00000 1.03760 
1970124.00000 5770.00000 1.02025 
1970123.00000 6100.00000 1.00998 
1970120.00000 6430.00000 0.96426

What I want to do:

For each line (NR = 1 to 10), I would like to find all points X ($1), Y ($2) in the input file that lie within a radius of 50 of that line's point, and then average $3 over all the points inside that circle.

The problem:

My script below does the job, but it takes ages to go through the input file, which is very large.

 
set first = 1    # NR=1 ; first line of the input file
set last = 10   # NR = 10; Last line of the input file
 
set num = ${first}
while (${num} <= ${last})
 
set X0 = `cat S2 | awk -v line=${num} 'NR==line' | awk '{print $1}'` # $1,X, for each Record
set Y0 = `cat S2 | awk -v line=${num} 'NR==line' | awk '{print $2}'` # $2,Y, for each Record
set XS = `cat S2 | awk -v line=${num} 'NR==line' | awk '{print $3}'` # $3,Value that will be averaged out, for each Record
set R0 = `echo "50"` # Search radius
 
set AVG = `cat S2 | awk -v X=${X0} -v Y=${Y0} -v R=${R0} '{print $1,$2,$3,sqrt(($1-X)*($1-X) + ($2-Y)*($2-Y)) <=R}'  | awk '{if($4==1){print $0}}' | awk '{ sum+=$3} END {print sum/NR}'` # Find all points in the input file that lie within R and average their $3 values
 
echo "${X0} ${Y0} ${XS} ${AVG}" >> TMP #For each record print a new line with existing $1,$2,$3 and $AVG
 
@ num++ # end of loop and it goes to next NR
end
 

This could be very easy for you experts.

Thanks,

Without understanding what exactly you are doing to that file, I can see that you are creating 15 processes to run 15 commands on every single line, which will have a serious effect if done for many lines / a large file. On the other hand, for 10 lines only that should not be noticed at all.

I'd say all of the above could be done in one single ( awk ?) script, accelerating the entire processing considerably.

Aside: the XS value will always be 0 as $4 does not exist in the input file (assuming "S2" IS the input file).
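
To illustrate the process count, here is a minimal sketch of how one awk call could fetch all three fields of a record instead of three separate cat | awk | awk pipelines. It is written in POSIX sh rather than csh, and the variable sample (a few lines copied from the question) stands in for the real file; with the actual data, S2 would simply be passed to awk as its file argument.

```shell
# Illustration only: "sample" is a stand-in for the input file S2.
sample='1970113.00000 3460.00000 1.09516
1970116.00000 3791.00000 1.06350
1970120.00000 4120.00000 1.07588'

num=2   # record number to look up, as in the original loop

# One awk process prints $1, $2 and $3 of record $num and then stops reading.
read -r X0 Y0 XS <<EOF
$(printf '%s\n' "$sample" | awk -v line="$num" 'NR == line { print $1, $2, $3; exit }')
EOF

echo "$X0 $Y0 $XS"
```

This still starts one awk per record, so it only reduces the cost; it does not remove the per-line loop.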


Thanks a lot RudiC.

The catch was fantastic. You are right: given the input file, that should be $3 and not $4.

Let me explain what I am trying to do.

For NR==1:
 
I search the entire input file for points X ($1) and Y ($2), from NR>1 to the last record, that lie within a radius of 50 meters of X0 ($1 for NR=1) and Y0 ($2 for NR=1). If any are found, print "1", then add up $3 for all those points and divide by the number of points found inside the circle of radius R=50.
 
For NR==2:
The same process is repeated.
 
For NR==3, and so on up to the last record.

So, basically, for each point in the file I search for the points that lie within a radius of 50 of it, and then average $3 over the points found inside that circle.

Thanks,

So, is S2 the input file? How many lines does it have? Will the calculations be done for every line in the file (let's call that ALLINES) or just for the first num lines, yielding ALLINES * ALLINES result lines as opposed to num * ALLINES result lines? Will all the results go to one single output file?


Yes, Rudi.

S2 is the input file.
The file has almost 80000 lines.
Calculations will be done for every line (ALLINES), yielding ALLINES * ALLINES results.

Thanks,

I'm not sure I interpreted your requirements correctly. No point in your sample file lies within radius 50 of any other, so that set can't be used for a meaningful test (it did do some averaging for R = 1500).
Try:

awk -v R0=50 '
        {X[NR]=$1                       # store all records first
         Y[NR]=$2
         V[NR]=$3
        }
END     {for (n=1; n<=NR; n++)  {X0 = X[n]      # centre of the current circle
                                 Y0 = Y[n]
                                 SUM = CNT = 0
                                 for (i=1; i<=NR; i++)  {R = sqrt((X[i]-X0)*(X[i]-X0) + (Y[i]-Y0)*(Y[i]-Y0))
                                                         if (R<=R0)     {SUM += V[i]
                                                                         CNT++
                                                                        }
                                                        }
                                 print X0, Y0, V[n], SUM/CNT
                                }
        }
' file
1970113.00000 3460.00000 1.09516 1.09516
1970116.00000 3791.00000 1.06350 1.0635
1970120.00000 4120.00000 1.07588 1.07588
1970115.00000 4450.00000 1.09591 1.09591
1970116.00000 4780.00000 1.09965 1.09965
1970120.00000 5109.00000 1.06733 1.06733
1970122.00000 5440.00000 1.03760 1.0376
1970124.00000 5770.00000 1.02025 1.02025
1970123.00000 6100.00000 1.00998 1.00998
1970120.00000 6430.00000 0.96426 0.96426

Not sure how this approach would handle 80000 lines, though...

---------- Post updated at 20:05 ---------- Previous update was at 20:00 ----------

I have to correct myself - that won't be ALLINES * ALLINES result lines but ALLINES result lines, still ALLINES * ALLINES computations...
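
One possible way to cut down the ALLINES * ALLINES distance computations is to sort the points into square grid cells of side R0 first, so that each point is only compared against points in its own cell and the 8 surrounding cells. This is a swapped-in technique (grid bucketing), not something proposed in this thread, and the names near_avg, floor, ncell and member are made up for the sketch. It assumes R0 > 0 and that the whole file fits in memory.

```shell
# Sketch only: grid bucketing, assuming R0 > 0. near_avg is a hypothetical helper.
# usage: near_avg RADIUS [FILE...]   (reads stdin when no file is given)
near_avg() {
    r=$1; shift
    awk -v R0="$r" '
    # Round down, also for negative coordinates (int() truncates toward zero).
    function floor(v,   t) { t = int(v); return (t > v) ? t - 1 : t }
    {
        X[NR] = $1; Y[NR] = $2; V[NR] = $3
        key = floor($1 / R0) SUBSEP floor($2 / R0)
        c = ++ncell[key]          # number of points seen in this cell so far
        member[key, c] = NR       # c-th point of the cell is record NR
    }
    END {
        R2 = R0 * R0
        for (n = 1; n <= NR; n++) {
            cx = floor(X[n] / R0); cy = floor(Y[n] / R0)
            sum = cnt = 0
            # Any point within R0 sits in this cell or one of its 8 neighbours.
            for (dx = -1; dx <= 1; dx++)
                for (dy = -1; dy <= 1; dy++) {
                    key = (cx + dx) SUBSEP (cy + dy)
                    if (!(key in ncell)) continue
                    for (k = 1; k <= ncell[key]; k++) {
                        i = member[key, k]
                        ex = X[i] - X[n]; ey = Y[i] - Y[n]
                        # Compare squared distances, so no sqrt() is needed.
                        if (ex * ex + ey * ey <= R2) { sum += V[i]; cnt++ }
                    }
                }
            # cnt is at least 1 because every point matches itself.
            print X[n], Y[n], V[n], sum / cnt
        }
    }' "$@"
}
```

It would be called as near_avg 50 S2 > TMP. How much this helps depends on the data: if most points fall into a handful of cells, the work is close to the brute-force version again.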


Many, many thanks RudiC. Besides the solution, it is always great to learn from the scripts you create. I highly appreciate your skills.

Best Regards

I'd be interested in some feedback on how it works with meaningful data, i.e. several data points within the R0 radius.
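
As a starting point for such a check, here is a small made-up data set that can be verified by hand: three points 30 units apart on a line, plus one far away. The values are invented for testing only and are not taken from the real S2 file; the awk program is a compact form of the all-pairs approach from this thread, with the point itself counted in its own average.

```shell
# Hypothetical test data, not from the real input file.
result=$(printf '%s\n' '0 0 1' '30 0 3' '60 0 8' '1000 1000 5' |
awk -v R0=50 '
    { X[NR] = $1; Y[NR] = $2; V[NR] = $3 }
END {
    for (n = 1; n <= NR; n++) {
        SUM = CNT = 0
        for (i = 1; i <= NR; i++) {
            R = sqrt((X[i]-X[n])*(X[i]-X[n]) + (Y[i]-Y[n])*(Y[i]-Y[n]))
            if (R <= R0) { SUM += V[i]; CNT++ }
        }
        print X[n], Y[n], V[n], SUM/CNT
    }
}')
echo "$result"
# Expected, worked out by hand:
# 0 0 1 2          (points 1 and 2:    (1+3)/2)
# 30 0 3 4         (points 1, 2 and 3: (1+3+8)/3)
# 60 0 8 5.5       (points 2 and 3:    (3+8)/2)
# 1000 1000 5 5    (only itself)
```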