Interpolation using awk

Hi all,
Suppose I have a text file containing:

1003   60
1005   80
1100   110

Based on that file, I need to create another file containing values for 1001 through 1100, using linear interpolation between two points (for 1004, 1006, 1007, and so on up to 1099) and extrapolation from two points (for 1001 and 1002, based on the line through the 1003 and 1005 values).

I wonder if this could be done with an AWK script. If there is another solution, that would be wonderful too.

Thank you

Hi, try this:

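awk '
{
  P[$1]=$2      # P maps each known x value to its y value
  I[i++]=$1     # I holds the known x values in input order
}
END{
  j=0; s=I[j]; t=I[j+1]      # start with the first pair of known points
  for(i=m;i<=n;i++){
    if(I[j+2] && i>t){       # i is past t and a next pair exists
      j++; s=I[j]; t=I[j+1]
    }
    # same linear formula for interpolation and extrapolation
    print i,P[s]+(i-s)*(P[t]-P[s])/(t-s)
  }
}
' m=1001 n=1100 infile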

Hi.

Comment:

That seems like a small sample to interpolate from 1003 to 1100. I'm guessing that there are a few typos here, 1100 that should be 1010 or 1011 ... cheers, drl

This code works properly, thanks a lot!
Now I just need to figure out what the code means.
It must have to do with how arrays are used in awk; I need to work that out.

Hi,

Some pointers:

array P contains the points: the x value is the index and the y value is the element
array I contains the sequence of x coordinates

These arrays get filled while the file is processed.
After that, the standard interpolation and extrapolation (there is no real difference between the two) is done in the print statement for every x value from m to n, both the known points and the points in between.
If the current x value is beyond a threshold (the right-hand known point), the pair of points used for the inter/extrapolation is switched to the next pair.

You could perhaps visualize this by drawing a line with the points on a piece of paper and indicate the known points and then figure out how you would calculate the inter/extrapolated value and what set of points you would use for that...
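
As a rough sketch of that calculation, assuming the three sample points from the first post, the formula from the print statement can be checked by hand with a one-line awk call. The function name interp and its argument order are made up for this illustration and are not part of the script above.

```shell
# Worked check of y = P[s] + (i-s)*(P[t]-P[s])/(t-s) for the sample data.
# interp is a hypothetical helper: arguments are s, P[s], t, P[t], i.
interp() {
  awk -v s="$1" -v ys="$2" -v t="$3" -v yt="$4" -v i="$5" \
    'BEGIN { print ys + (i - s) * (yt - ys) / (t - s) }'
}

interp 1003 60 1005 80 1004    # between the first two points  -> 70
interp 1003 60 1005 80 1001    # left of the first point       -> 40
interp 1005 80 1100 110 1100   # right end of the second pair  -> 110
```

The second call uses the same pair of points as the first; only i moves outside the range, which is all that extrapolation means here.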

Hi Scrutinizer, could you explain this part:

    if(I[j+2] && i>t)

I still don't get how I[j+2] works in the code.

j is used for the (known) x coordinates, i is used for all (inter/extrapolated) coordinates; the left x coordinate is called s and the right x coordinate is called t.

So this checks whether a shift to the next pair of coordinates is necessary. That is the case if i>t (which means i>I[j+1]), unless the coordinates we would then shift to (from I[j] and I[j+1] to I[j+1] and I[j+2]) do not exist. I[j+1] always exists at that point, but if I[j+2] does not exist, then we have reached the end of array I.

In that case (i, and therefore n, lies to the right of the range of known coordinates) we keep the last pair and in fact switch automatically from interpolation to extrapolation. The same formula is used for extrapolation as for interpolation.
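
To see the shift happen, here is a small trace, assuming the three sample points from the first post. It is the same loop with s and t added to the output; the wrapper function trace and the short ranges are only chosen for illustration.

```shell
# Print i, the pair (s,t) in use, and the computed value.
# trace is a hypothetical wrapper: arguments are m and n.
trace() {
  printf '1003 60\n1005 80\n1100 110\n' |
  awk -v m="$1" -v n="$2" '
  { P[$1]=$2; I[k++]=$1 }
  END{
    j=0; s=I[j]; t=I[j+1]
    for(i=m;i<=n;i++){
      if(I[j+2] && i>t){ j++; s=I[j]; t=I[j+1] }
      print i, s, t, P[s]+(i-s)*(P[t]-P[s])/(t-s)
    }
  }'
}

trace 1005 1006   # the pair changes between these two values of i
trace 1100 1101   # I[j+2] is empty at 1101, so the last pair is kept
```

For i=1005 the pair is still 1003/1005, and for i=1006 it becomes 1005/1100. For i=1101 there is no I[j+2], so the pair stays 1005/1100 and the value is an extrapolation.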

Now I get it,
Thanks for all your help!