XY interpolation by time in awk

Hi I am a newbie in awk scripting.

I'm working with a file with xy coordinates that were acquired with a time stamp. All the time stamps were recorded but not the XY coordinates. Let see an example:

FFID          	X	    Y    	  UNIX TIME     TIMEGAP
10001	469557,9	4266045,60	1386557563,76	
10002	469558,32	4266046,88	1386557564,65	
10003	469558,65	4266048,13	1386557565,55	
10004		                	1386557566,45	
10005	                		1386557567,36	
10006	                		1386557568,25	
10007	469559,82	4266051,82	1386557569,16	3,61
10008	469560,48	4266053,00	1386557570,05	
10009	469561,10	4266054,12	1386557570,95	
10010	469561,64	4266055,28	1386557571,85	
10011	469562,00	4266056,50	1386557572,76	
10012	469562,25	4266057,80	1386557573,66	
10013	469562,52	4266059,11	1386557574,55	
10014		                	1386557575,45	
10015		                	1386557576,35	
10016	                		1386557577,25	
10017		                 	1386557578,16	
10018	469564,31	4266063,98	1386557579,06	4,51
.
.
.
.
.

What I need to implement in awk is a interpolation by time e.g:

For the gap 10004-10006 fill the X gaps with this formula

x(i) = x1 + t(i)-t(1)/timegap e.g

10004 = x(10003) + [[t(10004)-t(10003)]/3.61]
10005 = x(10003) + [[t(10005)-t(10003)/3.61]
10006 = x(10003) + [[t(10006)-t(10003)]/3.61]

where t is UNIX TIME and so one for the next gaps.

Can anyone please help me.

Thanks in advance.

Please reconsider the formulae you presented. If used as given, x(10004) = 1002939159,51 . If parenthesizing the difference, as I assume it should be, x(10007) = 469559,65 , so the interpolation doesn't fit. It's even worse for the second gap.

I RudiC,

Thanks for your reply.

In fact I forgot parenthesizing. Is now correct. I need only to fill the gaps with the interpolation by time. The low value after the last gap in this example x (10007) = 469559,65 don't need to be interpolated because was measured. What I need is to interpolate only the gaps (that are variable)

For example for the formula that I want the result for the gaps is:

10004	469558,8993
10005	469559,1514
10006	469559,3979

My need is to automatize this process.

Printing the values you gave and the deltas, you can see that this is not a correct interpolation as the last delta is too large

469558.65
                0.2493 
469558.8993
                0.2521
469559.1514
                0.2465
469559.3979
                0.4221
469559.82

This data was acquired by a moving vessel. The timetag
was always recorded but the not the coordinates. So as the vessel speed is not constant I will have always will have deltas in the interpolation.

Your interpolation formula is adding a percentage of the number one to the coordinates; not a percentage of the range between two coordinate values. Instead of the formula:

x(i) = x1 + (t(i)-t(1))/timegap

you need something more like:

x(i) = x(1) + (x(N) - x(1)) * ((t(i) - t(1)) / (t(N) - t(1)))

where N is the index of the next known coordinate point and timestamp.

Using this formula, the awk script:

awk '
NR == 1 {
	print
	next
}
NF == 2{t[++ic] = $2
	id[ic] = $1
	next
}
{	if(ic > 1) interpolate()
	print
	t[ic = 1] = $4
	x = $2
	y = $3
}
function interpolate(	X, Y, i, j) {
#	id[++ic] = $1;t[ic] = $4	# Test Only
	for(i = 2; i <= ic; i++) {
		X = x + ($2 - x) * ((t - t[1]) / ($4 - t[1]))
		Y = y + ($3 - y) * ((t - t[1]) / ($4 - t[1]))
		printf("%s\t%.4f\t%.4f\t%s\n", id, X, Y, t)
	}
}' file

(with your sample input with commas converted to periods (since I'm in the US instead of Portugal)) produces the output:

FFID          	X	    Y    	  UNIX TIME     TIMEGAP
10001	469557.9	4266045.60	1386557563.76	
10002	469558.32	4266046.88	1386557564.65	
10003	469558.65	4266048.13	1386557565.55	
10004	469558.9417	4266049.0499	1386557566.45
10005	469559.2366	4266049.9801	1386557567.36
10006	469559.5251	4266050.8898	1386557568.25
10007	469559.82	4266051.82	1386557569.16	3.61
10008	469560.48	4266053.00	1386557570.05	
10009	469561.10	4266054.12	1386557570.95	
10010	469561.64	4266055.28	1386557571.85	
10011	469562.00	4266056.50	1386557572.76	
10012	469562.25	4266057.80	1386557573.66	
10013	469562.52	4266059.11	1386557574.55	
10014	469562.8772	4266060.0818	1386557575.45
10015	469563.2344	4266061.0537	1386557576.35
10016	469563.5916	4266062.0255	1386557577.25
10017	469563.9528	4266063.0082	1386557578.16
10018	469564.31	4266063.98	1386557579.06	4.51

As a sanity check, you can uncomment line before the for loop. In that mode it produces the output:

FFID          	X	    Y    	  UNIX TIME     TIMEGAP
10001	469557.9	4266045.60	1386557563.76	
10002	469558.32	4266046.88	1386557564.65	
10003	469558.65	4266048.13	1386557565.55	
10004	469558.9417	4266049.0499	1386557566.45
10005	469559.2366	4266049.9801	1386557567.36
10006	469559.5251	4266050.8898	1386557568.25
10007	469559.8200	4266051.8200	1386557569.16
10007	469559.82	4266051.82	1386557569.16	3.61
10008	469560.48	4266053.00	1386557570.05	
10009	469561.10	4266054.12	1386557570.95	
10010	469561.64	4266055.28	1386557571.85	
10011	469562.00	4266056.50	1386557572.76	
10012	469562.25	4266057.80	1386557573.66	
10013	469562.52	4266059.11	1386557574.55	
10014	469562.8772	4266060.0818	1386557575.45
10015	469563.2344	4266061.0537	1386557576.35
10016	469563.5916	4266062.0255	1386557577.25
10017	469563.9528	4266063.0082	1386557578.16
10018	469564.3100	4266063.9800	1386557579.06
10018	469564.31	4266063.98	1386557579.06	4.51

The lines shown in red confirm that the interpolated values for the end of the range match the actual end of range values.

The provided values are all shown with the number of decimal places provided in the input; the interpolated values are shown with four decimal places. If you'd like the interpolated values to be printed with two decimal places, change both occurrences of %.4f in the printf format string to %.2f . (I used a different number of decimal places to make it obvious which values were interpolated.)

Note that if you have other code that is generating the TIMEGAP field in your sample input, it isn't needed for this code to work.

You haven't said what OS you're using. If you want to run this script on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk , /usr/xpg6/bin/awk or nawk .

1 Like

Dear Don,

First I want to thank you for you suggestion and correction of the equation that was obviously wrong as RudiC also said.

Secondly I also thank you for your clean, simple and functional code!

Thirdly I ask you if you don't my mind a little more help.

I have a second pack of data that I need to interpolate based on the previous interpolated file.

I try to be short and clear:

Based on the RESULT.txt file (with all the data interpolated) I need to interpolate the values from a second file AIS_557.txt that have (due to issues during data acquisition) very few data points and with time stamps without decimals, based on first file. See example:

RESULT.TXT
010001 0469557.90 4266045.60 1386557563.76
010002 0469558.32 4266046.88 1386557564.65
010003 0469558.65 4266048.13 1386557565.55
010004 0469558.94 4266049.05 1386557566.45
010005 0469559.24 4266049.98 1386557567.36
010006 0469559.53 4266050.89 1386557568.25
010007 0469559.82 4266051.82 1386557569.16
010008 0469560.48 4266053.00 1386557570.05
010009 0469561.10 4266054.12 1386557570.95
010010 0469561.64 4266055.28 1386557571.85
010011 0469562.00 4266056.50 1386557572.76
010012 0469562.25 4266057.80 1386557573.66
010013 0469562.52 4266059.11 1386557574.55
010014 0469562.88 4266060.08 1386557575.45
010015 0469563.23 4266061.05 1386557576.35
010016 0469563.59 4266062.03 1386557577.25
010017 0469563.95 4266063.01 1386557578.16
010018 0469564.31 4266063.98 1386557579.06
010019 0469564.85 4266065.22 1386557579.95
010020 0469565.34 4266066.47 1386557580.85
010021 0469565.79 4266067.67 1386557581.76
...........
 AIS_557.txt
468953.73 4264882.13 1386533790
468944.97 4264871.07 1386533820
468936.22 4264860.01 1386533850
468927.46 4264848.94 1386533880
468918.70 4264837.88 1386533910
468918.65 4264826.79 1386533940
468909.89 4264815.72 1386533970
468892.38 4264793.60 1386534030
468883.62 4264782.54 1386534060
468874.90 4264782.57 1386534090
468866.14 4264771.51 1386534150
468857.43 4264771.54 1386534180
468839.95 4264760.51 1386534210
468822.43 4264738.39 1386534240
468813.67 4264727.33 1386534270
..........

So I need to find the best fit for each time stamp of AIS file on the RESULT file. Interpolate X Y (for each time stamp of RESULT file) for the second file (AIS file).

Me and a colleague we are trying to adapt your function to do this but it is not working yet. The code is this one:

awk '
function interpola(i, P1, P2,  Xi, Yi) {
	Xi= Xais[P1] +((Xais[P2]-Xais[P1])*(Tsis-Tais[P1]))/(Tais[P2]-Tais[P1])
	Yi= Yais[P1] +((Yais[P2]-Yais[P1])*(Tsis-Tais[P1]))/(Tais[P2]-Tais[P1])
	printf("%06i %010.2f %010.2f %013.2f\n", FFID, Xi, Yi, T)
#	print FFID, T, P1, P2 #test fitting points for interpolation (keeps line number where are P1 or P2)
}
{
NR==FNR{Tsis[FNR]=$4; Sis[FNR]=$0; next} # Reads the first file with principal serie 
{Xais[FNR]=$1; Yais[FNR]=$2; Tais[FNR]=$3 } # Reads the second file with AIS values to interpolate

}
END {
for (i in Tsis[FNR]) {
	for (j=1; j<FNR; j++) {	# Runs down until find P1
		if (Tsis>Tais[FNR]) {
			P1=j
			break
		}
	}
	for (k=j; k<=FNR; k++) { # Runs down until find P2
		if ((Tsis<Tais[FNR]) ) {
			P2=k
			break
		}
	}	
	interpola(i, P1, P2) 	#interpolation and print interpolated values
}

}' '  RESULT.txt AIS_557.txt  > AIS_INTERPOLATION.txt

Can you give us, please, a suggestion!

Thanks in advance

I'm glad my code helped you get what you wanted on your last problem.

I don't understand the new problem you specified in post #7 in this thread. Interpolation involves finding a fit between two known data points. Every entry you have shown us in your file AIS_557.txt comes before the first entry in the file RESULT.TXT . And, while x and y values are increasing with time in RESULT.TXT , they are decreasing in AIS_557.txt ; so even with the ... at the end of both files, it appears that there is no overlap between the data in these files.

Please show us the output you are hoping to produce from the sample input files so we can understand what you're trying to do. Do you just want to add .00 to all of the timestamps in AIS_557.txt , write that modified data into a new file and then copy the contents of RESULT.TXT to the end of the new file?

Please show us desired o/p for post #7 , I am bit confused about requirement.

Some more posts on forum might help you..

http://www.unix.com/unix-for-dummies-questions-and-answers/247167-interpolation-if-there-no-exact-match-value-2.html