Find gaps in time data and replace missing time value and column 2 value by interpolation in awk

Dear all,

I am kindly seeking assistance on the following issue.

I am working with data that is sampled every 0.05 hours (that is 3 minutes intervals) here is a sample data from the file

5.00000   15.5030
5.05000   15.6680
5.10000   16.0100
5.15000   16.3450
5.20000   16.7120
5.25000   17.1020
5.30000   17.4900
5.35000   17.8870
5.40000   18.2850
5.45000   18.6970
5.50000   19.1170
5.55000   19.5400
5.60000   19.9730
5.65000   20.3950
5.70000   20.8100
5.75000   21.2700
5.80000   21.7180
5.85000   22.1780
5.90000   22.6270
5.95000   23.0570
6.25000   24.0850
6.30000   24.5270
6.35000   24.9830
6.40000   25.4480
6.45000   25.9000
6.50000   26.3520
6.55000   27.3630
6.60000   27.7830
6.65000   28.2700
6.70000   28.8780
6.75000   29.3120
6.80000   29.7130
6.85000   30.0780
6.90000   30.4630
6.95000   30.9540
7.00000   31.4240

As you can see my data has a big gap here

5.95000   23.0570
6.25000   24.0850

I am trying to find a way to locate such a gap and replace the missing time values on column 1 and create corresponding values of column 2 for each missing time value by interpolation. Something like

5.95000   23.0570
6.00000
6.05000
6.10000                <- column 2 must be filled with interpolated values
6.15000
6.20000
6.25000   24.0850

I am processing my data with AWK, and so an awk solution would be easier to integrate into my code.

PLEASE NOTE: that I am creating the data by resampling some other data that looks like this

 5.00000    46.51        3   15.503
 5.00833    46.71        3   15.570
 5.01667    46.91        3   15.637
 5.02500    47.12        3   15.707
 5.03333    47.32        3   15.773
 5.04167    62.44        4   15.610
 5.05000    62.67        4   15.668
 5.05833    62.90        4   15.725
 5.06667    63.15        4   15.787
 5.07500    63.38        4   15.845
 5.08333    63.58        4   15.895
 5.09167    63.80        4   15.950
 5.10000    64.04        4   16.010
 5.10833    64.27        4   16.067
 5.11667    64.47        4   16.117
 5.12500    64.69        4   16.172
 5.13333    64.89        4   16.223
 5.14167    65.14        4   16.285
 5.15000    65.38        4   16.345
 5.15833    65.60        4   16.400
 5.16667    65.85        4   16.463
 5.17500    66.08        4   16.520
 5.18333    66.35        4   16.587
 5.19167    66.61        4   16.652
 5.20000    66.85        4   16.712
 5.20833    67.11        4   16.777
 5.21667    67.39        4   16.848
 5.22500    67.63        4   16.907
 5.23333    67.88        4   16.970
 5.24167    68.14        4   17.035
 5.25000    68.41        4   17.102
 5.25833    68.68        4   17.170
 5.26667    68.90        4   17.225
 5.27500    69.18        4   17.295
 5.28333    69.43        4   17.358
 5.29167    69.69        4   17.422
 5.30000    69.96        4   17.490
 5.30833    70.23        4   17.558
 5.31667    70.50        4   17.625
 5.32500    70.75        4   17.688
 5.33333    71.00        4   17.750
 5.34167    71.28        4   17.820
 5.35000    71.55        4   17.887
 5.35833    71.80        4   17.950
 5.36667    72.07        4   18.018
 5.37500    72.34        4   18.085
 5.38333    72.60        4   18.150
 5.39167    72.89        4   18.223
 5.40000    73.14        4   18.285
 5.40833    73.42        4   18.355
 5.41667    73.70        4   18.425
 5.42500    73.98        4   18.495
 5.43333    74.25        4   18.562
 5.44167    74.52        4   18.630
 5.45000    74.79        4   18.697
 5.45833    75.07        4   18.768
 5.46667    75.36        4   18.840
 5.47500    75.63        4   18.907
 5.48333    75.91        4   18.977
 5.49167    76.20        4   19.050
 5.50000    76.47        4   19.117
 5.50833    76.74        4   19.185
 5.51667    77.04        4   19.260
 5.52500    77.30        4   19.325
 5.53333    77.60        4   19.400
 5.54167    77.88        4   19.470
 5.55000    78.16        4   19.540
 5.55833    78.47        4   19.617
 5.56667    78.77        4   19.692
 5.57500    79.06        4   19.765
 5.58333    79.36        4   19.840
 5.59167    79.62        4   19.905
 5.60000    79.89        4   19.973
 5.60833    80.19        4   20.047
 5.61667    80.48        4   20.120
 5.62500    80.73        4   20.183
 5.63333    81.05        4   20.262
 5.64167    81.29        4   20.323
 5.65000    81.58        4   20.395
 5.65833    81.81        4   20.453
 5.66667    82.12        4   20.530
 5.67500    82.39        4   20.597
 5.68333    82.68        4   20.670
 5.69167    82.94        4   20.735
 5.70000    83.24        4   20.810
 5.70833    83.57        4   20.892
 5.71667    83.87        4   20.968
 5.72500    84.20        4   21.050
 5.73333    84.50        4   21.125
 5.74167    84.79        4   21.198
 5.75000    85.08        4   21.270
 5.75833    85.38        4   21.345
 5.76667    85.65        4   21.413
 5.77500    85.95        4   21.488
 5.78333    86.25        4   21.562
 5.79167    86.55        4   21.638
 5.80000    86.87        4   21.718
 5.80833    87.20        4   21.800
 5.81667    87.48        4   21.870
 5.82500    87.78        4   21.945
 5.83333    88.08        4   22.020
 5.84167    88.39        4   22.098
 5.85000    88.71        4   22.178
 5.85833    88.99        4   22.247
 5.86667    89.29        4   22.323
 5.87500    89.59        4   22.398
 5.88333    89.88        4   22.470
 5.89167    90.18        4   22.545
 5.90000    90.51        4   22.627
 5.90833    90.77        4   22.693
 5.91667    91.06        4   22.765
 5.92500    91.35        4   22.837
 5.93333    91.63        4   22.908
 5.94167    91.91        4   22.977
 5.95000    92.23        4   23.057
 5.95833    92.54        4   23.135
 5.96667    92.88        4   23.220
 5.97500    93.19        4   23.297
 5.98333    93.50        4   23.375
 5.99167    93.80        4   23.450
 6.25000    96.34        4   24.085
 6.25833    96.62        4   24.155
 6.26667    96.93        4   24.233
 6.27500    97.20        4   24.300
 6.28333    97.52        4   24.380
 6.29167    97.80        4   24.450
 6.30000    98.11        4   24.527
 6.30833    98.39        4   24.598
 6.31667    98.69        4   24.672
 6.32500    98.99        4   24.747
 6.33333    99.31        4   24.828
 6.34167    99.61        4   24.902
 6.35000    99.93        4   24.983
 6.35833   100.23        4   25.058
 6.36667   100.54        4   25.135
 6.37500   100.85        4   25.213
 6.38333   101.17        4   25.292
 6.39167   101.49        4   25.373
 6.40000   101.79        4   25.448
 6.40833   102.10        4   25.525
 6.41667   102.40        4   25.600
 6.42500   102.71        4   25.677
 6.43333   102.99        4   25.748
 6.44167   103.29        4   25.823
 6.45000   103.60        4   25.900
 6.45833   103.91        4   25.977
 6.46667   104.22        4   26.055
 6.47500   104.50        4   26.125
 6.48333   104.81        4   26.203
 6.49167   105.10        4   26.275
 6.50000   105.41        4   26.352
 6.50833   105.71        4   26.428
 6.51667   105.99        4   26.497
 6.52500   106.29        4   26.572
 6.53333   106.59        4   26.648
 6.54167   106.87        4   26.718
 6.55000    82.09        3   27.363
 6.55833    82.28        3   27.427
 6.56667    82.49        3   27.497
 6.57500    82.70        3   27.567
 6.58333    82.91        3   27.637
 6.59167    83.13        3   27.710
 6.60000    83.35        3   27.783
 6.60833    83.60        3   27.867
 6.61667    83.85        3   27.950
 6.62500    84.06        3   28.020
 6.63333    84.32        3   28.107
 6.64167    84.56        3   28.187
 6.65000    84.81        3   28.270
 6.65833    85.05        3   28.350
 6.66667    85.27        3   28.423
 6.67500   114.58        4   28.645
 6.68333   114.92        4   28.730
 6.69167   115.22        4   28.805
 6.70000   115.51        4   28.878
 6.70833   115.82        4   28.955
 6.71667   116.12        4   29.030
 6.72500   116.40        4   29.100
 6.73333   116.69        4   29.172
 6.74167   116.99        4   29.248
 6.75000   117.25        4   29.312
 6.75833   117.54        4   29.385
 6.76667   117.80        4   29.450
 6.77500   118.07        4   29.518
 6.78333   118.34        4   29.585
 6.79167   118.60        4   29.650
 6.80000   118.85        4   29.713
 6.80833   119.13        4   29.782
 6.81667   119.36        4   29.840
 6.82500   119.60        4   29.900
 6.83333   119.83        4   29.957
 6.84167   120.05        4   30.012
 6.85000   120.31        4   30.078
 6.85833   120.52        4   30.130
 6.86667   120.77        4   30.192
 6.87500   121.03        4   30.258
 6.88333   121.30        4   30.325
 6.89167   121.54        4   30.385
 6.90000   121.85        4   30.463
 6.90833   122.17        4   30.543
 6.91667   122.47        4   30.617
 6.92500   122.75        4   30.688
 6.93333   153.97        5   30.794
 6.94167   154.39        5   30.878
 6.95000   154.77        5   30.954
 6.95833   155.18        5   31.036
 6.96667   155.58        5   31.116
 6.97500   155.97        5   31.194
 6.98333   156.34        5   31.268
 6.99167   156.74        5   31.348
 7.00000   157.12        5   31.424
 7.00833   157.47        5   31.494

using the following code

awk 'BEGIN{n = 500}
  {if((($1 * 10000) % n) == 0) {printf "%7.5f   %-6.4f\n", $1, $4; p1 = 0}
  else if((($1 + p1) * 10000) % 500 == 0)
    {printf "%7.5f   %-6.4f\n", ($1 + p1)/2, ($4 + p4)/2; p1 = 0}
  else {p1 = $1; p4 = $4}}' inFile > outFile

this code handles small gaps but not big ones like in this situation.

I am putting this here so maybe some one can just help me modify the code to handle this gap problem

This seems very similar to several earlier requests in these forums. A good one to look at for ideas would be XY interpolation by time in awk. If that doesn't give you what you need, the last post in that thread provides pointers to several similar threads.

Please look at those first, and if you can't find anything in them that helps, explain how this case is different from the problem solved in the above thread and we'll try to help you get a working solution.

Dear Don,

Thank you for your reply. I have read and seen several "almost similar issues" including the "XY interpolation by time in awk" you have referred to. I have spent several days trying to see how I can adapt the solution there to my problem and am still trying but being relatively new to awk programming, I have not managed to get anywhere. My problem here is relatively different from that in that, that data is quite different from mine here. In addition, my time here is some floating point values, I have seen some posts in "Expand & Interpolation" which I thought could help by since for them they needed integer values, the issue is simpler as it can be handled by integer in the for loop. I have spent several days going through several examples which I thought could be similar to my situation but each of them seems different

I will really appreciate any help given

Please be aware that your data are not a linear function of time; on top of some noise it has a small curvature, and the six data points right after the gap are somewhat lower than they should be (there's a jump in values but not in time delta at point 7).
This is a quick and dirty approximation to exactly your problem and data; no error checking etc. is done. It's sort of a linear interpolation between the given boundaries although we know the boundary to the right is questionable. On top, my mawk has a problem with the D1 > D0 comparison, sometimes the delta is -1E-16, sometimes it's +71E-16, so a few extra lines are being "interpolated". I don't have a good solution at hand; either increase the to be compared value slightly (yuck!) or use sort -u on the result (yuck!)...
However, try

awk     'NR==1  {L1=$1; L2=$2; D0=0.05}
                {D1=$1-L1; D2=$2-L2
                 if (D1 > D0)   { n=D1/D0                   # D1 sometimes is sliiightly larger than D0
                                 ST=D2/n
                                 for (i=1; i<=int(n); i++) printf "%7.5f   %6.4f <---\n", L1+D0*i, L2+ST*i}
                 L1=$1; L2=$2
                 print $0}
        ' file
1 Like

Dear RudiC and everyone,

Thank you. this solution works and does exactly what I really needed. The interpolation is just alright. As you can see this is time series data for some quantity that depends on the sun, which exhibits higher values during the day that at night. this is morning and the variation of the quantity more complex due to too many factors. So this interpolation does just fine. I have just increased the precision of the D0/D1 to include more significant figures and that solves the problem you indicated with the

D1 > D0

.

Again may I say thank you and am sure this will help others as well.