SQL: copying data down

I have a series of observations in which one column is sometimes missing (recorded as zero):

date	temp	delta
1977	284.54	29.84
1978	149.82	0
1979	320.71	28.45
1980	176.76	0
1981	854.65	0
1984	817.65	0
1985	990.58	27.98
1986	410.21	0
1987	405.93	0
1988	482.9	0

What I would like to achieve is a SQL selection in which the non-zero values are copied down; in other words, the result should look like this:

date	temp	delta
1977	284.54	29.84
1978	149.82	29.84
1979	320.71	28.45
1980	176.76	28.45
1981	854.65	28.45
1984	817.65	28.45
1985	990.58	27.98
1986	410.21	27.98
1987	405.93	27.98
1988	482.9	27.98

Is there a standard way to achieve this without having to resort to creating a procedural SQL script?

DROP TABLE IF EXISTS `temperatures`;
CREATE TABLE `temperatures` (
  `date` integer,
  `temp` double,
  `delta` double
) ENGINE=MyISAM;
INSERT INTO `temperatures` VALUES (1977,284.54,29.84), (1978,149.82,0),
(1979,320.71,28.45), (1980,176.76,0), (1981,854.65,0),
(1984,817.65,0), (1985,990.58,27.98), (1986,410.21,0),
(1987,405.93,0), (1988,482.9,0), (1991,269.02,0);

Don't know a way to get the last row inside an SQL statement, no. It'd be extremely easy in anything but SQL. Can you feed it through awk?

awk 'BEGIN { OFS=FS="\t" } NR>1 { if($3) { L=$3; } else { $3=L } } 1' < data

date    temp    delta
1977    284.54  29.84
1978    149.82  29.84
1979    320.71  28.45
1980    176.76  28.45
1981    854.65  28.45
1984    817.65  28.45
1985    990.58  27.98
1986    410.21  27.98
1987    405.93  27.98
1988    482.9   27.98



Thank you for your answer. Actually we have a solution in C++, which is highly performant. The only problem is that we need to do this every time this data is part of a calculation, so the work is redundant (unless someone sits down and actually writes an UPDATE statement too).

mysql> select
    ->   date,
    ->   temp,
    ->   case
    ->     when delta > 0 && @d := delta then delta
    ->     else @d
    ->   end as delta
    -> from
    ->   temperatures
    -> order by
    -> date;
+------+--------+-------+
| date | temp   | delta |
+------+--------+-------+
| 1977 | 284.54 | 29.84 |
| 1978 | 149.82 | 29.84 |
| 1979 | 320.71 | 28.45 |
| 1980 | 176.76 | 28.45 |
| 1981 | 854.65 | 28.45 |
| 1984 | 817.65 | 28.45 |
| 1985 | 990.58 | 27.98 |
| 1986 | 410.21 | 27.98 |
| 1987 | 405.93 | 27.98 |
| 1988 |  482.9 | 27.98 |
| 1991 | 269.02 | 27.98 |
+------+--------+-------+
11 rows in set (0.00 sec)
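
If you also want to persist the filled-down values (the UPDATE mentioned above), the same user-variable trick should work inside an UPDATE as well. This is only a sketch, not something I have in production: it relies on the rows being updated in date order, and the evaluation order of user variables in an UPDATE isn't formally documented, so test it on your version first.

-- Sketch only: relies on the (undocumented) evaluation order of user variables.
SET @d := 0;

UPDATE `temperatures`
SET delta = (@d := IF(delta > 0, delta, @d))
ORDER BY `date`;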

Ran a few tests on this: the result is immediate for a few hundred rows and takes 2.4 seconds on a table with 200,000 rows, so it is suitably fast for our purposes.

If you need to improve the performance of the above statement, you may try indexing the date column. In certain cases (Oracle does this, for example; I'm not sure about MySQL, you may check with explain <your_statement_here>), the optimizer may use the index to avoid the costly sorting operation, because it knows the data in the index is already sorted.
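
Something along these lines, just as a sketch (the index name idx_date is only illustrative, not from this thread):

-- Sketch only: add an index on the sort column and check the plan.
CREATE INDEX idx_date ON temperatures (`date`);

-- If "Using filesort" no longer shows up in the Extra column,
-- the index is being used to deliver the rows in date order.
EXPLAIN
SELECT
  `date`,
  temp,
  CASE
    WHEN delta > 0 && @d := delta THEN delta
    ELSE @d
  END AS delta
FROM temperatures
ORDER BY `date`;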

BTW: thanks for providing the necessary code in order to set up a test case!

Thank you again for your contribution on this. Just ran the statement again on a slightly faster machine and achieved 0.41 seconds without indexes and 0.98 seconds with indexes. So the indexing strategy clearly needs some tweaking, but the overall approach is very promising.

This could be due to caching (just run the index test again). If you have time, you could check with explain whether the index is actually used. Here is the explanation.

Thank you for that. The data is of course a heavily simplified example and our indexes are on other columns as well. Either way, we have a solid solution for now.