SQL: copying data down

I have a series of observations in which one column is sometimes missing (recorded as zero):

date	temp	delta
1977	284.54	29.84
1978	149.82	0
1979	320.71	28.45
1980	176.76	0
1981	854.65	0
1984	817.65	0
1985	990.58	27.98
1986	410.21	0
1987	405.93	0
1988	482.9	0

What I would like to achieve is a SQL selection in which the non-zero values are copied down; in other words, the result should look like this:

date	temp	delta
1977	284.54	29.84
1978	149.82	29.84
1979	320.71	28.45
1980	176.76	28.45
1981	854.65	28.45
1984	817.65	28.45
1985	990.58	27.98
1986	410.21	27.98
1987	405.93	27.98
1988	482.9	27.98

Is there a standard way to achieve this without having to resort to creating a procedural SQL script?

DROP TABLE IF EXISTS `temperatures`;
CREATE TABLE `temperatures` (
  `date` integer,
  `temp` double,
  `delta` double
) ENGINE=MyISAM;
INSERT INTO `temperatures` VALUES (1977,284.54,29.84), (1978,149.82,0),
(1979,320.71,28.45), (1980,176.76,0), (1981,854.65,0),
(1984,817.65,0), (1985,990.58,27.98), (1986,410.21,0),
(1987,405.93,0), (1988,482.9,0), (1991,269.02,0);

Don't know a way to get the last row inside an SQL statement, no. It'd be extremely easy in anything but SQL. Can you feed it through awk?

awk 'BEGIN { OFS=FS="\t" } NR>1 { if($3) { L=$3; } else { $3=L } } 1' < data

date    temp    delta
1977    284.54  29.84
1978    149.82  29.84
1979    320.71  28.45
1980    176.76  28.45
1981    854.65  28.45
1984    817.65  28.45
1985    990.58  27.98
1986    410.21  27.98
1987    405.93  27.98
1988    482.9   27.98



Thank you for your answer. Actually we have a solution in C++, which is highly performant. The only problem is that we need to do this every time this data is part of a calculation, so the work is redundant (unless someone sits down and actually writes an UPDATE statement too).

mysql> select
    ->   date,
    ->   temp,
    ->   case
    ->     when delta > 0 && @d := delta then delta
    ->     else @d
    ->   end as delta
    -> from
    ->   temperatures
    -> order by
    -> date;
+------+--------+-------+
| date | temp   | delta |
+------+--------+-------+
| 1977 | 284.54 | 29.84 |
| 1978 | 149.82 | 29.84 |
| 1979 | 320.71 | 28.45 |
| 1980 | 176.76 | 28.45 |
| 1981 | 854.65 | 28.45 |
| 1984 | 817.65 | 28.45 |
| 1985 | 990.58 | 27.98 |
| 1986 | 410.21 | 27.98 |
| 1987 | 405.93 | 27.98 |
| 1988 |  482.9 | 27.98 |
| 1991 | 269.02 | 27.98 |
+------+--------+-------+
11 rows in set (0.00 sec)
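
If you also want to persist the filled-down values (the UPDATE mentioned above), the same user-variable trick should work inside an UPDATE as well. This is only a sketch, not something I have in production: it relies on the rows being updated in date order, and the evaluation order of user variables in an UPDATE isn't formally documented, so test it on your version first.

-- Sketch only: relies on the (undocumented) evaluation order of user variables.
SET @d := 0;

UPDATE `temperatures`
SET delta = (@d := IF(delta > 0, delta, @d))
ORDER BY `date`;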

Ran a few tests on this: the result is immediate for a few hundred rows and takes 2.4 seconds on a table with 200,000 rows, so it is suitably fast for our purposes.

If you need to improve the performance of the above statement, you may try indexing the date column. In certain cases (Oracle does this, for example; I'm not sure about MySQL, you may check with explain <your_statement_here>), the optimizer may use the index to avoid the costly sorting operation, because it knows the data in the index is already sorted.
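
Something along these lines, just as a sketch (the index name idx_date is only illustrative, not from this thread):

-- Sketch only: add an index on the sort column and check the plan.
CREATE INDEX idx_date ON temperatures (`date`);

-- If "Using filesort" no longer shows up in the Extra column,
-- the index is being used to deliver the rows in date order.
EXPLAIN
SELECT
  `date`,
  temp,
  CASE
    WHEN delta > 0 && @d := delta THEN delta
    ELSE @d
  END AS delta
FROM temperatures
ORDER BY `date`;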

BTW: thanks for providing the necessary code in order to set up a test case!

Thank you again for your contribution on this. Just ran the statement again on a slightly faster machine and achieved 0.41 seconds without indexes and 0.98 seconds with indexes. So the indexing strategy clearly needs some tweaking, but the overall approach is very promising.

This could be due to caching (just run the index test again). If you have time, you could check with explain whether the index is actually used. Here is the explanation.

Thank you for that. The data is of course a heavily simplified example and our indexes are on other columns as well. Either way, we have a solid solution for now.