thanks radoulov for your interest.
As you can imagine, my real problem is far more complex than a tables called b with just 2 field.
In the real world, I have a big table containing a huge number of uploaded files. All files were archived (tgz) and then uploaded. So I have I directory loaded with tons of tgz files. Some of those tgz actually contain the same file but uploaded at two different times. I will keep all files to be able to recover versions. But I'm first interested in the list of files (not tgz files but the files they contain) in their latest status (mod, owner, group). I hope this is clear enough.
This is my real table:
mysql> DESCRIBE uploaded;
+------------+------------------+------+-----+---------+----------------------------------------------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+----------------------------------------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| tgz_path | varchar(512) | NO | | NULL | path to a tgz file |
| file_mod | varchar(9) | YES | | NULL | mod of the file inside the tgz |
| file_owner | varchar(32) | YES | | NULL | owner of the file inside the tgz |
| file_group | varchar(32) | YES | | NULL | group of the file inside the tgz |
| file_time | datetime | YES | | NULL | modification time of the file inside the tgz |
| file_name | varchar(512) | NO | | NULL | path of the file inside the tgz |
+------------+------------------+------+-----+---------+----------------------------------------------+
And this is a sample of what it contains:
+----+---------------------+-----------+------------+------------+---------------------+--------------------------+
| id | tgz_path | file_mod | file_owner | file_group | file_time | file_name |
+----+---------------------+-----------+------------+------------+---------------------+--------------------------+
| 1 | /home/ftp/18027.tgz | rw-r----- | root | root | 2010-06-28 10:11:51 | /home/santiago/doc.txt |
| 2 | /home/ftp/8400.tgz | rw-r--r-- | santiago | santiago | 2010-06-29 17:37:29 | /home/santiago/doc.txt |
| 3 | /home/ftp/18695.tgz | rw-rw-r-- | root | root | 2010-06-27 11:42:56 | /usr/local/bin/script.sh |
| 4 | /home/ftp/27750.tgz | rwxr-x--- | foo | foo | 2010-06-29 00:53:32 | /usr/local/bin/script.sh |
| 5 | /home/ftp/4333.tgz | rwxrwxr-x | foo | foo | 2010-06-28 09:17:14 | /usr/local/bin/script.sh |
+----+---------------------+-----------+------------+------------+---------------------+--------------------------+
As you can see, files have changed mod and owner over time. I highlighted in red the latest record per file.
Then I have two other tables: files and versions
mysql> DESCRIBE files;
+------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| file_mod | varchar(9) | NO | | NULL | |
| file_owner | varchar(32) | NO | | NULL | |
| file_group | varchar(32) | NO | | NULL | |
| file_time | datetime | NO | | NULL | |
| file_name | varchar(512) | NO | | NULL | |
+------------+------------------+------+-----+---------+----------------+
mysql> DESCRIBE versions;
+-----------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| file_id | int(10) unsigned | NO | | NULL | |
| file_time | datetime | NO | | NULL | |
| path | varchar(512) | NO | | NULL | |
+-----------+------------------+------+-----+---------+----------------+
I want to save in table files the latest record per file. I first INSERT new record:
mysql> INSERT IGNORE INTO files (file_mod, file_owner, file_group, file_time, file_name)
SELECT file_mod, file_owner, file_group, file_time, file_name
FROM uploaded
GROUP BY file_name;
and then UPDATE records with the latest statistics. That's where my query comes:
mysql> UPDATE files JOIN (
SELECT uploaded.file_mod, uploaded.file_owner, uploaded.file_group, uploaded.file_time, uploaded.file_name
FROM uploaded
JOIN (SELECT MAX(file_time) AS file_time, file_name
FROM uploaded
GROUP BY file_name) AS max_uploaded ON uploaded.file_time = max_uploaded.file_time AND uploaded.file_name = max_uploaded.file_name
GROUP BY uploaded.file_time, uploaded.file_name) AS uploaded ON files.file_name = uploaded.file_name
SET files.is_folder = uploaded.is_folder,
files.file_mod = uploaded.file_mod,
files.file_owner = uploaded.file_owner,
files.file_group = uploaded.file_group,
files.file_time = uploaded.file_time
WHERE uploaded.file_time > files.file_time;
That's it! Ask me if it's too messy.
Now you're right, to put some well choosen will probably blindingly increase the speed of the query. What do you think?