I know I know..
for sure one of the easier mysql statements. But somehow I can not figure out this.
I expect to see all distinct items of 'data_12' where 'kwroot' has 'straxx' in, and in the same row 'data_12' ist (not = 'kwsearched' in any existing row)
data_12 kwroot kwsearched
straxx password straxx straxx
xxxx yyy straxx password
SELECT data_12, kwroot, kwsearched
FROM sc
WHERE kwroot LIKE 'straxx'
AND (data_12 NOT like kwsearched)
GROUP BY data_12
Result:
data_12 kwroot kwsearched
straxx password straxx straxx
I see this statement works correct, but how can I tell him to check that "AND (data_12 NOT like kwsearched)"<== but also not in whole 'kwsearched'??
I expect the result to be empty
If I understand correctly:
SELECT data_12, kwroot, kwsearched
FROM sc
WHERE kwroot = 'straxx'
AND data_12 NOT IN (
SELECT DISTINCT kwsearched
FROM sc
);
or:
SELECT a.data_12, a.kwroot, a.kwsearched
FROM sc a
LEFT JOIN sc b
ON a.data_12 = b.kwsearched
WHERE a.kwroot = 'straxx'
AND b.kwsearched is NULL;
Yes, works great!
Many thanx!
EDIT:
Yes works great, but now, when I use this on about 2000 rows of data I get this: "Database has gone away..."
or the statement takes more than one minute to execute for both examples.
Is there something I can do to optimize?
Could you provide the table structure (desc tablename;)?
Which version of MySQL server you are using?
hi radoulov,
this is the structure:
I added a short part to your statement to make work better for me into:
SELECT data_12 FROM sc
WHERE kwroot LIKE '$term' and kwsearched like '$term'
and data_12 not in (select kwsearched from sc)
GROUP BY data_12
ORDER BY relevance_13 DESC
LIMIT 1;
mysql 5.1.36
+------------------------+-------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+-------------+------+-----+---------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| kwroot | varchar(50) | YES | | NULL | |
| kwsearched | varchar(50) | YES | | NULL | |
| period | varchar(15) | YES | | NULL | |
| datamonth | varchar(10) | YES | | NULL | |
| type | varchar(4) | YES | | NULL | |
| m1_0 | varchar(4) | YES | | NULL | |
| m2_1 | varchar(4) | YES | | NULL | |
| m3_2 | varchar(4) | YES | | NULL | |
| m4_3 | varchar(4) | YES | | NULL | |
| m5_4 | varchar(4) | YES | | NULL | |
| m6_5 | varchar(4) | YES | | NULL | |
| m7_6 | varchar(4) | YES | | NULL | |
| m8_7 | varchar(4) | YES | | NULL | |
| m9_8 | varchar(4) | YES | | NULL | |
| m10_9 | varchar(4) | YES | | NULL | |
| m11_10 | varchar(4) | YES | | NULL | |
| m12_11 | varchar(4) | YES | | NULL | |
| data_12 | varchar(50) | YES | | NULL | |
| relevance_13 | varchar(20) | YES | | NULL | |
| monthly2_14 | varchar(13) | YES | | NULL | |
| monthly_avg2_15 | varchar(15) | YES | | NULL | |
| searched_this_month_16 | varchar(15) | YES | | NULL | |
| monthly_avg_17 | varchar(15) | YES | | NULL | |
| competitor_18 | varchar(10) | YES | | NULL | |
| curr_19 | varchar(5) | YES | | NULL | |
| euro_20 | varchar(4) | YES | | NULL | |
| cent_21 | varchar(2) | YES | | NULL | |
| micricent_22 | varchar(10) | YES | | NULL | |
| avg_pos_23 | varchar(5) | YES | | NULL | |
| f1_24 | varchar(4) | YES | | NULL | |
| f2_25 | varchar(4) | YES | | NULL | |
| f3_26 | varchar(4) | YES | | NULL | |
| maxmonth_27 | varchar(4) | YES | | NULL | |
| minus_kw_28 | varchar(50) | YES | | NULL | |
| how_exact_29 | varchar(5) | YES | | NULL | |
| created | timestamp | NO | | CURRENT_TIMESTAMP | |
| cur_timestamp | timestamp | NO | | 0000-00-00 00:00:00 | |
+------------------------+-------------+------+-----+---------------------+----------------+
38 rows in set (0.01 sec)
In order to speed up the query you should consider adding an index (or indexes).
What index you should add, depends on your data: you should start by indexing the most selective column(s) used in the where clause.
I would start with something like this:
create index sc_kwr_dt_i on sc(kwroot, data_12);
create index sc_ksc on sc (kwsearched);
You should consider indexing only subset of the data, if that subset is selective.
You should also consider the space needed for your indexes and their impact on the DML operations.
If the query is slow after indexing (some of) the columns in the where clause, post the output of the following statement:
explain select data_12 FROM sc ...
I would verify the performance of both versions (in vs outer join) after creating the indexe(s).
Could you post also the output of the following command:
show create table sc;
radoulov, you are the greatest...!
I did exactly what you said. And now, on over 200.000 rows so far, this query takes about 0.01 sek now...
Compared to about a minute and more before, this is a plus, I would say.