Hello,
We're using mysql-5.1.40.
I have a question about the subquery transform in MySQL's optimizer.
Given the following query:
select 1 from sub1 where (c1,c2) in (select min(c1),min(c2) from sub2
group by c3 having count(c1) > 4);
It generates the following extended plan output:
mysql> show warnings;
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message
|
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | select 1 AS `1` from `qrefd01m`.`sub1` where
<in_optimizer>((`qrefd01m`.`sub1`.`c1`,`qrefd01m`.`sub1`.`c2`),<exists>(select
min(`qrefd01m`.`sub2`.`c1`) AS `min(c1)`,min(`qrefd01m`.`sub2`.`c2`)
AS `min(c2)` from `qrefd01m`.`sub2` group by `qrefd01m`.`sub2`.`c3`
having ((count(`qrefd01m`.`sub2`.`c1`) > 4) and
((<cache>(`qrefd01m`.`sub1`.`c1`) = min(`qrefd01m`.`sub2`.`c1`)) or
isnull(min(`qrefd01m`.`sub2`.`c1`))) and
((<cache>(`qrefd01m`.`sub1`.`c2`) = min(`qrefd01m`.`sub2`.`c2`)) or
isnull(min(`qrefd01m`.`sub2`.`c2`))) and
<is_not_null_test>(min(`qrefd01m`.`sub2`.`c1`)) and
<is_not_null_test>(min(`qrefd01m`.`sub2`.`c2`))))) |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
I found this transformation very confusing. MySQL pushs the join
condition down to the subquery, which is smart. But I don't understand
why a "or isnull" has to be attached to the join. This is in the WHERE
clause, and it's a IN variant (not NOTIN), can't NULL result be safely
ignored? Even if it can't, the optimizer again adds a
<is_not_null_test> checker to the having clause, which exactly
contradict the "or isnull" filter it adds earlier.
So each of the item on the SELECT clause of the IN subquery is transformed to
(cache=item or item is null) having item is not null,
which to my understanding is equivalent to (cache = item).
I must have missed something or/and misunderstand something about this
transformation. Please advice.
Thanks,
Grace
--
Zhixuan Zhu
http://yunpingzhou.spaces.live.com/