其他
技术分享 | Semi-join Materialization 子查询优化策略
作者:胡呈清
爱可生 DBA 团队成员,擅长故障分析、性能优化,个人博客:https://www.jianshu.com/u/a95ec11f67a8,欢迎讨论。
什么是 Semi-join
select country.* from country join city on country.code=city.country_code \
and population>20000000;
+---------+----------+
| code | name |
+---------+----------+
| 1 | china |
| 1 | china |
+---------+----------+
2 rows in set (0.00 sec)
select * from country where code in \
(select country_code from city where population>20000000);
+------+---------+
| code | name |
+------+---------+
| 1 | china |
+------+---------+
1 row in set (0.00 sec)
Semi-join 限制
子查询必须是出现在顶层的 WHERE、ON 子句后面的 IN 或者 =ANY 子查询必须是单个 select,不能是 union; 子查询不能有 group by 或者 having 子句(可以用 semijoin materialization 策略,其他不可以 ); It must not be implicitly grouped (it must contain no aggregate functions). (不知道啥意思,保持原文); 子查询不能有 order by with limit; 父查询中不能有 STRAIGHT_JOIN 指定联接顺序; The number of outer and inner tables together must be less than the maximum number of tables permitted in a join.
Semi-join 实现策略
Duplicate Weedout FirstMatch LooseScan Materialize
semijon=ON,控制 semijoin 是否开启的开关 firstmatch、loosescan、duplicateweedout、materialization 分别是四种策略的开关,默认都是开启的
extra 中出现 Start temporary、End temporary,表示使用了 Duplicate Weedout 策略 extra 中出现 FirstMatch(tbl_name) ,表示使用了 FirstMatch 策略 extra 中出现 LooseScan(m..n),表示使用了 LooseScan 策略 select_type 列为 MATERIALIZED,以及 table 列为 <subqueryN>,表示使用了 Materialize 策略
Semi-join Materialization
Materialization-scan Materialization-lookup
select * from Country
where Country.code IN (select City.Country
from City
where City.Population > 7*1000*1000)
and Country.continent='Europe'
从物化表到 Country 表 从 Country 表到物化表
第二种方法在物化表中查找数据时可以使用主键进行查找,因此叫做:Materialization-lookup。
Materialization-scan
select * from Country where Country.code IN (select City.Country \
from City where City.Population > 7*1000*1000);
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
| 1 | PRIMARY | <subquery2> | ALL | NULL | NULL | NULL | NULL | 15 | |
| 1 | PRIMARY | Country | eq_ref | PRIMARY | PRIMARY | 3 | world.City.Country | 1 | |
| 2 | MATERIALIZED | City | range | Population,Country | Population | 4 | NULL | 15 | Using index condition |
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
有两个查询(id=1 和 id=2); 第 2 个查询(id=2)的 select_type=MATERIALIZED,意思是子查询结果保存到一个临时表中,在读取的字段上建立主键,主键的目的是去除重复行; 第 1 行的 table=<subquery2>,代表使用的表正是第 2 个查询的物化临时表。
先执行子查询,走的 Population 索引,扫描了 15 行,得到 15 行结果; 将上一步得到的结果保存到临时表中; 从临时表中取出一行数据,到 Country 表中去查找满足联接条件的行,走 Country 表的主键,每次扫描 1 行; 重复 3,直到遍历临时表结束。
Materialization-lookup
select * from Country where Country.code IN (select City.Country \
from City where City.Population > 1*1000*1000) ;
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
| 1 | PRIMARY | Country | ALL | PRIMARY | NULL | NULL | NULL | 239 | |
| 1 | PRIMARY | <subquery2> | eq_ref | auto_key | auto_key | 3 | func | 1 | |
| 2 | MATERIALIZED | City | range | Population,Country | Population | 4 | NULL | 238 | Using index condition |
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
先执行子查询,走的 Population 索引,扫描了 238 行,得到 238 行结果; 将上一步得到的结果保存到临时表中; 从 Country 表中取出一行数据,到物化临时表中去查找满足联接条件的行,走物化表的主键,每次扫描 1 行; 重复 3,直到遍历 Country 表结束(一共 239 行)。
注意事项
select dept_name from departments where dept_no in \
(select min(dept_no) from dept_emp where emp_no<10020 group by dept_no);
+----+-------------+-------------+-------+-----------------+-----------+---------+------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | rows | Extra |
+----+-------------+-------------+-------+-----------------+-----------+---------+------------------------------------------------------------------+
| 1 | PRIMARY | departments | index | NULL | dept_name | 42 | 9 | Using where; Using index |
| 2 | SUBQUERY | dept_emp | range | PRIMARY,dept_no | PRIMARY | 4 | 21 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+-------------+-------+-----------------+-----------+---------+------+-----------------------------------------------------------+
参考资料
https://mariadb.com/kb/en/semi-join-materialization-strategy/ https://dev.mysql.com/doc/refman/5.7/en/semijoins.html
社区近期动态