其他
OceanBase 4.0 解读:分布式查询性能提升,我们是如何思考的 ?
目前OceanBase SQL引擎的技术负责人。2016年加入 OceanBase,负责SQL引擎的研发工作。2008年毕业于哈尔滨工业大学,2014年在新加坡国立大学获得博士学位,博士期间主要研究方向是数据库领域的(多)查询优化和处理。在加入OceanBase之前,曾经在华为从事数据库的研发工作。
create table R1(a int primary key, b int, c int, d int) partition by hash(a) partitions 4;
create table R2(a int primary key, b int, c int, d int) partition by hash(a) partitions 4;
create table R3(a int primary key, b int, c int, d int) partition by hash(b) partitions 5;
select R2.c, sum(R3.d) from R1, R2, R3 where R1.a = R2.a and R2.C = R3.C group by R2.C;
create table R1(a int primary key, b int, c int);
create table R2(a int primary key, b int, c int, index idx(b)) partition by hash(a) partitions 100;
Q2: select * from R1, R2 where R2.b = R1.b and R1.c = 100;
/*一阶段计划*/
| =============================================
|ID|OPERATOR |NAME |EST. ROWS|COST |
---------------------------------------------
|0 |NESTED-LOOP JOIN| |970299 |85622|
|1 | TABLE SCAN |r1 |990 |40790|
|2 | TABLE SCAN |r2(idx)|1 |44 |
=============================================
Outputs & filters:
-------------------------------------
0 - output([r1.a], [r1.b], [r1.c], [r2.a], [r2.b], [r2.c]), filter(nil),
conds(nil), nl_params_([r1.b])
1 - output([r1.b], [r1.c], [r1.a]), filter([r1.c = 100]),
access([r1.b], [r1.c], [r1.a]), partitions(p0)
2 - output([r2.b], [r2.a], [r2.c]), filter(nil),
access([r2.b], [r2.a], [r2.c]), partitions(p0)
create table R1(a int primary key, b int, c int, d int) partition by hash(a) partitions 4;create table R2(a int primary key, b int, c int, d int) partition by hash(a) partitions 4;create table R3(a int primary key, b int, c int, d int) partition by hash(b) partitions 5;Q3: select R2.c, sum(R3.d) from R1, R2, R3 where R1.a = R2.a and R2.b = R3.b;
枚举所有单机的连接算法,维护序这个物理属性,使用单机代价模型来计算代价。 保留代价最小的计划和存在有用序的计划,一个计划的序是有用的当且仅当该序对后续算子的分配有用。
create table R1(a int primary key, b int, c int, d int) partition by hash(a) partitions 4;
create table R2(a int primary key, b int, c int, d int) partition by hash(a) partitions 4;
select R1.a, SUM(R2.c) from R1, R2 where R1.b = R2.b group by R1.a;
create table R1(a int primary key, b int, c int) partition by hash(a) partitions 4;
explain select b, sum(c) from R1 group by b;
| ==========================================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
----------------------------------------------------------
|0 |PX COORDINATOR | |1 |10 |
|1 | EXCHANGE OUT DISTR |:EX10001|1 |10 |
|2 | HASH GROUP BY | |1 |9 |
|3 | EXCHANGE IN DISTR | |1 |9 |
|4 | EXCHANGE OUT DISTR (HASH)|:EX10000|1 |8 |
|5 | HASH GROUP BY | |1 |8 |
|6 | PX PARTITION ITERATOR | |1 |7 |
|7 | TABLE SCAN |r1 |1 |7 |
==========================================================
Outputs & filters:
-------------------------------------
0 - output([INTERNAL_FUNCTION(r1.b, T_FUN_SUM(T_FUN_SUM(r1.c)))]), filter(nil), rowset=256
1 - output([INTERNAL_FUNCTION(r1.b, T_FUN_SUM(T_FUN_SUM(r1.c)))]), filter(nil), rowset=256, dop=1
2 - output([r1.b], [T_FUN_SUM(T_FUN_SUM(r1.c))]), filter(nil), rowset=256,
group([r1.b]), agg_func([T_FUN_SUM(T_FUN_SUM(r1.c))])
3 - output([r1.b], [T_FUN_SUM(r1.c)]), filter(nil), rowset=256
4 - (#keys=1, [r1.b]), output([r1.b], [T_FUN_SUM(r1.c)]), filter(nil), rowset=256, dop=1
5 - output([r1.b], [T_FUN_SUM(r1.c)]), filter(nil), rowset=256,
group([r1.b]), agg_func([T_FUN_SUM(r1.c)])
6 - output([r1.b], [r1.c]), filter(nil), rowset=256
7 - output([r1.b], [r1.c]), filter(nil), rowset=256,
access([r1.b], [r1.c]), partitions(p[0-3])
下压算子 hash 表尽量维持在 L2 cache (1M) 内, 如果预聚合效果不好,标记该 hash表状态为舍弃。如果预聚合效果很好, 可以将 hash 表扩张到 L3 cache(10 M),如果执行过程中发现需要更大的内存,标记该 hash 表为舍弃状态。 如果当前 hash 表的状态是舍弃状态, 返回 hash 表内所有行并释放,重新建 hash 表,开启下一轮的采样检查。 如果连续 5 次采样检查预聚合效果都不好,就跳过当前下压的 Group by 算子。
下压场景 | 举例 | 3.x版本 | 4.0版本 |
Group by, 不存在有distinct去重的聚合函数 | select a, sum(d) from t group by a; | 支持 | 支持 |
Group By, 存在有distinct去重的聚合函数 | select a, sum(distinct c),count(distinct d) from t group by a; | 不支持 | 支持 |
Rollup | select a, sum(d) from t group by a rollup(b); | 不支持 | 支持 |
Distinct | select distinct a from t; | 支持 | 支持 |
Window | Function select a, b, sum(d) over (partition by c) from t; | 不支持 | 支持 |
create table R1(a int, b int, c int, d int, primary key(a,b)) partition by hash(b) partitions 4;
Q1: select sum(distinct c), sum(distinct d) from R1 where a = 5;
| =====================================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
-----------------------------------------------------
|0 |SCALAR GROUP BY | |1 |2365|
|1 | PX COORDINATOR | |3960 |2122|
|2 | EXCHANGE OUT DISTR |:EX10000|3960 |1532|
|3 | PX PARTITION ITERATOR| |3960 |1532|
|4 | TABLE SCAN |r1 |3960 |1532|
=====================================================
Outputs & filters:
-------------------------------------
0 - output([T_FUN_SUM(distinct r1.c)], [T_FUN_SUM(distinct r1.d)]), filter(nil),
group(nil), agg_func([T_FUN_SUM(distinct r1.c)], [T_FUN_SUM(distinct r1.d)])
1 - output([r1.c], [r1.d]), filter(nil)
2 - output([r1.c], [r1.d]), filter(nil), dop=1
3 - output([r1.c], [r1.d]), filter(nil)
4 - output([r1.c], [r1.d]), filter(nil),
access([r1.c], [r1.d]), partitions(p[0-3])
create table R1(a int, b int, c int, d int, primary key(a,b)) partition by hash(b) partitions 4;
select sum(distinct c) from R1 where a = 5;
| ===========================================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
-----------------------------------------------------------
|0 |SCALAR GROUP BY | |1 |1986|
|1 | PX COORDINATOR | |1 |1835|
|2 | EXCHANGE OUT DISTR |:EX10001|1 |1835|
|3 | MERGE GROUP BY | |1 |1835|
|4 | EXCHANGE IN DISTR | |1 |1683|
|5 | EXCHANGE OUT DISTR (HASH)|:EX10000|1 |1683|
|6 | HASH GROUP BY | |1 |1683|
|7 | PX PARTITION ITERATOR | |3960 |1532|
|8 | TABLE SCAN |r1 |3960 |1532|
===========================================================
Outputs & filters:
-------------------------------------
0 - output([T_FUN_SUM(T_FUN_SUM(distinct r1.c))]), filter(nil),
group(nil), agg_func([T_FUN_SUM(T_FUN_SUM(distinct r1.c))])
1 - output([T_FUN_SUM(distinct r1.c)]), filter(nil)
2 - output([T_FUN_SUM(distinct r1.c)]), filter(nil), dop=1
3 - output([T_FUN_SUM(distinct r1.c)]), filter(nil),
group(nil), agg_func([T_FUN_SUM(distinct r1.c)])
4 - output([r1.c]), filter(nil)
5 - (#keys=1, [r1.c]), output([r1.c]), filter(nil), dop=1
6 - output([r1.c]), filter(nil),
group([r1.c]), agg_func(nil)
7 - output([r1.c]), filter(nil)
8 - output([r1.c]), filter(nil),
access([r1.c]), partitions(p[0-3]
create table R1(a int, b int, c int, d int, primary key(a,b)) partition by hash(b) partitions 4;select sum(distinct c), sum(distinct d) from R1 where a = 5;| ===========================================================|ID|OPERATOR |NAME |EST. ROWS|COST|-----------------------------------------------------------|0 |SCALAR GROUP BY | |1 |13 ||1 | PX COORDINATOR | |2 |13 ||2 | EXCHANGE OUT DISTR |:EX10001|2 |12 ||3 | HASH GROUP BY | |2 |11 ||4 | EXCHANGE IN DISTR | |2 |10 ||5 | EXCHANGE OUT DISTR (HASH)|:EX10000|2 |9 ||6 | HASH GROUP BY | |2 |8 ||7 | PX PARTITION ITERATOR | |1 |7 ||8 | TABLE SCAN |r1 |1 |7 |===========================================================Outputs & filters:------------------------------------- 0 - output([T_FUN_SUM(T_FUN_SUM(dup(r1.c)))], [T_FUN_SUM(T_FUN_SUM(dup(r1.d)))]), filter(nil), rowset=256, group(nil), agg_func([T_FUN_SUM(T_FUN_SUM(dup(r1.c)))], [T_FUN_SUM(T_FUN_SUM(dup(r1.d)))]) 1 - output([AGGR_CODE], [T_FUN_SUM(dup(r1.c))], [T_FUN_SUM(dup(r1.d))]), filter(nil), rowset=256 2 - output([AGGR_CODE], [T_FUN_SUM(dup(r1.c))], [T_FUN_SUM(dup(r1.d))]), filter(nil), rowset=256, dop=1 3 - output([AGGR_CODE], [T_FUN_SUM(dup(r1.c))], [T_FUN_SUM(dup(r1.d))]), filter(nil), rowset=256, group([AGGR_CODE]), agg_func([T_FUN_SUM(dup(r1.c))], [T_FUN_SUM(dup(r1.d))]) 4 - output([AGGR_CODE], [dup(r1.c)], [dup(r1.d)]), filter(nil), rowset=256 5 - (#keys=3, [AGGR_CODE], [dup(r1.c)], [dup(r1.d)]), output([AGGR_CODE], [dup(r1.c)], [dup(r1.d)]), filter(nil), rowset=256, dop=1 6 - output([AGGR_CODE], [dup(r1.c)], [dup(r1.d)]), filter(nil), rowset=256, group([AGGR_CODE], [dup(r1.c)], [dup(r1.d)]), agg_func(nil) 7 - output([r1.c], [r1.d]), filter(nil), rowset=256 8 - output([r1.c], [r1.d]), filter(nil), rowset=256, access([r1.c], [r1.d]), partitions(p[0-3])
TPC-DS 100GB 性能测试对比(OceanBase 3.x vs. 4.0)
历史推荐