
A Deep Dive into Execution Plans in PostgreSQL

xiongcc PostgreSQL学徒 2023-10-20

Preface

In the previous article we analyzed the role of statistics in PostgreSQL and the algorithms used to collect them. In this one we look at how execution plans are generated, and how to read and evaluate them.

Background Concepts

Before diving in, let's go over a few concepts that will help with understanding.

Indexes

First, the familiar index. An index is essentially a data structure for retrieving data efficiently; with a BTREE index, for example, looking up a single record costs O(log N). But that does not mean the more indexes the better: indexes must be maintained, which hurts write performance; certain index states can even block writes (for example, an invalid unique index left behind by CREATE INDEX CONCURRENTLY); and the more indexes exist, the more options the optimizer has to weigh, which can lead it into a bad plan.
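As an aside, here is a minimal sketch of that CREATE INDEX CONCURRENTLY (CIC) pitfall; the table and index names are hypothetical:

CREATE UNIQUE INDEX CONCURRENTLY t_col_uidx ON t (col);
-- if the build fails (deadlock, duplicate key, cancellation), the index is
-- left behind marked INVALID: ignored by queries, but still maintained on
-- every write, and as a unique index it can still reject conflicting inserts
\d t                     -- the index shows up flagged INVALID
DROP INDEX t_col_uidx;   -- the usual fix: drop it and retry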


Referring to the figure above 👆🏻: an efficient index scan (the green part) needs only a few traversals to fetch the required data, whereas a poor index scan typically has to return a large amount of data (the red part), traversing root → trunk → branch → leaf over and over, resulting in a lot of scattered I/O.

A good sequential scan reads data that is tightly packed together, so as the disk head sweeps past it only needs to scan a limited number of blocks to pick up most of the required data. A poor sequential scan reads relatively scattered data, scanning and filtering through many blocks to find what it wants. This is exactly what the correlation field mentioned in the statistics article captures: the correlation between a column's physical order and its logical order. The higher the correlation, the fewer scattered blocks an index scan has to touch, and the cheaper it becomes.

Selectivity

Selectivity = tuples remaining after the filter condition / total tuples before the filter. The lower the selectivity, the more data the condition filters out, and the more efficient an index scan becomes for the query, since it needs only a limited number of I/Os.


The Basic Rule

With the preliminaries out of the way, let's get to the main topic. First, how to read an execution plan. We follow one basic rule: bottom-up, left to right.

We usually describe the various stages as trees, e.g. the plan tree, and reading an execution plan works the same way: start from some leaf node, then work up through the branches and trunk to the root. But when the whole plan tree is so complex that it doesn't fit on a full screen, you risk "seeing the leaves but missing the trunk" and losing the big picture. In that case, flip the approach: go top-down first and take in an overview from a bird's-eye view.

Take the following example from the official documentation: there are Sort / Hash Join / Seq Scan nodes and so on, and each node's output feeds the node above it, layer by layer.

EXPLAIN ANALYZE SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2 ORDER BY t1.fivethous;

                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=717.34..717.59 rows=101 width=488) (actual time=7.761..7.774 rows=100 loops=1)
   Sort Key: t1.fivethous
   Sort Method: quicksort  Memory: 77kB
   ->  Hash Join  (cost=230.47..713.98 rows=101 width=488) (actual time=0.711..7.427 rows=100 loops=1)
         Hash Cond: (t2.unique2 = t1.unique2)
         ->  Seq Scan on tenk2 t2  (cost=0.00..445.00 rows=10000 width=244) (actual time=0.007..2.583 rows=10000 loops=1)
         ->  Hash  (cost=229.20..229.20 rows=101 width=244) (actual time=0.659..0.659 rows=100 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 28kB
               ->  Bitmap Heap Scan on tenk1 t1  (cost=5.07..229.20 rows=101 width=244) (actual time=0.080..0.526 rows=100 loops=1)
                     Recheck Cond: (unique1 < 100)
                     ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..5.04 rows=101 width=0) (actual time=0.049..0.049 rows=100 loops=1)
                           Index Cond: (unique1 < 100)
 Planning time: 0.194 ms
 Execution time: 8.008 ms

Execution Plan Breakdown

Start-up Cost

In the first line, cost=717.34..717.59 gives the start-up cost and the total cost. The start-up cost is the cost of producing the first row; think of it as the preparatory work that has to happen before the first row can be fetched.

The start-up cost is the cost expended before the first tuple is fetched. For example, the start-up cost of the index scan node is the cost to read index pages to access the first tuple in the target table.

A typical case is an index scan, which must first read index pages:

postgres=# explain select id,info from test_big where id = 999;
                                  QUERY PLAN                                   
-------------------------------------------------------------------------------
 Index Scan using test_big_pkey on test_big  (cost=0.43..8.45 rows=1 width=37)
   Index Cond: (id = 999)
(2 rows)

Of course, many other node types carry a start-up cost too, which is easy to see: a Merge Join must sort first, a Hash Join must build its hash table first, and so on (see the table below).

Query Operator   Behavior                                  Startup cost?
Result           Non-table query                           No
Merge Join       Merge join                                Yes
Hash Join        Hash join                                 Yes
Group            Processing of GROUP BY clause             Yes
Aggregate        Use of aggregate processing               Yes
Limit            Processing of LIMIT clause                Yes (OFFSET > 0)
Tid Scan         Tid scan                                  No
Subquery Scan    Subquery scan                             No
Function Scan    Function scan                             No
Sort             Sorting                                   Yes
Unique           Processing of DISTINCT / UNION clause     Yes
SetOp            Processing of INTERSECT / EXCEPT clause   Yes
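To see a start-up cost in isolation, try EXPLAIN on a bare sort: a Sort node cannot emit its first row until the entire input is sorted, so almost all of its total cost is start-up cost. A sketch (the cost numbers shown are illustrative only and depend on your data):

EXPLAIN SELECT * FROM test ORDER BY id;
--  Sort  (cost=9747.82..9997.82 rows=100000 width=4)    <- start-up ≈ total
--    ->  Seq Scan on test  (cost=0.00..1443.00 rows=100000 width=4)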

Total Cost

Total cost = start-up cost + run cost, where the run cost is the cost of fetching all tuples. A simple example:

postgres=# create table test(id int);
CREATE TABLE
postgres=# insert into test values(generate_series(1,100000));
INSERT 0 100000
postgres=# analyze test;
ANALYZE
postgres=# explain select id from test where id < 500;
                       QUERY PLAN                        
---------------------------------------------------------
 Seq Scan on test  (cost=0.00..1693.00 rows=472 width=4)
   Filter: (id < 500)
(2 rows)
  1. rows: the estimated number of rows returned, derived from the statistics; more on this below (a quick selectivity read-back sketch follows this list)
  2. width: the average width in bytes of the returned rows; the column here is an int, hence 4 bytes
  3. the total cost is 1693 and the start-up cost is 0: a sequential scan needs no preparation, it simply walks every line pointer in order
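Incidentally, this is also the quickest way to read a selectivity off a plan: divide the estimated rows by the table's total row count. A small sketch:

SELECT 472 / reltuples AS selectivity
FROM pg_class
WHERE relname = 'test';   -- ≈ 0.00472: a highly selective filter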

The start-up cost is the cost expended before the first tuple is fetched. For example, the start-up cost of the index scan node is the cost to read index pages to access the first tuple in the target table.

The run cost is the cost to fetch all tuples.

The total cost is the sum of the costs of both start-up and run costs.

So how is that total cost calculated? The database keeps a set of baseline cost parameters and derives it from them. Note these are only relative values:

  1. cpu_index_tuple_cost: CPU cost of processing one index entry, default 0.005

  2. cpu_operator_cost: CPU cost of evaluating one operator, e.g. the < in id < 500 or the = in id = 99, default 0.0025

  3. cpu_tuple_cost: CPU cost of processing one row, default 0.01

  4. parallel_setup_cost: cost of launching a parallel query; the participating processes must communicate with each other, so dynamic shared memory has to be created and a series of initialization steps performed, all extra overhead, default 1000 (I once wrote up a production incident involving dynamic shared memory, DSM; search for it if you're interested)

  5. parallel_tuple_cost: cost of passing one row from a worker to the leader, i.e. the worker placing a row into dynamic shared memory and the leader reading it out, default 0.1

    PostgreSQL: Query Parallelism in Action | Severalnines

  6. random_page_cost: cost of reading one page randomly, default 4

  7. seq_page_cost: cost of reading one page sequentially, default 1

By default the database assumes a random page read costs four times a sequential one, which reflects traditional HDDs. On high-end storage the gap between sequential and random reads is not actually that large, so it is reasonable to lower random_page_cost, and the value can even be set per tablespace:

postgres=# alter tablespace pg_default set (random_page_cost = 2);
ALTER TABLESPACE

Our example is a sequential scan. As mentioned in the statistics article, the page and row counts live in pg_class:

postgres=# select relpages,reltuples from pg_class where relname = 'test';
 relpages | reltuples 
----------+-----------
      443 |    100000
(1 row)

The table has 443 pages and an estimated 472 returned rows. With the baseline parameters above, total cost = reltuples * cpu_tuple_cost + reltuples * cpu_operator_cost + relpages * seq_page_cost = 100000 * 0.01 + 100000 * 0.0025 + 443 * 1 = 1000 + 250 + 443 = 1693.
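The same arithmetic, spelled out as SQL:

SELECT 100000 * 0.01       -- one cpu_tuple_cost per row
     + 100000 * 0.0025     -- one cpu_operator_cost per row for "id < 500"
     + 443 * 1.0           -- one seq_page_cost per page
       AS total_cost;      -- 1693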

And how is rows estimated? Right, it is computed from the statistics in pg_stats:

postgres=# \d pg_stats
                     View "pg_catalog.pg_stats"
         Column         |   Type   | Collation | Nullable | Default 
------------------------+----------+-----------+----------+---------
 schemaname             | name     |           |          |     --- schema name
 tablename              | name     |           |          |     --- table name
 attname                | name     |           |          |     --- column name
 inherited              | boolean  |           |          |     --- whether the column is inherited
 null_frac              | real     |           |          |     --- fraction of NULL values
 avg_width              | integer  |           |          |     --- average width in bytes
 n_distinct             | real     |           |          |     --- positive: number of distinct values; negative: distinct count divided by row count, negated
 most_common_vals       | anyarray |           |          |     --- most common values (MCV)
 most_common_freqs      | real[]   |           |          |     --- frequencies of the MCVs
 histogram_bounds       | anyarray |           |          |     --- histogram bounds
 correlation            | real     |           |          |     --- correlation between physical and logical order
 most_common_elems      | anyarray |           |          |     --- most common elements, e.g. of arrays
 most_common_elem_freqs | real[]   |           |          |     --- frequencies of the most common elements
 elem_count_histogram   | real[]   |           |          |     --- histogram of element counts
 
postgres=# select tablename,attname,n_distinct,inherited,null_frac,avg_width,most_common_vals,most_common_freqs,histogram_bounds from pg_stats where tablename = 'test';
-[ RECORD 1 ]-----+--------------------------------------------------------------------------------------------------------------
tablename         | test
attname           | id
n_distinct        | -1
inherited         | f
null_frac         | 0
avg_width         | 4
most_common_vals  | 
most_common_freqs | 
histogram_bounds  | {6,1052,2075,3083,4085,5095,6126,7088,8101,9182,10199,11174,12095,13044,14014,14993,16032,17034,17997,19015,19983,21025,22016,22980,23904,24911,25982,27022,28024,29089,30111,31122,32116,33082,34112,35041,36012,37099,38090,39118,40052,41041,42046,43020,43991,44920,45932,46966,47952,48917,49938,50929,51857,52904,53872,54933,55906,56876,57882,58940,59951,60951,62015,63016,64128,65096,66091,66974,67983,68951,69920,70944,71949,72998,73900,74885,75844,76797,77885,78869,79833,80834,81808,82897,83917,84933,85913,86997,87999,89039,89943,90902,91885,92827,93849,94874,95911,96965,97998,98907,99998}

I covered what each column means before; here is a quick recap:

  1. n_distinct is -1, meaning the number of distinct values in the column equals the row count. The negative form is used because ANALYZE assumes the number of distinct values grows with the table, whereas a positive value suggests the column has a fixed set of possible values

  2. null_frac is 0, meaning there are no NULLs. Note that a NULL is never physically stored in the data; it is represented by the t_bits bitmap in the tuple header, one bit per column, together with the HEAP_HASNULL flag in the infomask

  3. most_common_vals: the MCV list of high-frequency values; the data here was inserted sequentially and is uniformly distributed, so there are none

  4. most_common_freqs: MCF, the frequencies of the MCVs

  5. histogram_bounds: a histogram of the column's values excluding the high-frequency ones. The histogram never overlaps the MCV/MCF lists; the two complement each other, though together they do not necessarily cover all the data. It divides the column's values into buckets

    If the MCV cannot be used, the value of the histogram_bounds of the target column is used to estimate the cost. histogram_bounds is a list of values that divide the column's values into groups of approximately equal population.

For demonstration, create another table containing arrays and NULLs, and look at most_common_elems and most_common_elem_freqs:

postgres=# create table test1(id int,id2 int[],id3 int);
CREATE TABLE
postgres=# insert into test1 values(null,array[1,2],1);
INSERT 0 1
postgres=# insert into test1 values(null,array[2,3],1);
INSERT 0 1
postgres=# insert into test1 select n,array[n,n+1],1 from generate_series(1,10000) as n;
INSERT 0 10000
postgres=# analyze test1;
ANALYZE
postgres=# select tablename,attname,n_distinct,inherited,null_frac,avg_width from pg_stats where tablename = 'test1';
 tablename | attname | n_distinct  | inherited |  null_frac  | avg_width 
-----------+---------+-------------+-----------+-------------+-----------
 test1     | id      | -0.98039216 | f         | 0.019607844 |         4
 test1     | id2     | -0.98039216 | f         |           0 |        29
 test1     | id3     |           1 | f         |           0 |         4
(3 rows)

postgres=# select most_common_elems,most_common_elem_freqs,elem_count_histogram from pg_stats where tablename = 'test1' and attname = 'id2';
-[ RECORD 1 ]----------+-----------------------------------------------------------------------------------------------------
most_common_elems      | {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101}
most_common_elem_freqs | {0.019607844,0.039215688,0.029411765,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.019607844,0.009803922,0.009803922,0.039215688,0}
elem_count_histogram   | {2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2}

That makes it clear: most_common_elems and most_common_elem_freqs are the analogue of MCV / MCF, just for arrays.

Single-Column Selectivity

Back to our example: we want rows with id less than 500. The value 500 falls in bucket 0 (bucket[0].min is 6, bucket[0].max is 1052).

postgres=# explain select id from test where id < 500;
                       QUERY PLAN                        
---------------------------------------------------------
 Seq Scan on test  (cost=0.00..1693.00 rows=472 width=4)
   Filter: (id < 500)
(2 rows)

postgres=# select tablename,attname,n_distinct,inherited,null_frac,avg_width,most_common_vals,most_common_freqs,histogram_bounds from pg_stats where tablename = 'test';
-[ RECORD 1 ]-----+--------------------------------------------------------------------------------------------------------------
tablename         | test
attname           | id
n_distinct        | -1
inherited         | f
null_frac         | 0
avg_width         | 4
most_common_vals  | 
most_common_freqs | 
histogram_bounds  | {6,1052,2075,3083,4085,5095,6126,7088,8101,9182,10199,11174,12095,13044,14014,14993,16032,17034,17997,19015,19983,21025,22016,22980,23904,24911,25982,27022,28024,29089,30111,31122,32116,33082,34112,35041,36012,37099,38090,39118,40052,41041,42046,43020,43991,44920,45932,46966,47952,48917,49938,50929,51857,52904,53872,54933,55906,56876,57882,58940,59951,60951,62015,63016,64128,65096,66091,66974,67983,68951,69920,70944,71949,72998,73900,74885,75844,76797,77885,78869,79833,80834,81808,82897,83917,84933,85913,86997,87999,89039,89943,90902,91885,92827,93849,94874,95911,96965,97998,98907,99998}

So the selectivity formula is selectivity = (0 + (500 - bucket[0].min)/(bucket[0].max - bucket[0].min))/num_buckets ≈ 0.00472, where the leading 0 is the index of the bucket holding the literal and 500 is the literal itself. With the selectivity in hand, the estimated row count is the total row count times the selectivity: rows ≈ reltuples * selectivity ≈ 472.
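Replaying that arithmetic in SQL, with the bucket bounds read off the histogram above:

-- selectivity = (bucket_index + (value - low) / (high - low)) / num_buckets
SELECT (0 + (500 - 6) / (1052 - 6)::numeric) / 100          AS selectivity,  -- ≈ 0.00472
       100000 * (0 + (500 - 6) / (1052 - 6)::numeric) / 100 AS est_rows;     -- ≈ 472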

If the value being queried happens to be in the MCV list, things are trivial: the MCF records its frequency directly. Say the frequency of 98 is 0.0121 in a 10000-row table; then rows = selectivity * cardinality = 0.0121 * 10000 = 121.

If the value is not in the MCV list, the computation is slightly more involved and must also account for NULLs, excluding them if present: selectivity = (1 - sum(mcf)) / (num_distinct - num_mcv).
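A worked sketch with made-up numbers: suppose sum(mcf) = 0.03, 10 MCV entries, and 200 distinct values:

SELECT (1 - 0.03) / (200 - 10) AS eq_sel;   -- ≈ 0.0051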

For more worked examples of row estimation, see the Row Estimation Examples and Query Processing chapters in the official documentation; I won't belabor them here.

Multi-Column Selectivity

So far we've looked at single-column queries; now consider queries involving multiple columns. By default, PostgreSQL assumes the columns are mutually independent, so for a multi-column query it multiplies the selectivities. Using the same example as above:

The estimated row count is simply the product of the selectivities, crude but effective. The optimizer is not actually that naive, though; it has a whole set of algorithms of its own.

For example, how would the estimated row counts of the following SQL be computed?

  1. where a = xx or b = xx
  2. where a > xx or a < xx
  3. where a not in (xx)
  4. where a is not null
  5. where a > xx and b < xx

When in doubt, read the source. The selectivity logic is defined in src/backend/optimizer/path/clausesel.c; start with the comments:

/*
 * clauselist_selectivity -
 *   Compute the selectivity of an implicitly-ANDed list of boolean
 *   expression clauses.  The list can be empty, in which case 1.0
 *   must be returned.  List elements may be either RestrictInfos
 *   or bare expression clauses --- the former is preferred since
 *   it allows caching of results.
 *
 * See clause_selectivity() for the meaning of the additional parameters.
 *
 
 * The basic approach is to apply extended statistics first, on as many
 * clauses as possible, in order to capture cross-column dependencies etc.
 * The remaining clauses are then estimated by taking the product of their
 * selectivities, but that's only right if they have independent
 * probabilities, and in reality they are often NOT independent even if they
 * only refer to a single column.  So, we want to be smarter where we can.
 *
 
 * We also recognize "range queries", such as "x > 34 AND x < 42".  Clauses
 * are recognized as possible range query components if they are restriction
 * opclauses whose operators have scalarltsel or a related function as their
 * restriction selectivity estimator.  We pair up clauses of this form that
 * refer to the same variable.  An unpairable clause of this kind is simply
 * multiplied into the selectivity product in the normal way.  But when we
 * find a pair, we know that the selectivities represent the relative
 * positions of the low and high bounds within the column's range, so instead
 * of figuring the selectivity as hisel * losel, we can figure it as hisel +
 * losel - 1.  (To visualize this, see that hisel is the fraction of the range
 * below the high bound, while losel is the fraction above the low bound; so
 * hisel can be interpreted directly as a 0..1 value but we need to convert
 * losel to 1-losel before interpreting it as a value.  Then the available
 * range is 1-losel to hisel.  However, this calculation double-excludes
 * nulls, so really we need hisel + losel + null_frac - 1.)
 *
 
 * If either selectivity is exactly DEFAULT_INEQ_SEL, we forget this equation
 * and instead use DEFAULT_RANGE_INEQ_SEL.  The same applies if the equation
 * yields an impossible (negative) result.
 *
 * A free side-effect is that we can recognize redundant inequalities such
 * as "x < 4 AND x < 5"; only the tighter constraint will be counted.
 *
 * Of course this is all very dependent on the behavior of the inequality
 * selectivity functions; perhaps some day we can generalize the approach.
 */

Selectivity
clauselist_selectivity(PlannerInfo *root,
        List *clauses,
        int varRelid,
        JoinType jointype,
        SpecialJoinInfo *sjinfo)
{
 return clauselist_selectivity_ext(root, clauses, varRelid,
           jointype, sjinfo, true);
}

The function body simply calls clauselist_selectivity_ext():

/*
 * clauselist_selectivity_ext -
 *   Extended version of clauselist_selectivity().  If "use_extended_stats"
 *   is false, all extended statistics will be ignored, and only per-column
 *   statistics will be used.
 */

Selectivity
clauselist_selectivity_ext(PlannerInfo *root,
         List *clauses,
         int varRelid,
         JoinType jointype,
         SpecialJoinInfo *sjinfo,
         bool use_extended_stats)
{
 Selectivity s1 = 1.0;
  
  [...]
  
   /*
  * Determine if these clauses reference a single relation.  If so, and if
  * it has extended statistics, try to apply those.  --- extended statistics
  */

 rel = find_single_rel_for_clauses(root, clauses);
 if (use_extended_stats && rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL)
 {
  /*
   * Estimate as many clauses as possible using extended statistics.
   *
   * 'estimatedclauses' is populated with the 0-based list position
   * index of clauses estimated here, and that should be ignored below.
   */

  s1 = statext_clauselist_selectivity(root, clauses, varRelid,
           jointype, sjinfo, rel,
           &estimatedclauses, false);
 }
  
  [...]
  
  /*
  * Apply normal selectivity estimates for remaining clauses. We'll be
  * careful to skip any clauses which were already estimated above.
  *
  * Anything that doesn't look like a potential rangequery clause gets
  * multiplied into s1 and forgotten. Anything that does gets inserted into
  * an rqlist entry.
  */

 listidx = -1;
 foreach(l, clauses)  --- loop over the clauses
 {
    
     [...]
    
     if (ok)
   {
        
    /*
     * If it's not a "<"/"<="/">"/">=" operator, just merge the
     * selectivity in generically.  But if it's the right oprrest,
     * add the clause to rqlist for later processing.
     */

    switch (get_oprrest(expr->opno))
    {
     case F_SCALARLTSEL:
     case F_SCALARLESEL:
      addRangeClause(&rqlist, clause,
            varonleft, true, s2);
      break;
     case F_SCALARGTSEL:
     case F_SCALARGESEL:
      addRangeClause(&rqlist, clause,
            varonleft, false, s2);
      break;
     default:
      /* Just merge the selectivity in generically */
      s1 = s1 * s2;
      break;
    }
    continue;  /* drop to loop bottom */
   }

The switch tests the clause's restriction-selectivity estimator: F_SCALARLTSEL, F_SCALARLESEL, F_SCALARGTSEL and F_SCALARGESEL all go into the range-clause handling in addRangeClause(), differing only in the fourth argument, while every other case just multiplies the selectivity in. These four constants are OIDs in pg_proc:

#define F_SCALARLTSEL 103
#define F_SCALARLESEL 336
#define F_SCALARGTSEL 104
#define F_SCALARGESEL 337

postgres=# select oid,proname from pg_proc where oid in ('103','104','336','337');
 oid |   proname   
-----+-------------
 103 | scalarltsel   --- less than
 104 | scalargtsel   --- greater than
 336 | scalarlesel   --- less than or equal
 337 | scalargesel   --- greater than or equal
(4 rows)

addRangeClause() handles range clauses; processing then follows the algorithm described in the comments above, namely hibound + lobound - 1.0:

/*
 * addRangeClause --- add a new range clause for clauselist_selectivity
 *
 * Here is where we try to match up pairs of range-query clauses
 */

static void
addRangeClause(RangeQueryClause **rqlist, Node *clause,
      bool varonleft, bool isLTsel, Selectivity s2)
{
  
  [...]
  
 /*
  * Now scan the rangequery pair list.   --- back in clauselist_selectivity_ext
  */

 while (rqlist != NULL)
 {
  RangeQueryClause *rqnext;

  if (rqlist->have_lobound && rqlist->have_hibound)   --- if both a lower and an upper bound were found
  {
   /* Successfully matched a pair of range clauses */
   Selectivity s2;

   /*
    * Exact equality to the default value probably means the
    * selectivity function punted.  This is not airtight but should
    * be good enough.
    */

   if (rqlist->hibound == DEFAULT_INEQ_SEL ||
    rqlist->lobound == DEFAULT_INEQ_SEL)
   {
    s2 = DEFAULT_RANGE_INEQ_SEL;
   }
   else
   {
    s2 = rqlist->hibound + rqlist->lobound - 1.0;   --- the selectivity formula itself

    /* Adjust for double-exclusion of NULLs */
    s2 += nulltestsel(root, IS_NULL, rqlist->var,
          varRelid, jointype, sjinfo);

    /*
  
  [...]   
    
  else   --- no pair matched, i.e. not the same variable: multiply the selectivities instead
  {
   /* Only found one of a pair, merge it in generically */

   if (rqlist->have_lobound)
    s1 *= rqlist->lobound;
   else
    s1 *= rqlist->hibound;
  }
  /* release storage and advance */
  rqnext = rqlist->next;
  pfree(rqlist);
  rqlist = rqnext;
 }

 return s1;

A quick read of those comments gives us the key facts:

  1. the database first tries to estimate as many clauses as possible with extended statistics (clause_selectivity_ext), then estimates the remaining clauses with single-column statistics
  2. for a range query over the same variable, such as where x > 34 and x < 42, the selectivities are not simply multiplied; with hisel the selectivity of x < 42 and losel the selectivity of x > 34, the combined selectivity is hisel - (1 - losel), i.e. hisel + losel - 1, and with NULLs present it becomes hisel + losel + null_frac - 1 (see the numeric sketch after this list)
  3. for predicates over different variables, e.g. where x > 10 and y < 20, the selectivities are multiplied
  4. for operators other than the four listed above, the selectivities are likewise simply multiplied
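A numeric sketch of rule 2, with made-up selectivities:

-- For "x > 34 AND x < 42": suppose losel = P(x > 34) = 0.7
-- and hisel = P(x < 42) = 0.4; the pair rule gives
SELECT 0.4 + 0.7 - 1 AS range_sel;   -- 0.1, versus hisel * losel = 0.28 if multiplied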

Range Queries

Same Variable

Let's start by trying rule 2.

postgres=# create table t1(id int,id2 int,id3 int);
CREATE TABLE
postgres=# insert into t1 values(generate_series(1,100000),generate_series(1,200000,2),generate_series(1,300000,3));
INSERT 0 100000
postgres=# analyze t1;
ANALYZE

The statistics look like this:

postgres=# select attname,null_frac,n_distinct,most_common_vals,most_common_freqs,histogram_bounds,correlation from pg_stats where tablename = 't1';
-[ RECORD 1 ]-----+--------------------------------------------------------------------------------------------------------------
attname           | id
null_frac         | 0
n_distinct        | -1
most_common_vals  | 
most_common_freqs | 
histogram_bounds  | {1,973,1981,2898,3914,4885,5931,6929,7880,8920,9849,10830,11827,12871,13878,14800,15745,16796,17809,18778,19826,20870,21869,22854,23882,24983,26048,26987,27921,28966,30012,30916,31873,32848,33788,34757,35811,36770,37850,38863,39922,40913,41895,42854,43850,44965,46079,47110,48064,49049,49946,50887,51883,52919,53888,54751,55777,56750,57745,58813,59826,60896,61841,62876,63870,64887,65906,66930,67912,68835,69856,70910,72004,73015,74028,74963,75996,77012,78050,79049,80039,81011,82052,83050,84038,85027,86029,86966,87929,88866,89948,90989,91954,92863,93895,94877,95890,96968,98001,98976,99995}
correlation       | 1
-[ RECORD 2 ]-----+--------------------------------------------------------------------------------------------------------------
attname           | id2
null_frac         | 0
n_distinct        | -1
most_common_vals  | 
most_common_freqs | 
histogram_bounds  | {19,2173,4305,6361,8459,10543,12603,14563,16629,18685,20711,22675,24579,26581,28425,30317,32259,34437,36421,38337,40419,42343,44399,46383,48391,50303,52153,54183,56351,58383,60227,62275,64129,66151,68157,70177,71997,73993,75931,77885,80033,82015,83947,86021,87955,89873,91915,93887,95867,97877,99953,101807,103777,105759,107665,109561,111691,113695,115791,117667,119689,121627,123805,125949,127929,130003,132061,134313,136421,138409,140327,142377,144473,146475,148527,150609,152511,154423,156411,158299,160157,162105,163985,165957,168017,169901,171995,173893,175983,177955,179897,181813,183945,185775,187879,189813,191807,193803,195759,197889,199999}
correlation       | 1
-[ RECORD 3 ]-----+--------------------------------------------------------------------------------------------------------------
attname           | id3
null_frac         | 0
n_distinct        | -1
most_common_vals  | 
most_common_freqs | 
histogram_bounds  | {1,2917,5941,8692,11740,14653,17791,20785,23638,26758,29545,32488,35479,38611,41632,44398,47233,50386,53425,56332,59476,62608,65605,68560,71644,74947,78142,80959,83761,86896,90034,92746,95617,98542,101362,104269,107431,110308,113548,116587,119764,122737,125683,128560,131548,134893,138235,141328,144190,147145,149836,152659,155647,158755,161662,164251,167329,170248,173233,176437,179476,182686,185521,188626,191608,194659,197716,200788,203734,206503,209566,212728,216010,219043,222082,224887,227986,231034,234148,237145,240115,243031,246154,249148,252112,255079,258085,260896,263785,266596,269842,272965,275860,278587,281683,284629,287668,290902,294001,296926,299983}
correlation       | 1

Now take the SQL explain select * from t1 where id > 500 and id < 900. Per the analysis above, it uses range operators (greater than, less than) over the same variable, so it goes down the range-pair boundary algorithm:

postgres=# explain select * from t1 where id > 500 and id < 900;
                       QUERY PLAN                       
--------------------------------------------------------
 Seq Scan on t1  (cost=0.00..2041.00 rows=410 width=12)
   Filter: ((id > 500) AND (id < 900))
(2 rows)

Let's verify with gdb, using this SQL as our example.

(gdb) b clauselist_selectivity_ext
Breakpoint 1 at 0x6ce780: file clausesel.c, line 125.
(gdb) info b
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00000000006ce780 in clauselist_selectivity_ext at clausesel.c:125
(gdb) c
Continuing.

Breakpoint 1, clauselist_selectivity_ext (root=root@entry=0x256af98, clauses=0x256bb78, varRelid=varRelid@entry=0, 
    jointype=jointype@entry=JOIN_INNER, sjinfo=sjinfo@entry=0x0, use_extended_stats=use_extended_stats@entry=true)
    at clausesel.c:125
125     {
(gdb) n
137             if (list_length(clauses) == 1)   --- is there only one clause?
[ ... ]

169             foreach(l, clauses)   --- start looping over the clauses
(gdb) 
185                     s2 = clause_selectivity_ext(root, clause, varRelid, jointype, sjinfo,
(gdb) 
[ ... ]
(gdb) p s2
$1 = 0.99486138888888886

The first clause's selectivity is 0.99486138888888886, i.e. select * from t1 where id > 500. Quick check with the formula from before: selectivity = 1 - ((0 + (500 - bucket[0].min)/(bucket[0].max - bucket[0].min))/num_buckets) ≈ 1 - 0.00513374 ≈ 0.99486626, matching the debugger.

Continuing on, the next clause's selectivity is 0.0092397222222222221:

(gdb) 
169             foreach(l, clauses)    --- next iteration

(gdb) 
185                     s2 = clause_selectivity_ext(root, clause, varRelid, jointype, sjinfo,
(gdb) 
194                     if (IsA(clause, RestrictInfo))
(gdb) p s2
$6 = 0.0092397222222222221

selectivity = ((0 + (900 - bucket[0].min)/(bucket[0].max - bucket[0].min))/num_buckets) ≈ 0.00924897, close to the debugger's value.

Continuing, the range-query pair matching begins:

271             while (rqlist != NULL)  --- start matching range-query pairs
(gdb) 
275                     if (rqlist->have_lobound && rqlist->have_hibound)
(gdb) 
285                             if (rqlist->hibound == DEFAULT_INEQ_SEL ||
[ ... ]
(gdb) p *rqlist
$9 = {next = 0x0, var = 0x24e36f8, have_lobound = true, have_hibound = true, lobound = 0.99486138888888886, 
  hibound = 0.0092397222222222221}
[ ... ]
(gdb) n
292                                     s2 = rqlist->hibound + rqlist->lobound - 1.0;
(gdb) n
295                                     s2 += nulltestsel(root, IS_NULL, rqlist->var,
(gdb) p s2
$12 = 0.0041011111111111109

Per the algorithm, s2 = rqlist->hibound + rqlist->lobound - 1.0 = 0.0092397222222222221 + 0.99486138888888886 - 1 = 0.004101111111111. If NULLs were present they would be added back in via s2 += nulltestsel(root, IS_NULL, rqlist->var, ...).

Finally, back in the caller, the estimated rows is 410, confirming our expectation:

5276            rel->rows = clamp_row_est(nrows);
(gdb) 
5278            cost_qual_eval(&rel->baserestrictcost, rel->baserestrictinfo, root);
(gdb) p rel->rows
$14 = 410

Different Variables

That was a range query over the same variable id. Now look at different variables, say select id from t1 where id > 500 and id2 < 900; this time add a second breakpoint at addRangeClause.

(gdb) info b
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00000000007afecf in clauselist_selectivity_ext at clausesel.c:126
2       breakpoint     keep y   0x00000000007b0691 in addRangeClause at clausesel.c:436


169             foreach(l, clauses)
(gdb) 
171                     Node       *clause = (Node *) lfirst(l);
(gdb) 
175                     listidx++;
(gdb) 
181                     if (bms_is_member(listidx, estimatedclauses))
(gdb) 
185                     s2 = clause_selectivity_ext(root, clause, varRelid, jointype, sjinfo,
(gdb) n
194                     if (IsA(clause, RestrictInfo))
(gdb) p s2
$2 = 0.99544487465181053

As before, the selectivity of select id from t1 where id > 500 is 0.99544487465181053; execution reaches addRangeClause, and the first clause is recorded into rqelem:

Breakpoint 2, addRangeClause (rqlist=0x7fff9399d768, clause=0x216a698, varonleft=true, isLTsel=false, 
    s2=0.99544487465181053) at clausesel.c:436
    
502             if (is_lobound)
(gdb) 
504                     rqelem->have_lobound = true;
(gdb) 
505                     rqelem->have_hibound = false;
(gdb) 
506                     rqelem->lobound = s2;
(gdb) p *rqelem
$5 = {next = 0x1000000e7, var = 0x216a5a8, have_lobound = true, have_hibound = false, lobound = 1.7277281961335222e-316, 
  hibound = 0.0082554317548746529}
(gdb) p s2
$6 = 0.99544487465181053

Note that have_lobound = true and have_hibound = false.

Then comes the second iteration, for the second clause, i.e. select id from t1 where id2 < 900:

(gdb) n
169             foreach(l, clauses)
(gdb) 
171                     Node       *clause = (Node *) lfirst(l);
(gdb) 
175                     listidx++;
(gdb) 
181                     if (bms_is_member(listidx, estimatedclauses))
(gdb) 
185                     s2 = clause_selectivity_ext(root, clause, varRelid, jointype, sjinfo,
(gdb) 
194                     if (IsA(clause, RestrictInfo))
(gdb) p s2
$9 = 0.0040859749303621172

This clause's selectivity is 0.0040859749303621172. Quick check as usual: selectivity = ((0 + (900 - bucket[0].min)/(bucket[0].max - bucket[0].min))/num_buckets) ≈ ((0 + (900 - 19)/(2173 - 19))/100) ≈ 0.00409006, close enough.

236                             if (ok)
(gdb) 
243                                     switch (get_oprrest(expr->opno))
(gdb) 
247                                                     addRangeClause(&rqlist, clause,
(gdb) 

Breakpoint 2, addRangeClause (rqlist=0x7fff9399d768, clause=0x216a7d8, varonleft=true, isLTsel=true, 
    s2=0.0040859749303621172) at clausesel.c:436
436             if (varonleft)
(gdb) 
438                     var = get_leftop((Expr *) clause);
(gdb) 
439                     is_lobound = !isLTsel;  /* x < something is high bound */
(gdb) 
447             for (rqelem = *rqlist; rqelem; rqelem = rqelem->next)
(gdb) 
453                     if (!equal(var, rqelem->var))
(gdb) p equal(var, rqelem->var)
$10 = false
(gdb) n

Since equal(var, rqelem->var) = false, the two clauses reference different variables, so no pair is matched; the scan falls through and a new rqlist entry is created instead:

(gdb) p equal(var, rqelem->var)    --- not equal
$10 = false
(gdb) n
447             for (rqelem = *rqlist; rqelem; rqelem = rqelem->next)
(gdb) 
500             rqelem = (RangeQueryClause *) palloc(sizeof(RangeQueryClause));
(gdb) 
501             rqelem->var = var;
(gdb) 
502             if (is_lobound)
(gdb) 
510                     rqelem->have_lobound = false;
(gdb) 
511                     rqelem->have_hibound = true;
(gdb) 
512                     rqelem->hibound = s2;
(gdb) 
514             rqelem->next = *rqlist;
(gdb) p s2
$11 = 0.0040859749303621172

Back in the caller, the two selectivities are multiplied:

  else   --- back in the caller
  {
   /* Only found one of a pair, merge it in generically */
   if (rqlist->have_lobound)
    s1 *= rqlist->lobound;
   else
    s1 *= rqlist->hibound;
  }
  /* release storage and advance */
  rqnext = rqlist->next;
  pfree(rqlist);
  rqlist = rqnext;
 }

 return s1;
169             foreach(l, clauses)
(gdb) 
271             while (rqlist != NULL)
(gdb) 
275                     if (rqlist->have_lobound && rqlist->have_hibound)
(gdb) 
332                             if (rqlist->have_lobound)
(gdb) 
335                                     s1 *= rqlist->hibound;
(gdb) 
338                     rqnext = rqlist->next;
(gdb) p s1
$12 = 0.0040859749303621172
(gdb) n
339                     pfree(rqlist);
(gdb) 
340                     rqlist = rqnext;
(gdb) 
271             while (rqlist != NULL)
(gdb) 
275                     if (rqlist->have_lobound && rqlist->have_hibound)
(gdb) 
332                             if (rqlist->have_lobound)
(gdb) 
333                                     s1 *= rqlist->lobound;
(gdb) 
338                     rqnext = rqlist->next;
(gdb) p s1
$13 = 0.0040673628023847582

The selectivities multiply, matching our conclusion:

postgres=# select 0.99544487465181053 * 0.0040859749303621172;
                ?column?                
----------------------------------------
 0.004067362802384758015453277504054116
(1 row)

Operators Other Than <, <=, >, >=

Now see what happens when the operator is not one of the four estimator functions listed earlier (scalarltsel / scalarlesel / scalargtsel / scalargesel), say select id from t1 where id > 500 and id3 = 900;


The selectivity of id3 = 900 is 1.0000000000000001e-05:

185                     s2 = clause_selectivity_ext(root, clause, varRelid, jointype, sjinfo,
(gdb) 
194                     if (IsA(clause, RestrictInfo))
(gdb) 
196                             rinfo = (RestrictInfo *) clause;
(gdb) 
197                             if (rinfo->pseudoconstant)
(gdb) p s2
$17 = 1.0000000000000001e-05
(gdb) n
202                             clause = (Node *) rinfo->clause;
(gdb) 
213                     if (is_opclause(clause) && list_length(((OpExpr *) clause)->args) == 2)
(gdb) 
215                             OpExpr     *expr = (OpExpr *) clause;
(gdb) 
216                             bool            varonleft = true;
(gdb) 
219                             if (rinfo)
(gdb) 
221                                     ok = (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) &&
(gdb) 
222                                             (is_pseudo_constant_clause_relids(lsecond(expr->args),
(gdb) 
221                                     ok = (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) &&
(gdb) 
236                             if (ok)
(gdb) 
243                                     switch (get_oprrest(expr->opno))   --- falls to the default branch: plain multiplication, addRangeClause not reached
(gdb) 
257                                                     s1 = s1 * s2;
(gdb) p get_oprrest(expr->opno)   --- the estimator's OID is 101
$2 = 101
(gdb) 
258                                                     break;
(gdb) p s1
$18 = 1.0000000000000001e-05

The second clause's restriction estimator has OID 101 (eqsel), so the selectivities are simply multiplied, and the estimated row count is 1:

postgres=# explain select id from t1 where id > 500 and id3 = 900;
                     QUERY PLAN                      
-----------------------------------------------------
 Seq Scan on t1  (cost=0.00..2041.00 rows=1 width=4)
   Filter: ((id > 500) AND (id3 = 900))
(2 rows)

postgres=# select 100000*1.0000000000000001e-05 as estimate_rows;
      estimate_rows      
-------------------------
 1.000000000000000100000
(1 row)

OR Predicates

OR selectivity is slightly more involved: s1 + s2 - s1 * s2, i.e. P(A + B) = P(A) + P(B) - P(AB).

 foreach(lc, clauses)
 {
  Selectivity s2;

  listidx++;

  /*
   * Skip this clause if it's already been estimated by some other
   * statistics above.
   */

  if (bms_is_member(listidx, estimatedclauses))
   continue;

  s2 = clause_selectivity_ext(root, (Node *) lfirst(lc), varRelid,
         jointype, sjinfo, use_extended_stats);

  s1 = s1 + s2 - s1 * s2;
 }

 return s1;

Switching the earlier SQL to OR, the formula gives selectivity = 0.99544487465181053 + 0.0040859749303621172 - 0.99544487465181053 * 0.0040859749303621172:

postgres=# explain select id from t1 where id > 500 or id2 < 900;
                       QUERY PLAN                        
---------------------------------------------------------
 Seq Scan on t1  (cost=0.00..2041.00 rows=99546 width=4)
   Filter: ((id > 500) OR (id2 < 900))
(2 rows)

postgres=# select 0.99544487465181053 + 0.0040859749303621172 - 0.99544487465181053 * 0.0040859749303621172;
                ?column?                
----------------------------------------
 0.995463486779787889184546722495945884
(1 row)

postgres=# select (0.99544487465181053 + 0.0040859749303621172 - 0.99544487465181053 * 0.0040859749303621172) * 100000;
                  ?column?                  
--------------------------------------------
 99546.348677978788918454672249594588400000
(1 row)

The estimate checks out: 99546 rows.

NOT Predicates

The selectivity is 1 - s1, which is easy to understand: it is one or the other.

 else if (is_notclause(clause))
 {
  /* inverse of the selectivity of the underlying clause */
  s1 = 1.0 - clause_selectivity_ext(root,
            (Node *) get_notclausearg((Expr *) clause),
            varRelid,
            jointype,
            sjinfo,
            use_extended_stats);
 }
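A quick sanity check against the t1 table from earlier (a sketch; the exact estimate depends on the statistics):

EXPLAIN SELECT id FROM t1 WHERE NOT (id > 500);
-- PostgreSQL will usually flip this to "id <= 500" via the operator's
-- negator before costing; either way the estimate lands near
-- (1 - 0.99486) * 100000 ≈ 514 rows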

AND Predicates

AND is blunt: straight multiplication, P(AB) = P(A) * P(B). See the extended statistics section below for what happens when that assumption fails.

BOOLEAN Predicates

For boolean columns the MCV array holds at most two entries, true and false. If the first entry is true, the fraction of true values is that entry's frequency, freq_true; if the first entry is false, it is freq_false. The rules (a numeric sketch follows this list):

  1. is null: pg_stats.null_frac
  2. is not null: 1 - pg_stats.null_frac
  3. is true: with no MCV list, (1 - freq_null) / 2.0
  4. is not true: with no MCV list, (1 + freq_null) / 2.0
  5. is false: with no MCV list, (1 - freq_null) / 2.0
  6. is not false: with no MCV list, (1 + freq_null) / 2.0
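A numeric sketch of rules 3-6, assuming null_frac = 0.2 and no MCV list:

SELECT (1 - 0.2) / 2.0 AS is_true_sel,       -- 0.4
       (1 + 0.2) / 2.0 AS is_not_true_sel;   -- 0.6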

Function Selectivity

The selectivity is computed in function_selectivity(), which reads the pg_proc catalog:

/*
  * function_selectivity
  *
  * Returns the selectivity of a specified boolean function clause.
  * This code executes registered procedures stored in the
  * pg_proc relation, by calling the function manager.
  *
  * See clause_selectivity() for the meaning of the additional parameters.
  */

 Selectivity
 function_selectivity(PlannerInfo *root,
                      Oid funcid,
                      List *args,
                      Oid inputcollid,
                      bool is_join,
                      int varRelid,
                      JoinType jointype,
                      SpecialJoinInfo *sjinfo)
 
 {
   [...]
 }

postgres=# select proname, prosupport, prorows 
           from pg_proc 
           where proname like '%generate%';
           proname            |          prosupport          | prorows 
------------------------------+------------------------------+---------
 generate_subscripts          | -                            |    1000
 generate_subscripts          | -                            |    1000
 generate_series              | generate_series_int4_support |    1000
 generate_series              | generate_series_int4_support |    1000
 generate_series_int4_support | -                            |       0
 generate_series              | generate_series_int8_support |    1000
 generate_series              | generate_series_int8_support |    1000
 generate_series_int8_support | -                            |       0
 generate_series              | -                            |    1000
 generate_series              | -                            |    1000
 generate_series              | -                            |    1000
 generate_series              | -                            |    1111
(12 rows)

For example:

postgres=# alter function generate_series(timestamp without time zone, timestamp without time zone, interval) 
postgres-#     rows 14;
ALTER FUNCTION
postgres=# explain                                                                                                         
                                        
select * from generate_series(
            (CURRENT_DATE)::timestamp without time zone,
            (CURRENT_DATE + '14 days'::interval),
            '1 day'::interval);    --- estimates 14 rows
                             QUERY PLAN                              
---------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.01..0.15 rows=14 width=8)
(1 row)

postgres=# alter function generate_series(timestamp without time zone, timestamp without time zone, interval)     rows 1111;
ALTER FUNCTION
postgres=# explain                                                                                                         
                                        
select * from generate_series(
            (CURRENT_DATE)::timestamp without time zone,
            (CURRENT_DATE + '14 days'::interval),
            '1 day'::interval);    --- estimates 1111 rows
                               QUERY PLAN                               
------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.01..11.12 rows=1111 width=8)
(1 row)

Extended Statistics

Functional Dependencies

The multi-column selectivity computation above showed PostgreSQL simply multiplying the selectivities, P(AB) = P(A) x P(B), but in practice columns often depend on each other, just as the source comment says:

The basic approach is to apply extended statistics first, on as many clauses as possible, in order to capture cross-column dependencies etc.The remaining clauses are then estimated by taking the product of their selectivities, but that's only right if they have independent probabilities, and in reality they are often NOT independent even if they only refer to a single column.  So, we want to be smarter where we can.

For example:

postgres=# create table t2 (a int,b int,c text);
CREATE TABLE
postgres=# insert into t2 select i/10000, i/100000 , repeat(' ', 100)  from generate_series (1,1000000) s(i);
INSERT 0 1000000
postgres=# analyze t2;
ANALYZE
postgres=# select attname,null_frac,n_distinct,most_common_vals,most_common_freqs,histogram_bounds,correlation from pg_stats where tablename = 't2' and attname in ('a','b');
-[ RECORD 1 ]-----+--------------------------------------------------------------------------------------------------------------
attname           | a
null_frac         | 0
n_distinct        | 100
most_common_vals  | {74,16,41,92,73,69,76,2,31,78,7,49,82,14,37,80,13,28,55,56,75,98,23,65,85,63,29,32,9,21,50,4,71,17,68,26,38,19,27,5,52,53,95,8,43,40,47,61,83,97,42,48,79,45,94,46,77,99,96,6,91,54,58,62,64,1,34,84,39,66,15,20,11,22,67,57,86,93,44,88,3,59,24,36,90,60,25,33,70,72,81,18,51,10,35,0,87,89,12,30}
most_common_freqs | {0.011366666,0.011266666,0.011033333,0.011,0.010966667,0.010933333,0.010933333,0.0108,0.0108,0.0108,0.010766666,0.010766666,0.010766666,0.010733333,0.010666667,0.010666667,0.0106,0.010566667,0.010566667,0.010566667,0.010566667,0.010533334,0.0105,0.0105,0.0105,0.010466667,0.010433333,0.0104,0.010366667,0.010366667,0.010366667,0.0103,0.0103,0.010266666,0.010266666,0.010233333,0.010233333,0.0102,0.0102,0.0101666665,0.0101333335,0.0101333335,0.0101333335,0.0101,0.0101,0.010066667,0.010066667,0.010066667,0.010066667,0.010066667,0.010033334,0.010033334,0.01,0.009966667,0.009966667,0.009933333,0.009933333,0.009933333,0.0099,0.009866667,0.009866667,0.009833333,0.009833333,0.009833333,0.009833333,0.0098,0.0098,0.0098,0.009766666,0.009766666,0.009733333,0.0097,0.009633333,0.0096,0.009566667,0.009533334,0.009533334,0.009533334,0.0095,0.009466667,0.009433334,0.009433334,0.009366667,0.009366667,0.009333333,0.009266667,0.009233333,0.009233333,0.009233333,0.0092,0.009166666,0.009133333,0.009133333,0.0090333335,0.0090333335,0.009,0.008833333,0.008833333,0.008533333,0.008033333}
histogram_bounds  | 
correlation       | 1
-[ RECORD 2 ]-----+--------------------------------------------------------------------------------------------------------------
attname           | b
null_frac         | 0
n_distinct        | 10
most_common_vals  | {7,4,0,6,9,2,5,1,8,3}
most_common_freqs | {0.1033,0.1015,0.1006,0.1005,0.100266665,0.1002,0.099533334,0.099133335,0.09763333,0.097333334}
histogram_bounds  | 
correlation       | 1

Querying it, the SQL below is estimated to return 986 rows but actually returns 10000, way off. The reason was explained above: under the default independence assumption, the selectivity of a multi-column condition is just the product of the per-column selectivities, which goes badly wrong when the columns actually depend on each other. a = 1 happens to be in the MCV list, with a frequency of 0.0098000001162290573; b = 0 is also in the MCV list, with a frequency of 0.1006; their product gives the 986-row estimate:

postgres=# explain analyze select * from t2 where a = 1 and b = 0;
                                                QUERY PLAN                                                 
-----------------------------------------------------------------------------------------------------------
 Seq Scan on t2  (cost=0.00..41010.00 rows=986 width=109) (actual time=45.125..233.511 rows=10000 loops=1)
   Filter: ((a = 1) AND (b = 0))
   Rows Removed by Filter: 990000
 Planning Time: 0.074 ms
 Execution Time: 234.549 ms
(5 rows)

postgres=# select 0.0098000001162290573 * 0.1006 * 1000000 as estimate_rows;
        estimate_rows        
-----------------------------
 985.88001169264316438000000
(1 row)

But in reality columns a and b are dependent: b takes only one tenth as many values, and a's value determines b's. So we need a way to tell the database about the relationship between the two columns, and that is exactly what extended statistics are:

postgres=# create statistics mystatic1 (dependencies) on a,b from t2;
CREATE STATISTICS
postgres=# analyze t2;
ANALYZE
postgres=# \d t2
                 Table "public.t2"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | integer |           |          | 
 c      | text    |           |          | 
Statistics objects:
    "public.mystatic1" (dependencies) ON a, b FROM t2

With extended statistics in place, the database knows how the columns are related and no longer blindly multiplies:

postgres=# select * from pg_statistic_ext;
-[ RECORD 1 ]-+----------
oid           | 16402      --- OID of the statistics object
stxrelid      | 16394      --- OID of the table the statistics are defined on
stxname       | mystatic1  --- name of the statistics object
stxnamespace  | 2200       --- OID of the schema containing it
stxowner      | 10         --- owner
stxstattarget | -1         ---stxstattarget controls the level of detail of statistics accumulated for this statistics object by ANALYZE. A zero value indicates that no statistics should be collected. A negative value says to use the maximum of the statistics targets of the referenced columns, if set, or the system default statistics target. Positive values of stxstattarget determine the target number of “most common values” to collect.
stxkeys       | 1 2        --- attribute numbers of the columns the statistics cover
stxkind       | {f}        ---d for n-distinct statistics, f for functional dependency statistics, m for most common values (MCV) list statistics, and e for expression statistics
stxexprs      | 

postgres=# SELECT s.stxrelid::regclass AS table_name,
       s.stxname AS statistics_name,
       d.stxdndistinct AS ndistinct,
       d.stxddependencies AS dependencies
FROM pg_statistic_ext AS s
   JOIN pg_statistic_ext_data AS d
      ON d.stxoid = s.oid;
 table_name | statistics_name | ndistinct |     dependencies     
------------+-----------------+-----------+----------------------
 t2         | mystatic1       |           | {"1 => 2": 1.000000}
(1 row)

postgres=# select * from pg_statistic_ext_data ;
-[ RECORD 1 ]----+---------------------
stxoid           | 16402
stxdndistinct    | 
stxddependencies | {"1 => 2": 1.000000}
stxdmcv          | 
stxdexpr         | 

stxddependencies: the strength of the dependency between columns, a coefficient no greater than 1, where 1 means full dependency, i.e. once one column's value is fixed, the other column's value is determined. It is used to estimate selectivity when both columns appear in equality conditions.

So how is that row count computed? Let's build a new table and redo the calculation:

postgres=# create table tbl(id int, c1 int, c2 text, c3 int, c4 int);
CREATE TABLE
postgres=# insert into tbl select id,random()*100,substring(md5(random()::text), 1, 4),random()*900,random()*10000 from generate_series(1,10000000) as id;
INSERT 0 10000000
postgres=# create statistics s2 on c1,c2,c3 from tbl;
CREATE STATISTICS
postgres=# analyze tbl ;
ANALYZE
postgres=# select * from pg_statistic_ext where stxname = 's2';
-[ RECORD 1 ]-+--------
oid           | 16697
stxrelid      | 16403
stxname       | s2
stxnamespace  | 2200
stxowner      | 10
stxstattarget | -1
stxkeys       | 2 3 4
stxkind       | {d,f,m}
stxexprs      | 

postgres=# select * from pg_statistic_ext_data where stxoid = 16697;
-[ RECORD 1 ]----+--------------------------------------------------------------------------------------------------------------
stxoid           | 16697
stxdndistinct    | {"2, 3": 4016110, "2, 4": 90950, "3, 4": 8037579, "2, 3, 4": 10000115}
stxddependencies | {"3 => 2": 0.638900, "3 => 4": 0.636233, "2, 3 => 4": 0.995533, "2, 4 => 3": 0.720867, "3, 4 => 2": 0.999267}
stxdmcv          | 
stxdexpr         | 

This time the estimated row count is 97:

postgres=# explain select * from tbl where c1=1 and c2='abc';
                        QUERY PLAN                        
----------------------------------------------------------
 Seq Scan on tbl  (cost=0.00..213696.73 rows=97 width=21)
   Filter: ((c1 = 1) AND (c2 = 'abc'::text))
(2 rows)

postgres=# select reltuples::numeric from pg_class where relname = 'tbl';
 reltuples 
-----------
  10000100
(1 row)

So where does 97 come from? The formula is P( A & B ) = P(A) * [ f + ( 1 - f ) * P(B) ].

Here f is the dependency degree, 0.638900 in this example; the selectivities P(A) (for c2 = 'abc') and P(B) (for c1 = 1) come out as follows 👇🏻

postgres=# explain select * from tbl where c2='abc';
                        QUERY PLAN                         
-----------------------------------------------------------
 Seq Scan on tbl  (cost=0.00..188696.44 rows=152 width=21)
   Filter: (c2 = 'abc'::text)
(2 rows)

postgres=# explain select * from tbl where c1=1;
                         QUERY PLAN                          
-------------------------------------------------------------
 Seq Scan on tbl  (cost=0.00..188696.44 rows=93334 width=21)
   Filter: (c1 = 1)
(2 rows)

postgres=# select 152/10000100::numeric;    ---P(A)
          ?column?          
----------------------------
 0.000015199848001519984800
(1 row)

postgres=# select 93334/10000100::numeric;   ---P(B)
        ?column?        
------------------------
 0.00933330666693333067
(1 row)

Plugging these into the formula yields an estimated row count of roughly 97:

postgres=# select 93334/10000100::numeric*0.3611+0.6389*0.000015199848001519984800*10000100;
            ?column?             
---------------------------------
 97.1161702570374296247338090000
(1 row)

Limitations

Per the official documentation, functional dependencies come with the following limitations:

Functional dependencies are currently only applied when considering simple equality conditions that compare columns to constant values, and IN clauses with constant values. They are not used to improve estimates for equality conditions comparing two columns or comparing a column to an expression, nor for range clauses, LIKE or any other type of condition.


When estimating with functional dependencies, the planner assumes that conditions on the involved columns are compatible and hence redundant. If they are incompatible, the correct estimate would be zero rows, but that possibility is not considered. For example, given a query like


In other words, only simple equality comparisons qualify; something like not-equal, for instance, is not supported.

And naturally the functional dependency only covers the columns the statistics object was created on.
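To make the "incompatible conditions" caveat concrete, a sketch against the t2 table above: b always equals a/10, so a = 1 AND b = 9 can never both hold, yet the dependency statistics treat the second clause as redundant:

EXPLAIN SELECT * FROM t2 WHERE a = 1 AND b = 9;
-- true answer: 0 rows; estimated: roughly the ~9800 rows of "a = 1" alone,
-- because f = 1.0 collapses P(a AND b) down to P(a)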

Multivariate N-Distinct Counts

Similarly, extended statistics help grouped aggregation, e.g. group by col1, col2, via multivariate n-distinct counts; there are also MCV lists over multiple columns, covered below. For example:

postgres=# create table t(id int,id2 int);
CREATE TABLE
postgres=# insert into t select id/10000, id/100000 from generate_series(1,10000000) as id;
INSERT 0 10000000
postgres=# analyze t;
ANALYZE
postgres=# explain analyze select count(*) from t group by id,id2;    --- estimated 100000 rows, actual only 1001
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=731751.30..810876.67 rows=100000 width=16) (actual time=7994.837..7995.648 rows=1001 loops=1)
   Group Key: id, id2
   Planned Partitions: 4  Batches: 1  Memory Usage: 1681kB
   ->  Seq Scan on t  (cost=0.00..144248.48 rows=10000048 width=8) (actual time=0.219..1978.499 rows=10000000 loops=1)
 Planning Time: 0.136 ms
 Execution Time: 7996.459 ms
(6 rows)

postgres=# create statistics s3 (ndistinct) on id,id2 from t;
CREATE STATISTICS
postgres=# analyze t;
ANALYZE
postgres=# select stxdndistinct,stxddependencies from pg_statistic_ext_data where stxoid = 16705;
 stxdndistinct  | stxddependencies 
----------------+------------------
 {"1, 2": 1000} | 
(1 row)

postgres=# explain analyze select count(*) from t group by id,id2;    
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=219247.60..219257.60 rows=1000 width=16) (actual time=7620.880..7621.301 rows=1001 loops=1)
   Group Key: id, id2
   Batches: 1  Memory Usage: 193kB
   ->  Seq Scan on t  (cost=0.00..144247.77 rows=9999977 width=8) (actual time=0.025..1861.774 rows=10000000 loops=1)
 Planning Time: 0.189 ms
 Execution Time: 7621.486 ms
(6 rows)

stxdndistinct is easy to understand: the number of distinct combinations of the columns, 1000 in this case. Once the extended statistics exist, the plan's estimated rows uses that value directly.

Multivariate MCV Lists

Statistics over the most common values of multiple columns, introduced in version 12; exactly what the name says.

Other

Extended statistics were further strengthened in version 14, adding support for collecting statistics on specified expressions and improving the optimizer's accuracy when estimating them:

create table t (a int);  
create statistics s on mod(a,10), mod(a,20) from t;  
analyze t;  
select * from t where mod(a,10) = 0 and mod(a,20) = 0;  
select 1 from t group by mod(a,10), mod(a,20);  
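After ANALYZE, the per-expression statistics can be inspected through the pg_stats_ext_exprs view added in PostgreSQL 14, for example:

SELECT statistics_name, expr, n_distinct, most_common_vals
FROM pg_stats_ext_exprs
WHERE statistics_name = 's';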

More information can be found in the documentation: https://www.postgresql.org/docs/15/multivariate-statistics-examples.html

Default Selectivities

Besides selectivities computed from statistics, PostgreSQL also falls back on a set of default selectivities, defined in selfuncs.h:

/*
 * Note: the default selectivity estimates are not chosen entirely at random.
 * We want them to be small enough to ensure that indexscans will be used if
 * available, for typical table densities of ~100 tuples/page.  Thus, for
 * example, 0.01 is not quite small enough, since that makes it appear that
 * nearly all pages will be hit anyway.  Also, since we sometimes estimate
 * eqsel as 1/num_distinct, we probably want DEFAULT_NUM_DISTINCT to equal
 * 1/DEFAULT_EQ_SEL.
 */


/* default selectivity estimate for equalities such as "A = b" */
#define DEFAULT_EQ_SEL 0.005

/* default selectivity estimate for inequalities such as "A < b" */
#define DEFAULT_INEQ_SEL  0.3333333333333333

/* default selectivity estimate for range inequalities "A > b AND A < c" */
#define DEFAULT_RANGE_INEQ_SEL 0.005

/* default selectivity estimate for multirange inequalities "A > b AND A < c" */
#define DEFAULT_MULTIRANGE_INEQ_SEL 0.005

/* default selectivity estimate for pattern-match operators such as LIKE */
#define DEFAULT_MATCH_SEL 0.005

/* default selectivity estimate for other matching operators */
#define DEFAULT_MATCHING_SEL 0.010

/* default number of distinct values in a table */
#define DEFAULT_NUM_DISTINCT  200

/* default selectivity estimate for boolean and null test nodes */
#define DEFAULT_UNK_SEL   0.005
#define DEFAULT_NOT_UNK_SEL  (1.0 - DEFAULT_UNK_SEL)

For example: after I wrap the literal as (select 500), the planner can no longer see the constant, so the default "A < b" selectivity of 0.3333333333333333 applies and the estimate becomes 33333 rows.

postgres=# explain select id from test where id < 500;
                       QUERY PLAN                        
---------------------------------------------------------
 Seq Scan on test  (cost=0.00..1693.00 rows=530 width=4)
   Filter: (id < 500)
(2 rows)

postgres=# explain select id from test where id < (select 500);   --- default selectivity DEFAULT_INEQ_SEL 0.3333333333333333
                        QUERY PLAN                         
-----------------------------------------------------------
 Seq Scan on test  (cost=0.01..1693.01 rows=33333 width=4)
   Filter: (id < $0)
   InitPlan 1 (returns $0)
     ->  Result  (cost=0.00..0.01 rows=1 width=4)
(4 rows)

postgres=# explain select id from t1 where id > (select 500) and id < (select 900);   --- default selectivity DEFAULT_RANGE_INEQ_SEL 0.005
                      QUERY PLAN                       
-------------------------------------------------------
 Seq Scan on t1  (cost=0.02..2041.02 rows=500 width=4)
   Filter: ((id > $0) AND (id < $1))
   InitPlan 1 (returns $0)
     ->  Result  (cost=0.00..0.01 rows=1 width=4)
   InitPlan 2 (returns $1)
     ->  Result  (cost=0.00..0.01 rows=1 width=4)
(6 rows)

Default selectivities are also used when no statistics exist at all. For example:

postgres=# create table test(id int) with (autovacuum_enabled = off);
CREATE TABLE

This way no statistics will be collected automatically. However, some DDL commands do fill in reltuples and relpages in pg_class; create index, for example, must sort the data and so learns the row and page counts.

postgres=# select reltuples,relpages from pg_class where relname = 'test';
 reltuples | relpages 
-----------+----------
        -1 |        0
(1 row)

postgres=# create index ON test(id);
CREATE INDEX
postgres=# select reltuples,relpages from pg_class where relname = 'test';    --- the counts were updated
 reltuples | relpages 
-----------+----------
    100000 |      443
(1 row)

postgres=# select * from pg_stats where tablename = 'test';
 schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs |
 histogram_bounds | correlation | most_common_elems | most_common_elem_freqs | elem_count_histogram 
------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+
------------------+-------------+-------------------+------------------------+----------------------
(0 rows)

postgres=# explain select * from test where id is null;
                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=12.17..481.25 rows=500 width=4)
   Recheck Cond: (id IS NULL)
   ->  Bitmap Index Scan on test_id_idx  (cost=0.00..12.04 rows=500 width=0)
         Index Cond: (id IS NULL)
(4 rows)

postgres=# explain select * from test where id is not null;
                        QUERY PLAN                         
-----------------------------------------------------------
 Seq Scan on test  (cost=0.00..1443.00 rows=99500 width=4)
   Filter: (id IS NOT NULL)
(2 rows)

As you can see, IS NULL used the default selectivity 0.005 (DEFAULT_UNK_SEL); there are only row and page counts, no statistics. Other operators likewise fall back to their defaults:

postgres=# explain select id from test where id < 500 ;   --- default selectivity: DEFAULT_INEQ_SEL 0.3333333333333333
                                   QUERY PLAN                                   
--------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=626.62..1486.29 rows=33333 width=4)
   Recheck Cond: (id < 500)
   ->  Bitmap Index Scan on test_id_idx  (cost=0.00..618.29 rows=33333 width=0)
         Index Cond: (id < 500)
(4 rows)

postgres=# explain select id from test where id = 500 ;   --- default selectivity: DEFAULT_EQ_SEL 0.005
                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=12.17..482.50 rows=500 width=4)
   Recheck Cond: (id = 500)
   ->  Bitmap Index Scan on test_id_idx  (cost=0.00..12.04 rows=500 width=0)
         Index Cond: (id = 500)
(4 rows)

PEV

For a long SQL statement, the execution plan may not fit on a single screen, and analyzing it by hand is slow and error-prone. Tools that visualize the plan, collectively known as PEV (Postgres EXPLAIN Visualizer), make the expensive nodes stand out at a glance so you can focus your tuning on exactly those nodes.

As far as I know there are three such tools; the first two below are the most commonly used. See my earlier article 伪SQL优化大师速成法 for a walkthrough; they are quick to pick up, and a capture command is sketched right after this list.

  1. https://explain.depesz.com/
  2. https://explain.dalibo.com/plan#
  3. https://tatiyants.com/pev/#/plans/new
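To feed these tools, capture the plan in a machine-readable form: explain.dalibo.com accepts JSON plans (plain text also works), while explain.depesz.com takes the plain-text format. A sketch, where some_table and some_col are placeholders:

-- JSON output, for explain.dalibo.com
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) SELECT * FROM some_table WHERE some_col < 100;

-- plain-text output, for explain.depesz.com
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM some_table WHERE some_col < 100;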

Pain points

Version 10 introduced a killer feature, extended statistics, which took the optimizer up another level. But for multi-valued columns, and for some special types and operator conditions, selectivity estimation still has room for improvement.

For example, operations on JSON types, range types, and geometric types: their selectivities are hard-coded in the source, so on a large table the planner can come up with a badly wrong plan.

For example, we can build a JSON table with row_to_json (there are plenty of other tricks; see https://www.crunchydata.com/blog/generating-json-directly-from-postgres). First create an ordinary table; getEnChar here is presumably a custom helper that generates random alphabetic strings:

postgres=# create table test(col1 int,col2 text,col3 text,col4 text);
CREATE TABLE
postgres=# insert into test select random()*1000,getEnChar(1,10),md5(random()::text),getEnChar(1,30) from generate_series(1,10000000);
INSERT 0 10000000

Then convert the rows to JSON:

postgres=# create table json_t(info json);
CREATE TABLE
postgres=# insert into json_t select row_to_json(test) from test;
INSERT 0 10000000
postgres=# analyze json_t ;
ANALYZE
postgres=# select * from pg_stats where tablename = 'json_t';
-[ RECORD 1 ]----------+-------
schemaname             | public
tablename              | json_t
attname                | info
inherited              | f
null_frac              | 0
avg_width              | 95
n_distinct             | 0
most_common_vals       | 
most_common_freqs      | 
histogram_bounds       | 
correlation            | 
most_common_elems      | 
most_common_elem_freqs | 
elem_count_histogram   | 

As you can see, a json column has no MCVs and no histogram (json has no equality operator, so ANALYZE can only collect null_frac and avg_width), which means the plan is left to fate. The queries below simply fall back to the defaults DEFAULT_EQ_SEL and DEFAULT_INEQ_SEL:

postgres=# explain select * from json_t where info->>'col1' = '867';
                           QUERY PLAN                           
----------------------------------------------------------------
 Seq Scan on json_t  (cost=0.00..307107.00 rows=50000 width=95)
   Filter: ((info ->> 'col1'::text) = '867'::text)
(2 rows)

postgres=# explain select * from json_t where info->>'col1' < '867';
                            QUERY PLAN                            
------------------------------------------------------------------
 Seq Scan on json_t  (cost=0.00..307107.00 rows=3333333 width=95)
   Filter: ((info ->> 'col1'::text) < '867'::text)
(2 rows)

Now let's convert the column to jsonb (this rewrites the whole table) and run ANALYZE again.
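A minimal sketch of the conversion, assuming an in-place column type change (the USING clause casts each value):

postgres=# alter table json_t alter column info type jsonb using info::jsonb;
ALTER TABLE
postgres=# analyze json_t;
ANALYZE

With jsonb in place, the histogram bounds become visible: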

postgres=# select unnest(histogram_bounds::text::json[]) from pg_stats where tablename = 'json_t' limit 5;
                                                     unnest                                                      
-----------------------------------------------------------------------------------------------------------------
 {"col1": 0, "col2": "CJ", "col3": "57fb94b7a2f14f8a2a7a5756ebbf6440", "col4": "uD"}
 {"col1": 10, "col2": "sKyRDc", "col3": "f05581e68a42b5065ce465feff286ffa", "col4": "yEiisKWygsxYMoB"}
 {"col1": 21, "col2": "hCb", "col3": "fcb3634fceb99138f9efee7047c5cfe1", "col4": "yGTzXfYVweyRJyLDRlINIlkjBLvh"}
 {"col1": 30, "col2": "LOCbqPqG", "col3": "d15c40bc61c90544455a661e7c607d4e", "col4": "dVJeXEuRs"}
 {"col1": 40, "col2": "TdO", "col3": "8584a30a76d4d48a3e8f211c79281ced", "col4": "WcoQWeyUjZYcHwxBafMJodypv"}
(5 rows)

postgres=# select null_frac,avg_width,n_distinct,correlation from pg_stats where tablename = 'json_t';
 null_frac | avg_width | n_distinct | correlation  
-----------+-----------+------------+--------------
         0 |       114 |         -1 | 0.0024379073
(1 row)

As you can see, jsonb now has not only the regular null_frac and n_distinct but also a histogram.

There are some dedicated operators for jsonb:

 Operator | Description
----------+----------------------------------------------------------------------------------------------
 @>       | Does the left JSON value contain the right JSON path/value entries at the top level?
 <@       | Are the left JSON path/value entries contained at the top level within the right JSON value?

Let's look at an example:

postgres=# explain analyze select * from json_t where info @>  '{"col1":867}'::jsonb;
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Seq Scan on json_t  (cost=0.00..304871.71 rows=1000 width=114) (actual time=2.115..9065.463 rows=9780 loops=1)
   Filter: (info @> '{"col1": 867}'::jsonb)
   Rows Removed by Filter: 9990220
 Planning Time: 0.146 ms
 Execution Time: 9070.976 ms
(5 rows)

How was this 1000-row estimate computed? Looking up the operator in pg_operator shows that jsonb's @> uses matchingsel as its restriction estimator:

postgres=# select oprname,typname,oprrest from pg_operator op
    join pg_type typ ON op.oprleft= typ.oid where oprname = '@>';
 oprname |    typname    |    oprrest    
---------+---------------+---------------
 @>      | path          | -
 @>      | box           | contsel
 @>      | box           | contsel
 @>      | polygon       | contsel
 @>      | polygon       | contsel
 @>      | circle        | contsel
 @>      | circle        | contsel
 @>      | _aclitem      | -
 @>      | anyarray      | arraycontsel
 @>      | tsquery       | matchingsel
 @>      | jsonb         | matchingsel
 @>      | anyrange      | rangesel
 @>      | anyrange      | multirangesel
 @>      | anyrange      | rangesel
 @>      | anymultirange | multirangesel
 @>      | anymultirange | multirangesel
 @>      | anymultirange | multirangesel
(17 rows)

/*
 * matchingsel -- generic matching-operator selectivity support
 *
 * Use these for any operators that (a) are on data types for which we collect
 * standard statistics, and (b) have behavior for which the default estimate
 * (twice DEFAULT_EQ_SEL) is sane.  Typically that is good for match-like
 * operators.
 */
Datum
matchingsel(PG_FUNCTION_ARGS)
{
    PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
    Oid         operator = PG_GETARG_OID(1);
    List       *args = (List *) PG_GETARG_POINTER(2);
    int         varRelid = PG_GETARG_INT32(3);
    Oid         collation = PG_GET_COLLATION();
    double      selec;

    /* Use generic restriction selectivity logic. */
    selec = generic_restriction_selectivity(root, operator, collation,
                                            args, varRelid,
                                            DEFAULT_MATCHING_SEL);

    PG_RETURN_FLOAT8((float8) selec);
}

The actual estimation happens in generic_restriction_selectivity(): it first tries the MCV list and the histogram, and when the histogram yields a selectivity below 0.0001 it clamps it up to that floor, as the gdb trace shows:

(gdb) 
964                     selec = histogram_selectivity(&vardata, &opproc, collation,
(gdb) 
967                     if (selec < 0)
(gdb) 
972                     else if (hist_size < 100)
(gdb) 
986                     if (selec < 0.0001)
(gdb) 
987                             selec = 0.0001;
(gdb) p selec < 0.0001
$6 = 1
(gdb) p selec
$7 = 0
(gdb) n
992                     if (HeapTupleIsValid(vardata.statsTuple))
(gdb) p selec
$8 = 0.0001
(gdb) n
993                             nullfrac = ((Form_pg_statistic) GETSTRUCT(vardata.statsTuple))->stanullfrac;
(gdb) 
1002                    selec *= 1.0 - nullfrac - mcvsum;

The estimate here works out to 0.0001, so rows = reltuples × selectivity = 10000000 × 0.0001 = 1000.
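A quick sanity check of that arithmetic against pg_class (a sketch; reltuples should be roughly 10000000 after the earlier ANALYZE):

postgres=# select (reltuples * 0.0001)::bigint as estimated_rows from pg_class where relname = 'json_t';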

postgres=# explain analyze select * from json_t where info @>  '{"col1":867}'::jsonb;
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Seq Scan on json_t  (cost=0.00..304863.40 rows=1000 width=114) (actual time=0.174..6088.345 rows=9780 loops=1)
   Filter: (info @> '{"col1": 867}'::jsonb)
   Rows Removed by Filter: 9990220
 Planning Time: 0.078 ms
 Execution Time: 6089.898 ms
(5 rows)

Still, the estimate is a long way from reality: the query actually returns 9780 rows.
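A common workaround for the ->> form of this predicate (see the blog.anayrat.info post in the references) is to create an expression index, so that ANALYZE collects statistics on the extracted value. A sketch, with the caveat that this helps ->> comparisons but not the @> operator itself:

postgres=# create index on json_t (((info->>'col1')::int));
CREATE INDEX
postgres=# analyze json_t;
ANALYZE
postgres=# explain select * from json_t where (info->>'col1')::int = 867;

With per-expression statistics in place, the planner can estimate this predicate from real data instead of a hard-coded default.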

Summary

That wraps up our tour of execution plans. In the next installment we'll dig into the individual operators and scan methods that appear in a plan.

References

Statistics In PostgreSQL

https://vsevolod.net/postgresql-jsonb-index/

https://www.postgresql.org/docs/current/row-estimation-examples.html

https://www.crunchydata.com/blog/indexes-selectivity-and-statistics

https://rjuju.github.io/postgresql/2020/02/28/pg_qualstats-2-selectivity-error.html

https://github.com/digoal/blog/blob/master/201807/20180711_02.md

https://blog.anayrat.info/en/2017/11/26/postgresql-jsonb-and-statistics/

http://blog.itpub.net/6906/viewspace-2374857

https://www.devart.com/dbforge/postgresql/studio/postgresql-explain.html

Chitchat

Reader group #1 of PG学徒 has reached 400 members. Many of you who followed recently may not know it exists: we chat, swap war stories, and argue about technology every day, and it's quite lively. So I'm opening group #2, partly as a standby and partly to give newer readers a place to join in ~ welcome!


