关于分区表的方方面面
什么是分区表
将逻辑上的一个大表拆分成多个更小的物理分片,以此获得性能上的提升。
为什么要分区
Item | Description |
最大单个数据库大小 | 不限 |
最大数据单表大小 | 32 TB |
单条记录最大 | 1.6 TB |
单字段最大允许 | 1 GB |
单表允许最大记录数 | 不限 |
单表最大字段数 | 250 - 1600 (取决于字段类型) |
单表最大索引数 | 不限 |
Item | Upper Limit | Comment |
database size | unlimited | |
number of databases | 4,294,950,911 | |
relations per database | 1,431,650,303 | |
relation size | 32 TB | with the default BLCKSZ of 8192 bytes |
rows per table | limited by the number of tuples that can fit onto 4,294,967,295 pages | |
columns per table | 1600 | further limited by tuple size fitting on a single page; see note below |
field size | 1 GB | |
identifier length | 63 bytes | can be increased by recompiling PostgreSQL |
indexes per table | unlimited | constrained by maximum relations per database |
columns per index | 32 | can be increased by recompiling PostgreSQL |
partition keys | 32 | can be increased by recompiling PostgreSQL |
PostgreSQL的单表最大允许32TB(默认值8KB的情况下),单表太大会引入很多问题(当然这里是基于默认的情况,:
1.表越大,索引创建的时间越久 (create index concurrently,慢,而且需要2遍以上的扫描,还可能留下invalid的索引)2.垃圾清理,单表的垃圾回收vacuum (ShareUpdateExclusive锁,自斥) 目前只支持串行,所以单表越大,垃圾回收的时间越长。笔者遇到很多次几个TB的单表跑了几天的autovacuum,还没有完 - -3.年龄回收,和单表串行vacuum类似,表越大,扫描的越慢,对于9.6以前的版本更加恶劣 (vm文件还没有引入all_frozen的标记位,冻结过的也要扫描)4.一个逻辑上的大表,可能占满文件系统,使用分区表之后可以将不同的表放置在不同的物理空间上,从而达到冷数据放在廉价的物理机器上,热点数据放置在性能强劲的机器上。
当然还有一些其他的限制,比如(参考源码定义,
•标识符长度:63•单表上的索引数量:无限制•单个索引的列个数:32•单数据库下对象的数量:1,431,650,303•函数最多可以用的参数个数:#define INDEX_MAX_KEYS 32•一个索引可以允许的最多列个数:#define PARTITION_MAX_KEYS 32•名区表允行的分区列数:#define NUM_SPINLOCK_SEMAPHORES 128•元组的列个数:#define MaxHeapAttributeNumber 1600 /* 8 * 200 */
分区的好处
1.拆分成一个个子表,那么就可以实现逻辑意义上的"并行"vacuum,多个vacuum进程可以同时作用于多个子表,包括年龄冻结、死元组回收、创建索引等维护性动作2.通过分区,可以实现类似冷热分离的效果,对于不常访问的子表,可以将其放在一般的媒介上面,比如SATA,对于频繁访问的热表,可以放在SSD上3.批量的加载和删除可以用删除或者detach子表实现,还可以避免大量删除导致的vacuum,源自官网:Bulk loads and deletes can be accomplished by adding or removing partitions, if the usage pattern is accounted for in the partitioning design. Dropping an individual partition using DROP TABLE, or doing ALTER TABLE DETACH PARTITION, is far faster than a bulk operation. These commands also entirely avoid the VACUUM overhead caused by a bulk DELETE.4.通过分区裁剪,可以定位到具体某一个子表,扫描的数据是分区前数据的一部分,可以有效提升性能,同时也意味着可以有更高的缓冲命中率
分区表的类别
继承式分区
在PostgreSQL v10以前,是通过inherit继承 + check约束 + rule/trigger来创建分区表。
create table ptab02 (
id int not null,
tm timestamptz not null
);
create table ptab02_202001 (
check ( tm >= '2020-01-01'::timestamptz and tm < '2020-02-01'::timestamptz )
) inherits (ptab02);
create table ptab02_202002 (
check ( tm >= '2020-02-01'::timestamptz and tm < '2020-03-01'::timestamptz )
) inherits (ptab02);
create table ptab02_202003 (
check ( tm >= '2020-03-01'::timestamptz and tm < '2020-04-01'::timestamptz )
) inherits (ptab02);
create table ptab02_202004 (
check ( tm >= '2020-04-01'::timestamptz and tm < '2020-05-01'::timestamptz )
) inherits (ptab02);
create table ptab02_202005 (
check ( tm >= '2020-05-01'::timestamptz and tm < '2020-06-01'::timestamptz )
) inherits (ptab02);
创建触发器
create or replace function ptab02_insert_trigger()
returns trigger as $$
begin
if new.tm >= '2020-01-01'::timestamptz and new.tm < '2020-02-01'::timestamptz then
insert into ptab02_202001 values (new.*);
elsif new.tm >= '2020-02-01'::timestamptz and new.tm < '2020-03-01'::timestamptz then
insert into ptab02_202002 values (new.*);
elsif new.tm >= '2020-02-01'::timestamptz and new.tm < '2020-04-01'::timestamptz then
insert into ptab02_202003 values (new.*);
elsif new.tm >= '2020-02-01'::timestamptz and new.tm < '2020-05-01'::timestamptz then
insert into ptab02_202004 values (new.*);
elsif new.tm >= '2020-02-01'::timestamptz and new.tm < '2020-06-01'::timestamptz then
insert into ptab02_202005 values (new.*);
else
raise 'value % out of range ', new.tm;
end if;
return null;
end;
$$
language plpgsql;
create trigger insert_ptab02_trigger
before insert on ptab02
for each row execute function ptab02_insert_trigger();
postgres=# \d+ ptab02
Table "public.ptab02"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+--------------------------+-----------+----------+---------+---------+--------------+-------------
id | integer | | not null | | plain | |
tm | timestamp with time zone | | not null | | plain | |
Child tables: ptab02_202001,
ptab02_202002,
ptab02_202003,
ptab02_202004,
ptab02_202005
Access method: heap
此处使用的基于触发器的方式,另外还有规则也可以实现,不过要注意隐藏的坑:
1.规则的开销比触发器大很多,但是这种开销是每个查询只有一次,而不是每行一次,因此这种方法可能对批量插入的情况有优势。不过,在大部分情况下,触发器方法将提供更好的性能。2.要注意的是,COPY会忽略规则。如果你想使用COPY来插入数据,你需要复制到正确的子表,而不是直接复制到主表。COPY确实可以触发触发器,所以如果你使用触发器方法,你可以正常使用它。3.规则的另一个缺点是,如果规则集合无法覆盖插入日期,则没有简单的方法能够强制产生错误,数据会无声无息地进入到主表中。
另外,针对继承式分区,PostgreSQL不能自动校验所有的CHECK
约束之间是否互斥
引自2018年PostgreSQL中国技术大会 《vacuum庖丁解牛》
分区裁剪
在以前基于约束和继承的版本中,控制分区裁剪的参数是constraint_exclusion:
Controls the query planner iss use of table constraints to optimize queries. The allowed values of constraint_exclusion are on (examine constraints for all tables), off (never examine constraints), and partition (examine constraints only for inheritance child tables and UNION ALL subqueries). partition is the default setting.When this parameter allows it for a particular table, the planner compares query conditions with the table is CHECK constraints, and omits scanning tables for which the conditions contradict the constraints.
当PostgreSQL生成执行计划时是否考虑表上的约束,这个参数有三个选项 "off、on、partition",默认参数为 off,意思不使用表上的 constraint 来生成计划,如果设置成 on,则对所有表生效,生成 PLAN 时会考虑表上的约束。设置成partition,则只对分区表生效,从而避免扫描分区表所有分区。
比如某个表上存在check id > 100的约束,那么查询select id from test < 90时,优化器对比约束条件,知道压根没有小于90的记录,直接跳过对于该表的扫描,返回0行记录。
当优化器使用约束排除时,需要花费更多的时间去对比约束条件和where中的过滤条件,默认是partition,对一张表做查询时,如果有继承表,优化器就会对这些子表进行约束排除分析。
postgres=# set constraint_exclusion to off;
SET
postgres=# explain select * from ptab02 where tm='2020-01-07'::timestamptz;
QUERY PLAN
-------------------------------------------------------------------------------
Append (cost=0.00..177.75 rows=51 width=12)
-> Seq Scan on ptab02 ptab02_1 (cost=0.00..0.00 rows=1 width=12)
Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
-> Seq Scan on ptab02_202001 ptab02_2 (cost=0.00..35.50 rows=10 width=12)
Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
-> Seq Scan on ptab02_202002 ptab02_3 (cost=0.00..35.50 rows=10 width=12)
Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
-> Seq Scan on ptab02_202003 ptab02_4 (cost=0.00..35.50 rows=10 width=12)
Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
-> Seq Scan on ptab02_202004 ptab02_5 (cost=0.00..35.50 rows=10 width=12)
Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
-> Seq Scan on ptab02_202005 ptab02_6 (cost=0.00..35.50 rows=10 width=12)
Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
(13 rows)
postgres=# set constraint_exclusion to on;
SET
postgres=# explain select * from ptab02 where tm='2020-01-07'::timestamptz;
QUERY PLAN
-------------------------------------------------------------------------------
Append (cost=0.00..35.55 rows=11 width=12)
-> Seq Scan on ptab02 ptab02_1 (cost=0.00..0.00 rows=1 width=12)
Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
-> Seq Scan on ptab02_202001 ptab02_2 (cost=0.00..35.50 rows=10 width=12)
Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
(5 rows)
postgres=# explain update ptab02 set id = 99 where tm='2020-01-07'::timestamptz;
QUERY PLAN
-------------------------------------------------------------------------------------------
Update on ptab02 (cost=0.00..35.66 rows=0 width=0)
Update on ptab02 ptab02_1
Update on ptab02_202001 ptab02_2
-> Result (cost=0.00..35.66 rows=11 width=14)
-> Append (cost=0.00..35.55 rows=11 width=10)
-> Seq Scan on ptab02 ptab02_1 (cost=0.00..0.00 rows=1 width=10)
Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
-> Seq Scan on ptab02_202001 ptab02_2 (cost=0.00..35.50 rows=10 width=10)
Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
(9 rows)
在我们实际生产中,有一个案例,表根据定义是一个继承式分区,但是这些子表居然没有定义约束,完全靠触发器来实现数据插入到子表
CREATE OR REPLACE FUNCTION fnc_hm_agg_step_detail_insert ()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $$
BEGIN
IF (news.operation_time >= DATE '2021-12-01' AND NEW.operation_time < DATE '2022-01-01') THEN
INSERT INTO hm_agg_step_detail_202112
VALUES (NEW.*);
ELSIF (
于是每一个SQL都无法裁剪,需要扫描所有的分区,导致CPU飙升,性能特别低。
另外,他们的SQL使用的current_date,也无法进行裁剪。类似如下:
postgres=# explain select * from ptab02 where tm=current_date;
QUERY PLAN
-------------------------------------------------------------------------------
Append (cost=0.00..203.25 rows=51 width=12)
-> Seq Scan on ptab02 ptab02_1 (cost=0.00..0.00 rows=1 width=12)
Filter: (tm = CURRENT_DATE)
-> Seq Scan on ptab02_202001 ptab02_2 (cost=0.00..40.60 rows=10 width=12)
Filter: (tm = CURRENT_DATE)
-> Seq Scan on ptab02_202002 ptab02_3 (cost=0.00..40.60 rows=10 width=12)
Filter: (tm = CURRENT_DATE)
-> Seq Scan on ptab02_202003 ptab02_4 (cost=0.00..40.60 rows=10 width=12)
Filter: (tm = CURRENT_DATE)
-> Seq Scan on ptab02_202004 ptab02_5 (cost=0.00..40.60 rows=10 width=12)
Filter: (tm = CURRENT_DATE)
-> Seq Scan on ptab02_202005 ptab02_6 (cost=0.00..40.60 rows=10 width=12)
Filter: (tm = CURRENT_DATE)
(13 rows)
postgres=# explain select * from ptab02 where tm=now();
QUERY PLAN
-------------------------------------------------------------------------------
Append (cost=0.00..203.25 rows=51 width=12)
-> Seq Scan on ptab02 ptab02_1 (cost=0.00..0.00 rows=1 width=12)
Filter: (tm = now())
-> Seq Scan on ptab02_202001 ptab02_2 (cost=0.00..40.60 rows=10 width=12)
Filter: (tm = now())
-> Seq Scan on ptab02_202002 ptab02_3 (cost=0.00..40.60 rows=10 width=12)
Filter: (tm = now())
-> Seq Scan on ptab02_202003 ptab02_4 (cost=0.00..40.60 rows=10 width=12)
Filter: (tm = now())
-> Seq Scan on ptab02_202004 ptab02_5 (cost=0.00..40.60 rows=10 width=12)
Filter: (tm = now())
-> Seq Scan on ptab02_202005 ptab02_6 (cost=0.00..40.60 rows=10 width=12)
Filter: (tm = now())
(13 rows)
postgres=# \df+ now()
List of functions
-[ RECORD 1 ]-------+-------------------------
Schema | pg_catalog
Name | now
Result data type | timestamp with time zone
Argument data types |
Type | func
Volatility | stable
Parallel | safe
Owner | postgres
Security | invoker
Access privileges |
Language | internal
Source code | now
Description | current transaction time
在PostgreSQL中,函数稳定性分为三档:
•immutable:超级稳定,任何时候调用,只要函数的参数不变结果就不变。如果参数为常量或者无参数,在生成执行计划时,直接将这类函数替换为常量。•stable:稳定,在一个事务中调用时,只要函数的参数不变结果就不变。同样的参数值stable函数多次执行返回的结果应该一致。•volatile:指函数可以修改数据库(比如update),函数参数值相同的情况下,可以返回不同的结果,所以volatile函数在执行过程中优化器对它的处理是每一行都需要执行一次volatile函数。
函数的稳定性会影响数据库的一些行为:
1.绑定变量,immutable函数(包含常量参数或不包含参数时)计算一次。stable函数每次bind的时候要重算。2.生成执行计划,stable、immutable函数作为where条件时,可以被用于索引am。(即允许采用索引优化)3.排除分区表不需要访问的分区,stable、immutable函数作为where条件时,可用于过滤不需要访问的子表。4.是否可用于创建索引,只有immutable函数或操作符,可以用于创建表达式索引。5.这里要注意的是volatile、stable、immutable这几种函数,对数据的修改的可见性分两种情况。volatile,调用该函数的SQL对数据的修改可见;stable、immutable调用该函数的SQL对数据的修改不可见。
为此,我们可以包一下now()这个函数
postgres=# create or replace function g_time() returns timestamp with time zone
as $$
begin
return now()::timestamp with time zone;
end;
$$ immutable language plpgsql ;
CREATE FUNCTION
postgres=# explain select * from ptab02 where tm=g_time();
QUERY PLAN
----------------------------------------------------------------------------
Seq Scan on ptab02 (cost=0.00..0.00 rows=1 width=12)
Filter: (tm = '2022-03-11 12:45:25.893058+08'::timestamp with time zone)
(2 rows)
注意可能的类型差异导致的裁剪失败
postgres=# drop function g_time ;
DROP FUNCTION
postgres=# create or replace function g_time() returns timestamp
as $$
begin
return now()::timestamp;
end;
$$ immutable language plpgsql ;
CREATE FUNCTION
postgres=# explain select * from ptab02 where tm=g_time();
QUERY PLAN
----------------------------------------------------------------------------------
Append (cost=0.00..177.75 rows=51 width=12)
-> Seq Scan on ptab02 ptab02_1 (cost=0.00..0.00 rows=1 width=12)
Filter: (tm = '2022-03-11 13:45:02.110896'::timestamp without time zone)
-> Seq Scan on ptab02_202001 ptab02_2 (cost=0.00..35.50 rows=10 width=12)
Filter: (tm = '2022-03-11 13:45:02.110896'::timestamp without time zone)
-> Seq Scan on ptab02_202002 ptab02_3 (cost=0.00..35.50 rows=10 width=12)
Filter: (tm = '2022-03-11 13:45:02.110896'::timestamp without time zone)
-> Seq Scan on ptab02_202003 ptab02_4 (cost=0.00..35.50 rows=10 width=12)
Filter: (tm = '2022-03-11 13:45:02.110896'::timestamp without time zone)
-> Seq Scan on ptab02_202004 ptab02_5 (cost=0.00..35.50 rows=10 width=12)
Filter: (tm = '2022-03-11 13:45:02.110896'::timestamp without time zone)
-> Seq Scan on ptab02_202005 ptab02_6 (cost=0.00..35.50 rows=10 width=12)
Filter: (tm = '2022-03-11 13:45:02.110896'::timestamp without time zone)
(13 rows)
postgres=# create or replace function g_time() returns timestamp with time zone
as $$
begin
return now()::timestamp;
end;
$$ immutable language plpgsql ;
CREATE FUNCTION
postgres=# explain select * from ptab02 where tm=g_time();
QUERY PLAN
----------------------------------------------------------------------------
Seq Scan on ptab02 (cost=0.00..0.00 rows=1 width=12)
Filter: (tm = '2022-03-11 14:09:38.417772+08'::timestamp with time zone)
(2 rows)
同时也要注意和声明式分区的差异:
clock_timestamp是volatile的,now是stable的
postgres=# explain select * from ptab01 where tm=clock_timestamp();
QUERY PLAN
-------------------------------------------------------------------------------------------------
Gather (cost=1000.00..21622.20 rows=7243 width=12)
Workers Planned: 2
-> Parallel Append (cost=0.00..19897.90 rows=3020 width=12)
-> Parallel Seq Scan on ptab01_202001 ptab01_1 (cost=0.00..4054.40 rows=869 width=12)
Filter: (tm = clock_timestamp())
-> Parallel Seq Scan on ptab01_202003 ptab01_3 (cost=0.00..4054.40 rows=869 width=12)
Filter: (tm = clock_timestamp())
-> Parallel Seq Scan on ptab01_202005 ptab01_5 (cost=0.00..4054.40 rows=869 width=12)
Filter: (tm = clock_timestamp())
-> Parallel Seq Scan on ptab01_202004 ptab01_4 (cost=0.00..3925.60 rows=841 width=12)
Filter: (tm = clock_timestamp())
-> Parallel Seq Scan on ptab01_202002 ptab01_2 (cost=0.00..3794.00 rows=813 width=12)
Filter: (tm = clock_timestamp())
(13 rows)
postgres=# explain select * from ptab01 where tm=now();
QUERY PLAN
-----------------------------------------------------------------
Gather (cost=1000.00..21622.20 rows=7243 width=12)
Workers Planned: 2
-> Parallel Append (cost=0.00..19897.90 rows=3020 width=12)
Subplans Removed: 5
(4 rows)
postgres=# explain select * from ptab01 where tm=g_time();
QUERY PLAN
-----------------------------------------------
Append (cost=0.00..23517.03 rows=5 width=12)
Subplans Removed: 5
(2 rows)
声明式分区
在v10之后,PostgreSQL正式引入了声明式分区,Declarative Partitioning,意味着不再需要配置繁琐的触发器和继承了。
CREATE TABLE ptab01 (
id int not null,
tm timestamptz not null
) PARTITION BY RANGE (tm);
create table ptab01_202001 partition of ptab01 for values from ('2020-01-01') to ('2020-02-01');
create table ptab01_202002 partition of ptab01 for values from ('2020-02-01') to ('2020-03-01');
create table ptab01_202003 partition of ptab01 for values from ('2020-03-01') to ('2020-04-01');
create table ptab01_202004 partition of ptab01 for values from ('2020-04-01') to ('2020-05-01');
create table ptab01_202005 partition of ptab01 for values from ('2020-05-01') to ('2020-06-01');
insert into ptab01 select extract(epoch from seq), seq from generate_series('2020-01-01'::timestamptz, '2020-05-31 23:59:59'::timestamptz, interval '10 seconds') as seq;
postgres=# \d+ ptab01
Partitioned table "public.ptab01"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+--------------------------+-----------+----------+---------+---------+--------------+-------------
id | integer | | not null | | plain | |
tm | timestamp with time zone | | not null | | plain | |
Partition key: RANGE (tm)
Partitions: ptab01_202001 FOR VALUES FROM ('2020-01-01 00:00:00+08') TO ('2020-02-01 00:00:00+08'),
ptab01_202002 FOR VALUES FROM ('2020-02-01 00:00:00+08') TO ('2020-03-01 00:00:00+08'),
ptab01_202003 FOR VALUES FROM ('2020-03-01 00:00:00+08') TO ('2020-04-01 00:00:00+08'),
ptab01_202004 FOR VALUES FROM ('2020-04-01 00:00:00+08') TO ('2020-05-01 00:00:00+08'),
ptab01_202005 FOR VALUES FROM ('2020-05-01 00:00:00+08') TO ('2020-06-01 00:00:00+08')
分区裁剪
声明式分区的裁剪参数是enable_partition_pruning
Enables or disables the query planner's ability to eliminate a partitioned table's partitions from query plans. This also controls the planner's ability to generate query plans which allow the query executor to remove (ignore) partitions during query execution. The default is
on
. See Section 5.11.4 for details.
postgres=# set enable_partition_pruning to off;
SET
postgres=# explain select * from ptab01 where tm='2020-01-07'::timestamptz;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Gather (cost=1000.00..17758.00 rows=5 width=12)
Workers Planned: 2
-> Parallel Append (cost=0.00..16757.50 rows=5 width=12)
-> Parallel Seq Scan on ptab01_202001 ptab01_1 (cost=0.00..3417.41 rows=1 width=12)
Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
-> Parallel Seq Scan on ptab01_202003 ptab01_3 (cost=0.00..3417.41 rows=1 width=12)
Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
-> Parallel Seq Scan on ptab01_202005 ptab01_5 (cost=0.00..3417.41 rows=1 width=12)
Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
-> Parallel Seq Scan on ptab01_202004 ptab01_4 (cost=0.00..3307.88 rows=1 width=12)
Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
-> Parallel Seq Scan on ptab01_202002 ptab01_2 (cost=0.00..3197.35 rows=1 width=12)
Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
(13 rows)
postgres=# set enable_partition_pruning to on;
SET
postgres=# explain select * from ptab01 where tm='2020-01-07'::timestamptz;
QUERY PLAN
---------------------------------------------------------------------------------------
Gather (cost=1000.00..4417.51 rows=1 width=12)
Workers Planned: 1
-> Parallel Seq Scan on ptab01_202001 ptab01 (cost=0.00..3417.41 rows=1 width=12)
Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
(4 rows)
postgres=# explain update ptab01 set id = 99 where tm='2020-01-07'::timestamptz;
QUERY PLAN
-----------------------------------------------------------------------------------
Update on ptab01 (cost=0.00..5140.40 rows=0 width=0)
Update on ptab01_202001 ptab01_1
-> Seq Scan on ptab01_202001 ptab01_1 (cost=0.00..5140.40 rows=1477 width=14)
Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
(4 rows)
postgres=# \sf+ g_time()
CREATE OR REPLACE FUNCTION public.g_time()
RETURNS timestamp with time zone
LANGUAGE plpgsql
IMMUTABLE
1 AS $function$
2 begin
3 return now()::timestamp with time zone;
4 end;
5 $function$
postgres=# explain update ptab01 set id = 99 where tm=now()::timestamptz;
QUERY PLAN
--------------------------------------------------------
Update on ptab01 (cost=0.00..26800.22 rows=0 width=0)
Update on ptab01_202001 ptab01
Update on ptab01_202002 ptab01
Update on ptab01_202003 ptab01
Update on ptab01_202004 ptab01
Update on ptab01_202005 ptab01
-> Append (cost=0.00..26800.22 rows=5 width=14)
Subplans Removed: 5
(8 rows)
postgres=# explain update ptab01 set id = 99 where tm=g_time()::timestamptz;
QUERY PLAN
----------------------------------------------------
Update on ptab01 (cost=0.00..0.00 rows=0 width=0)
-> Result (cost=0.00..0.00 rows=0 width=10)
One-Time Filter: false
(3 rows)
版本差异
注意不同的版本,裁剪的能力也有区别
postgres=# explain update measurement set peaktemp=2 where logdate = (now() - interval '1 week');
QUERY PLAN
------------------------------------------------------------------------------------------
Update on measurement (cost=0.00..127.12 rows=27 width=22)
Update on measurement_y2006m02 measurement_1
Update on measurement_y2006m03 measurement_2
Update on measurement_y2007m11 measurement_3
-> Seq Scan on measurement_y2006m02 measurement_1 (cost=0.00..42.38 rows=9 width=22)
Filter: (logdate = (now() - '7 days'::interval))
-> Seq Scan on measurement_y2006m03 measurement_2 (cost=0.00..42.38 rows=9 width=22)
Filter: (logdate = (now() - '7 days'::interval))
-> Seq Scan on measurement_y2007m11 measurement_3 (cost=0.00..42.38 rows=9 width=22)
Filter: (logdate = (now() - '7 days'::interval))
(10 rows)
postgres=# explain delete from measurement where logdate = (now() - interval '1 week');
QUERY PLAN
-----------------------------------------------------------------------------------------
Delete on measurement (cost=0.00..127.12 rows=27 width=6)
Delete on measurement_y2006m02 measurement_1
Delete on measurement_y2006m03 measurement_2
Delete on measurement_y2007m11 measurement_3
-> Seq Scan on measurement_y2006m02 measurement_1 (cost=0.00..42.38 rows=9 width=6)
Filter: (logdate = (now() - '7 days'::interval))
-> Seq Scan on measurement_y2006m03 measurement_2 (cost=0.00..42.38 rows=9 width=6)
Filter: (logdate = (now() - '7 days'::interval))
-> Seq Scan on measurement_y2007m11 measurement_3 (cost=0.00..42.38 rows=9 width=6)
Filter: (logdate = (now() - '7 days'::interval))
(10 rows)
postgres=# explain delete from measurement where logdate = (current_date - interval '1 week');
QUERY PLAN
-----------------------------------------------------------------------------------------
Delete on measurement (cost=0.00..127.12 rows=27 width=6)
Delete on measurement_y2006m02 measurement_1
Delete on measurement_y2006m03 measurement_2
Delete on measurement_y2007m11 measurement_3
-> Seq Scan on measurement_y2006m02 measurement_1 (cost=0.00..42.38 rows=9 width=6)
Filter: (logdate = (CURRENT_DATE - '7 days'::interval))
-> Seq Scan on measurement_y2006m03 measurement_2 (cost=0.00..42.38 rows=9 width=6)
Filter: (logdate = (CURRENT_DATE - '7 days'::interval))
-> Seq Scan on measurement_y2007m11 measurement_3 (cost=0.00..42.38 rows=9 width=6)
Filter: (logdate = (CURRENT_DATE - '7 days'::interval))
(10 rows)
postgres=# create or replace function g_time() returns date
as $$
begin
return now()::date;
end;
$$ immutable language plpgsql ;
CREATE FUNCTION
postgres=# explain update measurement set peaktemp=2 where logdate = (g_time() - interval '1 week');
QUERY PLAN
---------------------------------------------------------
Update on measurement (cost=0.00..0.00 rows=0 width=0)
-> Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(3 rows)
14的版本
postgres=# explain update measurement set peaktemp=2 where logdate = (now() - interval '1 week');
QUERY PLAN
-----------------------------------------------------------
Update on measurement (cost=0.00..127.26 rows=0 width=0)
Update on measurement_y2006m02 measurement
Update on measurement_y2006m03 measurement
Update on measurement_y2007m11 measurement
-> Append (cost=0.00..127.26 rows=27 width=14)
Subplans Removed: 3
(6 rows)
postgres=# explain delete from measurement where logdate = (now() - interval '1 week');
QUERY PLAN
-----------------------------------------------------------
Delete on measurement (cost=0.00..127.26 rows=0 width=0)
Delete on measurement_y2006m02 measurement
Delete on measurement_y2006m03 measurement
Delete on measurement_y2007m11 measurement
-> Append (cost=0.00..127.26 rows=27 width=10)
Subplans Removed: 3
(6 rows)
postgres=# explain delete from measurement where logdate = (current_date - interval '1 week');
QUERY PLAN
-----------------------------------------------------------
Delete on measurement (cost=0.00..127.26 rows=0 width=0)
Delete on measurement_y2006m02 measurement
Delete on measurement_y2006m03 measurement
Delete on measurement_y2007m11 measurement
-> Append (cost=0.00..127.26 rows=27 width=10)
Subplans Removed: 3
(6 rows)
哈希分区
PostgreSQL做了hash分区后,我们并不能事先知道数据存放在哪个区,譬如执行一个已做hash分区的SQL,只能从执行计划里找到第4个分区;
explain(analyze,buffers,verbose)
select count(*) from tmp_hash_visit where top_id=10004867872
-> Bitmap Heap Scan on public.tmp_hash_visit_4(cost=107.79..6041.10 rows=5596 width=0) (actual time=2.660..9.835 rows=5796 loops=1)当然正常情况,我们并不需要干预,但有些时候,我希望知道数据是存在哪个分区,这样我在写查询时就可以直接写
select count(*) from tmp_hash_visit_4 where top_Id=10004867872
对需要性能极致化的要求来说,直接操作分区子表无疑是最佳的方式,list 或 range 分区表很容易就可以获取到子表表名,hash 分区却不太容易。PostgreSQL 社区版本目前没有提供这种函数直接取 hash 值,但所有有用的信息都可以在源代码中找到,可以通过自定义函数,编译成共享库的方式来定制实现。
下面提供了 Qunar 在 PostgresQL 12 中定制函数取 hash 值的例子。
代码中提供了五个函数, 函数 hash_partition_value 参数为分区表的 OID , 分区数,以及分区的值,支持多字段,其它函数参数为分区的值和分区数, 不依赖与分区表 OID 。
纯干货!一文学会 PostgreSQL 实现表分区的方法
https://github.com/digoal/blog/blob/master/202109/20210908_01.md
裁剪的区别
PostgreSQL为了避免在不必要地分区中搜索数据,只扫描符合条件的某个分区,而不是扫描所有分区。使用约束排除(constraint exclusion ),控制select,update,delete,用于改进表分区的性能。如果没有约束排除,查询会扫描每一个分区。打开了约束排除之后,规划器将检查每个分区的约束然后再试图证明该分区不需要被扫描,因为它不包含任何符合WHERE子句条件的数据行。如果规划器可以证明这个,它就把该分区从查询规划里排除出去。
在v11,引入了新的特性来过滤分区:plan-time and run-time partition prune,并添加了新的GUC参数enable_partition_pruning。
注意,在历史文档中,比如11.2的文档中,有提到如下:
Note
Currently, pruning of partitions during the planning of an
UPDATE
orDELETE
command is implemented using the constraint exclusion method (however, it is controlled by theenable_partition_pruning
rather thanconstraint_exclusion
) - see the following section for details and caveats that apply.Also, execution-time partition pruning currently only occurs for the
Append
node type, notMergeAppend
.Both of these behaviors are likely to be changed in a future release of PostgreSQL .
当前,在
UPDATE
或者INSERT
执行期间对于分区的裁剪是通过约束排除实现的 (然而,它是由enable_partition_pruning参数控制的,而不是constraint_exclusion)
在较新的文档中,已经没有了。
在PostgreSQL 10中,分区裁剪是通过constraint_exclusion参数来控制,该机制是一种线性算法,需要查看每个分区的元数据,以检查分区是否与查询WHERE子句匹配。在PostgreSQL 11中 ,partition prune不需要逐个检查分区,对于range和list分区,通过二分搜索找到匹配的分区,对于hash分区,通过hash函数找到匹配的区,提高性能。
一句话就是,10 是检查每个分区,11 是直接计算所需的分区。https://www.2ndquadrant.com/en/blog/partition-elimination-postgresql-11/
In PostgreSQL 11 this elimination of unneeded partitions (aka partition pruning) is no longer an exhaustive linear search. A binary search quickly identifies matching LIST and RANGE partitions. A hashing function finds the matching partitions for HASH partitioned tables, which are new in PG11.This also makes improvements so that it’s able to prune partitions in a few more cases than was previously possible.
所以也就意味着,子表数量越多,生成执行计划的速度也就越慢,子表数量上千之后就会发现planning time特别长了,这种场景的话,便可以使用绑定变量了,把执行计划缓存起来,以备后用。但是又会经常遇到另一个问题,在此就表过不提了。
org.postgresql.util.PSQLException: ERROR: cached plan must not change result type
另外
1.constraint_exclusion仅在查询计划期间应用,与enable_partition_pruning不同,后者也可以在查询执行的期间内应用。2.constraint_exclusion仅在查询的where子句包含常量(或外部提供的参数)时有效。例如,CURRENT_TIMESTAMP这一类函数便无法优化,因为是非稳定的,规划器不知道函数的值在运行时会落入哪个子表。3.constraint_exclusion会检查所有子表上的所有约束,子表越多,查询规划时间越长。因此,传统的基于继承的分区可以很好地处理上百个子表,不要尝试使用上千个子表。
当然也有好处:
1.constraint_exclusion可以作用于所有的表上,在声明式分区上也可以建约束,使用该特性2.enable_partition_pruning则只作用于声明式分区
而enable_partition_pruning的优势在于:
1.在查询计划的初始化期间,对于执行的初始化阶段就已知值的参数,可以在这里执行分区裁剪。这个阶段中被剪枝掉的分区将不会显示在查询的EXPLAIN
或EXPLAIN ANALYZE
结果中。通过观察EXPLAIN
输出的"Subplans Removed"属性,可以确定被剪枝掉的分区数。2.在查询计划的实际执行期间,仍然可能使用运行时才能确定的值完成分区裁剪的操作。这包括来自子查询的值以及来自执行时参数的值 (例如来自于参数化嵌套循环连接的参数) 。由于在查询执行期间这些参数的值可能会改变多次,所以只要分区剪枝使用到的执行参数发生改变,就会执行一次分区剪枝。要判断分区是否在这个阶段被剪枝,需要仔细地观察EXPLAIN ANALYZE
输出中的loops
属性。对应于不同分区的子计划可以具有不同的值,这取决于在执行期间每个分区被修剪的次数。如果每次都被剪枝,有些分区可能会显示为(never executed)
。见如下示例:
explain analyze select * from rtable where c1=(select 256);
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Append (cost=0.01..74.61 rows=19 width=42) (actual time=0.027..0.027 rows=0 loops=1)
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=1)
-> Seq Scan on rtable100 (cost=0.00..24.50 rows=6 width=42) (never executed)
Filter: (c1 = $0)
-> Seq Scan on rtable200 (cost=0.00..24.50 rows=6 width=42) (never executed)
Filter: (c1 = $0)
-> Seq Scan on rtable300 (cost=0.00..1.00 rows=1 width=42) (actual time=0.012..0.012 rows=0 loops=1)
Filter: (c1 = $0)
-> Seq Scan on rtable400 (cost=0.00..24.50 rows=6 width=42) (never executed)
Filter: (c1 = $0)
Planning Time: 0.301 ms
Execution Time: 0.103 ms
(13 rows)
以往,由规划器(planner)筛选符合条件的分区,对于where子句带有参数或者子查询的情况,并不能过滤分区,只有在执行时才能知道子查询的值,新特性改善这种情况,在执行时,还会进行分区过滤。比如如下这一类SQL:
SELECT * FROM foo WHERE a > $1;
SELECT ... FROM parttab WHERE partkey =(SELECT ... FROM othertable WHERE ...);
以下面为例:
SELECT * FROM foo, bar WHERE foo.a = bar.a;
两张表在a列上都有unique索引,每张表都有10个子表,每张表100万数据,测试结果如下:
v10 | 103 seconds |
v11 | 20 seconds |
另外一个是关于绑定变量的测试
-- PostgreSQL 10
PREPARE ps1 (INT) AS SELECT * FROM rtable WHERE c1 = $1;
explain analyze execute ps1(256);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Append (cost=0.00..98.00 rows=24 width=42) (actual time=0.051..0.051 rows=0 loops=1)
-> Seq Scan on rtable100 (cost=0.00..24.50 rows=6 width=42) (actual time=0.011..0.011 rows=0 loops=1)
Filter: (c1 = 256)
-> Seq Scan on rtable200 (cost=0.00..24.50 rows=6 width=42) (actual time=0.006..0.006 rows=0 loops=1)
Filter: (c1 = 256)
-> Seq Scan on rtable300 (cost=0.00..24.50 rows=6 width=42) (actual time=0.006..0.006 rows=0 loops=1)
Filter: (c1 = 256)
-> Seq Scan on rtable400 (cost=0.00..24.50 rows=6 width=42) (actual time=0.005..0.005 rows=0 loops=1)
Filter: (c1 = 256)
Planning time: 0.373 ms
Execution time: 0.215 ms
(11 rows)
-- PostgreSQL 11
PREPARE ps1 (INT) AS SELECT * FROM rtable WHERE c1 = $1;
explain analyze execute ps1(256);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Append (cost=0.00..24.53 rows=6 width=42) (actual time=0.017..0.019 rows=1 loops=1)
-> Seq Scan on rtable300 (cost=0.00..24.50 rows=6 width=42) (actual time=0.016..0.017 rows=1 loops=1)
Filter: (c1 = 256)
Planning Time: 0.382 ms
Execution Time: 0.049 ms
(5 rows)
分区模式的区别
1.对声明式分区来说,分区必须具有和分区表相同的列,而在表继承中,子表可以有父表中没有的额外列。2.表继承允许多继承。3.声明式分区仅支持范围、列表以及哈希分区,而表继承允许数据按照用户的选择来划分4.在使用声明式分区时,一些操作比使用表继承时要求更长的持锁时间。例如,向分区表中增加分区或者从分区表移除分区要求在父表上获取ACCESS EXCLUSIVE
锁,而在常规继承的情况下一个SHARE UPDATE EXCLUSIVE
锁就足够了。5.对于继承表,analyze父表不会自动收集子表,也就意味着只有父表收集了,而声明式分区时可以的postgres=# analyze verbose ptab01;
6.对于v11以后,update支持将一行数据从一个子表更新到另一个子表中,而原生的继承表则不行,当然你也可以使用触发器,但是管理运维成本就太高了
INFO: analyzing "public.ptab01" inheritance tree
INFO: "ptab01_202001": scanned 1448 of 1448 pages, containing 267840 live rows and 0 dead rows; 6117 rows in sample, 267840 estimated total rows
INFO: "ptab01_202002": scanned 1355 of 1355 pages, containing 250560 live rows and 0 dead rows; 5725 rows in sample, 250560 estimated total rows
INFO: "ptab01_202003": scanned 1448 of 1448 pages, containing 267840 live rows and 0 dead rows; 6117 rows in sample, 267840 estimated total rows
INFO: "ptab01_202004": scanned 1402 of 1402 pages, containing 259200 live rows and 0 dead rows; 5923 rows in sample, 259200 estimated total rows
INFO: "ptab01_202005": scanned 1448 of 1448 pages, containing 267840 live rows and 0 dead rows; 6117 rows in sample, 267840 estimated total rows
INFO: analyzing "public.ptab01_202001"
INFO: "ptab01_202001": scanned 1448 of 1448 pages, containing 267840 live rows and 0 dead rows; 30000 rows in sample, 267840 estimated total rows
INFO: analyzing "public.ptab01_202002"
INFO: "ptab01_202002": scanned 1355 of 1355 pages, containing 250560 live rows and 0 dead rows; 30000 rows in sample, 250560 estimated total rows
INFO: analyzing "public.ptab01_202003"
INFO: "ptab01_202003": scanned 1448 of 1448 pages, containing 267840 live rows and 0 dead rows; 30000 rows in sample, 267840 estimated total rows
INFO: analyzing "public.ptab01_202004"
INFO: "ptab01_202004": scanned 1402 of 1402 pages, containing 259200 live rows and 0 dead rows; 30000 rows in sample, 259200 estimated total rows
INFO: analyzing "public.ptab01_202005"
INFO: "ptab01_202005": scanned 1448 of 1448 pages, containing 267840 live rows and 0 dead rows; 30000 rows in sample, 267840 estimated total rows
ANALYZE
postgres=# analyze verbose ptab02;
INFO: analyzing "public.ptab02"
INFO: "ptab02": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO: analyzing "public.ptab02" inheritance tree
ANALYZE
分区表的演进
在此只做简单的整理,详细可以参照之前的文档 《PostgreSQL分区表演进》
pg_pathman
社区核心成员之一oleg所在的公司postgrespro,开发了一款分区表功能的插件,不需要动用catalog,可以很方便的增加分区表的功能。pg_pathman 是一个 PostgreSQL 高性能表分区插件。可通过内建函数挂载、摘除和分区。直到v12,原生分区表的性能才得到大幅度提升,可以与pg_pathman持平,PG12内存使用量降低、plan更快、TPS更快,对于以前的版本建议使用pg_pathman。
pg_pathman与传统的继承分区表做法有一个不同的地方,分区的定义存放在一张元数据表中,表的信息会cache在内存中,同时使用HOOK来实现RELATION的替换,所以效率非常高。要注意可能的问题:pg_pathman进程初始化时会加载所有pathman分区表配置信息到该进程私有内存空间,分区数量越多,进程的私有内存越大。不过随着PG分区表功能的不断增强,pg_pathman可能不再维护了。
pg_pathman支持自动分区管理(通过函数接口创建分区,自动将主表数据迁移到分区表),或手工分区管理(通过函数实现,将已有的表绑定到分区表,或者从分区表剥离)、支持自动新增分区(range分区),更多使用示例可以参考德哥的帖子:《PostgreSQL 9.5+ 高效分区表实现 - pg_pathman》
当然,pg_pathman也有一些常见的bug,比如经常碰到的 https://github.com/postgrespro/pg_pathman/issues/100,**这个其中一个案例是和auto_explain插件冲突**
ERROR:unrecognized node type: 211 #100:
以及几个实际案例:
1.pg_pathman的range分区,已知在筛选条件中如果对于分区字段右值套用了函数表达式,或者类型转换函数to_date(),to_timestamp()等,那么不会筛选出对应的分区表,会扫描所有的分区表;但是支持::date或者::timestamp这种类型转换,就可以走分区裁剪,可以参考:《PostgreSQL运维案例--记使用pg_pathman的range分区踩到的坑》——from杨向博2.约束名过长63字符,导致查询报错,参考:《Pathman的一个小案例》 ——from杨向博3.之前的记录《不同用户的执行计划居然会不一样?》 ——原生分区表4.段错误,heal segfault introduced in 11.7 and 12.2,使用的版本是11.7到12.2之前的版本,会出现段错误,append_rel_array动态申请空间的操作存在野指针,pg_pathman需要升级至1.5.11,详见 https://github.com/postgrespro/pg_pathman/releases/tag/1.5.115.pg_pathman支持自动创建分区(仅对范围分区有效),假如应用不小心插入了一个很遥远的值,比如时间戳,那么会创建出很多个分区6.could not find pathkey item to sort,和绑定变量有关,用prepare执行,第6次稳定出现,考虑使用原生分区或者调整plan_cache_mode为force_custom_plan,永远采用硬解析。
pg_pathman与原生分区的差异
https://github.com/digoal/blog/blob/master/201802/20180206_01.md
https://github.com/digoal/blog/blob/master/201710/20171015_01.md
pg_rewrite
在线将普通表转换为分区表,https://github.com/cybertec-postgresql/pg_rewrite,可以参考我之前的文章《高效分区转化pg_rewrite》
1.解决了从非分区表变更为分区表的长时间锁问题2.需要使用logical replication,从非分区表增量将数据复制到分区表3.只需要短暂的排他锁, 在同步完数据后用于切换表名. 有个参数控制切换表名阶段的锁超时, 例如设置为100毫秒, 重试3次还没有拿到这个锁的话, 报错. 代码在本文末尾.4.非分区表一定要有PK5.注意: 分区表建议确认约束和非分区表保持一致, 例如check, not null, FK, default value 等约束6.serial字段也记得要设置妥当
https://www.cybertec-postgresql.com/en/pg_rewrite-postgresql-table-partitioning/
pg_partman
此处表过不提,感兴趣的读者自行搜索。
enable_partitionwise_join
Enables or disables the query planner's use of partitionwise join, which allows a join between partitioned tables to be performed by joining the matching partitions. Partitionwise join currently applies only when the join conditions include all the partition keys, which must be of the same data type and have exactly matching sets of child partitions. Because partitionwise join planning can use significantly more CPU time and memory during planning, the default is
off
.启用或禁用查询计划器对分区连接的使用,它允许通过连接匹配的分区来执行分区表之间的连接。分区连接目前只适用于连接条件包括所有分区键的情况,这些分区键必须是相同的数据类型,并且有完全匹配的子分区集。因为分区连接规划在规划过程中可能会使用明显更多的CPU时间和内存,所以默认是关闭。
可以看到,在查询规划阶段会耗费更多的CPU和内存,所以默认为off。
There is nothing dangerous about this parameter.
It is just that only very specific queries benefit from this: the tables need to be partitioned identically, and the partitioning key must be part of the join condition.
Enabling the parameter slightly increases query planning time for many queries, but only few benefit. That is the reason this is disabled by default. The problem is exacerbated because PostgreSQL usually doesn't cache query plans (you'd have to use prepared statements or PL/pgSQL functions).
这个参数没有什么危险。只是只有非常特定的查询才能从中受益:表需要进行相同的分区,并且分区键必须是连接条件的一部分。
启用该参数会略微增加许多查询的查询计划时间,但好处很少。这就是默认情况下禁用此功能的原因。由于 PostgreSQL 通常不缓存查询计划(您必须使用预备语句或 PL/pgSQL 函数),因此问题会变得更加严重。
如果两个表是兼容的分区,并且连接在分区列上,那么我们可以分别连接每对分区。即一个大连接被分解成一堆小连接,从而提高了效率。兼容的分区指:相同的分区模式,子分区数目、表定义、分区键字段类型。如果不同则不会使用分区join。
create table t1(id int) partition by range(id);
create table t1_p1 partition of t1 for values from (0) to (100);
create table t1_p2 partition of t1 for values from (150) to (200);
create table t2(id int) partition by range(id);
create table t2_p1 partition of t2 for values from (0) to (50);
create table t2_p2 partition of t2 for values from (100) to (175);
然后我们分别在PostgreSQL 12版本和PostgreSQL 13执行下面的SQL:
explain select * from t1, t2 where t1.id=t2.id;
对比如下:
enable_partitionwise_aggregate
Enables or disables the query planner's use of partitionwise grouping or aggregation, which allows grouping or aggregation on a partitioned tables performed separately for each partition. If the GROUP BY clause does not include the partition keys, only partial aggregation can be performed on a per-partition basis, and finalization must be performed later. Because partitionwise grouping or aggregation can use significantly more CPU time and memory during planning, the default is off.
允许或者禁止查询规划器使用面向分区的分组或聚集,这使得在分区表上的分组或聚集可以在每个分区上分别执行。如果GROUP BY
子句不包括分区键,只有部分聚集能够以基于每个分区的方式执行,并且finalization必须最后执行。由于面向分区的分组或聚集在规划期间会使用可观的CPU时间和内存,所以默认值为off
。
可以看到这个值,默认是off,和enable_partitionwise_join类似。
样例参照 https://www.cnblogs.com/abclife/p/11648006.html
postgres=# create table pagg_t (a int, b int, c text, d int) partition by list(c);
CREATE TABLE
postgres=# create table pagg_t_p1 partition of pagg_t for values in ('000', '001', '002', '003');
CREATE TABLE
postgres=# create table pagg_t_p2 partition of pagg_t for values in ('004', '005', '006', '007');
CREATE TABLE
postgres=# create table pagg_t_p3 partition of pagg_t for values in ('008', '009', '010', '011');
CREATE TABLE
postgres=# insert into pagg_t select i % 20, i % 30, to_char(i % 12, 'fm000'), i % 30 from generate_series(0, 2999) i;
INSERT 0 3000
postgres=# analyze pagg_t;
ANALYZE
postgres=# explain (costs off) select c, sum(a), avg(b), count(*), min(a), max(b) from pagg_t group by c having avg(d) < 15 order by 1, 2, 3;
QUERY PLAN
---------------------------------------------------------
Sort
Sort Key: pagg_t.c, (sum(pagg_t.a)), (avg(pagg_t.b))
-> Append
-> HashAggregate
Group Key: pagg_t.c
Filter: (avg(pagg_t.d) < '15'::numeric)
-> Seq Scan on pagg_t_p1 pagg_t
-> HashAggregate
Group Key: pagg_t_1.c
Filter: (avg(pagg_t_1.d) < '15'::numeric)
-> Seq Scan on pagg_t_p2 pagg_t_1
-> HashAggregate
Group Key: pagg_t_2.c
Filter: (avg(pagg_t_2.d) < '15'::numeric)
-> Seq Scan on pagg_t_p3 pagg_t_2
(15 rows)
开启了enable_partitionwise_aggregate之后,先对表进行Hashagg,然后Append汇总
postgres=# set enable_partitionwise_aggregate to on;
SET
postgres=# explain (costs off) select c, sum(a), avg(b), count(*), min(a), max(b) from pagg_t group by c having avg(d) < 15 order by 1, 2, 3;
QUERY PLAN
---------------------------------------------------------
Sort
Sort Key: pagg_t.c, (sum(pagg_t.a)), (avg(pagg_t.b))
-> Append
-> HashAggregate
Group Key: pagg_t.c
Filter: (avg(pagg_t.d) < '15'::numeric)
-> Seq Scan on pagg_t_p1 pagg_t
-> HashAggregate
Group Key: pagg_t_1.c
Filter: (avg(pagg_t_1.d) < '15'::numeric)
-> Seq Scan on pagg_t_p2 pagg_t_1
-> HashAggregate
Group Key: pagg_t_2.c
Filter: (avg(pagg_t_2.d) < '15'::numeric)
-> Seq Scan on pagg_t_p3 pagg_t_2
(15 rows)
如果GROUP BY子句中没有包含分区字段,只会基于分区执行部分聚合操作,然后再对结果进行一次最终的聚合。
postgres=# explain (costs off) select a, sum(b), avg(b), count(*), min(a), max(b) from pagg_t group by a having avg(d) < 15 order by 1, 2, 3;
QUERY PLAN
--------------------------------------------------------
Sort
Sort Key: pagg_t.a, (sum(pagg_t.b)), (avg(pagg_t.b))
-> Finalize HashAggregate
Group Key: pagg_t.a
Filter: (avg(pagg_t.d) < '15'::numeric)
-> Append
-> Partial HashAggregate
Group Key: pagg_t.a
-> Seq Scan on pagg_t_p1 pagg_t
-> Partial HashAggregate
Group Key: pagg_t_1.a
-> Seq Scan on pagg_t_p2 pagg_t_1
-> Partial HashAggregate
Group Key: pagg_t_2.a
-> Seq Scan on pagg_t_p3 pagg_t_2
(15 rows)
限制
参照官网。
声明式分区
•分区表的唯一约束(也就是主键)必须包括所有的分区键列。存在这个限制是因为构成约束的各个索引只能在它们自己的分区中直接执行唯一性;因此,分区结构本身必须保证在不同的分区中不存在重复。•没有办法创建一个跨越整个分区表的排除性约束。只可能在每个叶子分区上单独设置这样的约束。同样,这个限制源于不能执行跨分区限制。•INSERT上的BEFORE ROW触发器不能改变哪个分区是新行的最终目的地。•在同一个分区树中混合临时和永久关系是不允许的。因此,如果分区表是永久的,它的分区也必须是永久的,同样,如果分区表是临时的,也必须是临时的。当使用临时关系时,分区树的所有成员必须来自同一个会话。•分区不能有父表中不存在的列。在使用CREATE TABLE创建分区时,不可能指定列,也不可能在事后使用ALTER TABLE向分区添加列。表可以通过ALTER TABLE ...添加为一个分区。只有当表的列与父表完全匹配时,才可以用ALTER TABLE ...添加为分区。•分区表的CHECK和NOT NULL约束总是被其所有分区继承。不允许在分区表上创建标有NO INHERIT的CHECK约束。如果在父表中存在相同的约束,那么你不能在分区的列上删除一个NOT NULL约束。•只要没有分区,就支持使用ONLY来添加或删除分区表的约束。一旦存在分区,使用ONLY将导致错误。相反,对分区本身的约束可以被添加和(如果它们在父表中不存在)放弃。•由于分区表本身没有任何数据,试图在一个分区表上使用TRUNCATE ONLY将总是返回一个错误。
继承式分区
•没有自动的方法来验证所有的CHECK约束是互斥的。创建生成子表并创建和/或修改相关对象的代码比手工编写每个对象更安全。•索引和外键约束适用于单个表,而不是其继承的子表,因此它们有一些需要注意的地方。•这里显示的方案假设一行的关键列的值从未改变,或者至少没有改变到需要移到另一个分区的程度。由于CHECK约束,试图这样做的UPDATE将会失败。如果你需要处理这种情况,你可以在子表上设置合适的更新触发器,但是这使得结构的管理更加复杂。•如果你使用手动VACUUM或ANALYZE命令,不要忘记你需要在每个子表上单独运行它们。•带有ON CONFLICT子句的INSERT语句不太可能像预期的那样工作,因为ON CONFLICT动作只在指定的目标关系上出现唯一违反的情况下采取,而不是其子关系。•除非应用程序明确知道分区方案,否则将需要触发器或规则来将行发送到所需的子表。触发器的编写可能很复杂,而且会比声明式分区内部执行的元组路由慢得多。
何时开始分区
没有强制规则来规定多大的表必须分区,对于某些表,30GB可能开始需要分区,对于较低负载的表,可能是1TB。建议DBA根据各厂的业务特性和硬件性能考虑多大的表需要分区,建议SSD硬盘在单表接近40GB时开始考虑分区。
比较傻瓜式的建议(SSD,多核):
1.不频繁更新、删除的表:记录数20亿,表占用空间200 GB。就可以考虑分表了。2.频繁更新、删除、插入的表:记录数2亿,表占用空间20 GB。就可以考虑分表了。
另外,无谓的分区或者糟糕的分区,反而会带来性能损失!大量的metacache,会耗费巨大的内存,查询规划的时间也会增加
实战样例
目前PostgreSQL截至v14还是不支持Oracle的Interval分区,间隔分区是Oracle 11.1引入的新功能,通过该功能,可以在输入相应分区的数据时自动创建相应的分区。在没有间隔分区技术之前,DBA通常会创建一个maxvalue分区以避免ORA-14400:插入的分区键值不能映射到任何分区("inserted partition key does not map to any partition") 错误。PostgreSQL 11 版本分区表增加了 Default 分区用来存储不满足分区定义范围的数据。11以前的版本会提示如下错误
ERROR: no partition of relation "tbl_log" found for row
DETAIL: Partition key of the failing row contains (create_time) = (2018-03-01 00:00:00).
定义了Default分区的分区表如何添加分区
CREATE TABLE ptab01 (
id int not null,
tm timestamptz not null
) PARTITION BY RANGE (tm);
create table ptab01_202001 partition of ptab01 for values from ('2020-01-01') to ('2020-02-01');
create table ptab01_202002 partition of ptab01 for values from ('2020-02-01') to ('2020-03-01');
create table ptab01_202003 partition of ptab01 for values from ('2020-03-01') to ('2020-04-01');
create table ptab01_202004 partition of ptab01 for values from ('2020-04-01') to ('2020-05-01');
create table ptab01_202005 partition of ptab01 for values from ('2020-05-01') to ('2020-06-01');
insert into ptab01 select extract(epoch from seq), seq from generate_series('2020-01-01'::timestamptz, '2020-05-31 23:59:59'::timestamptz, interval '10 seconds') as seq;
postgres=# \d+ ptab01
Partitioned table "public.ptab01"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+--------------------------+-----------+----------+---------+---------+--------------+-------------
id | integer | | not null | | plain | |
tm | timestamp with time zone | | not null | | plain | |
Partition key: RANGE (tm)
Partitions: ptab01_202001 FOR VALUES FROM ('2020-01-01 00:00:00+08') TO ('2020-02-01 00:00:00+08'),
ptab01_202002 FOR VALUES FROM ('2020-02-01 00:00:00+08') TO ('2020-03-01 00:00:00+08'),
ptab01_202003 FOR VALUES FROM ('2020-03-01 00:00:00+08') TO ('2020-04-01 00:00:00+08'),
ptab01_202004 FOR VALUES FROM ('2020-04-01 00:00:00+08') TO ('2020-05-01 00:00:00+08'),
ptab01_202005 FOR VALUES FROM ('2020-05-01 00:00:00+08') TO ('2020-06-01 00:00:00+08')
postgres=# insert into ptab01 values(1,'2020-06-01');
ERROR: no partition of relation "ptab01" found for row
DETAIL: Partition key of the failing row contains (tm) = (2020-06-01 00:00:00+08).
创建默认分区,其实根据默认分区的约束就可以看明白了
postgres=# create table ptab01_default partition of ptab01 default ;
CREATE TABLE
postgres=# insert into ptab01 values(1,'2020-06-01');
INSERT 0 1
postgres=# select * from ptab01_default ;
id | tm
----+------------------------
1 | 2020-06-01 00:00:00+08
(1 row)
postgres=# \d+ ptab01_default
Table "public.ptab01_default"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+--------------------------+-----------+----------+---------+---------+-------------+--------------+-------------
id | integer | | not null | | plain | | |
tm | timestamp with time zone | | not null | | plain | | |
Partition of: ptab01 DEFAULT
Partition constraint: (NOT ((tm IS NOT NULL) AND (((tm >= '2020-01-01 00:00:00+08'::timestamp with time zone) AND (tm < '2020-02-01 00:00:00+08'::timestamp with time zone)) OR ((tm >= '2020-02-01 00:00:00+08'::timestamp with time zone) AND (tm < '2020-03-01 00:00:00+08'::timestamp with time zone)) OR ((tm >= '2020-03-01 00:00:00+08'::timestamp with time zone) AND (tm < '2020-04-01 00:00:00+08'::timestamp with time zone)) OR ((tm >= '2020-04-01 00:00:00+08'::timestamp with time zone) AND (tm < '2020-05-01 00:00:00+08'::timestamp with time zone)) OR ((tm >= '2020-05-01 00:00:00+08'::timestamp with time zone) AND (tm < '2020-06-01 00:00:00+08'::timestamp with time zone)))))
Access method: heap
那么假如这个时候我想给表添加分区该怎么做呢?
postgres=# create table ptab01_202006(id int not null,tm timestamptz not null);
CREATE TABLE
postgres=# alter table ptab01 attach partition ptab01_202006 for values from ('2020-06-01') to ('2020-07-01');
ERROR: updated partition constraint for default partition "ptab01_default" would be violated by some row
需要先解绑默认分区
postgres=# alter table ptab01 detach partition ptab01_default ; ---解绑默认分区
ALTER TABLE
postgres=# alter table ptab01 attach partition ptab01_202006 for values from ('2020-06-01') to ('2020-07-01');
ALTER TABLE
postgres=# alter table ptab01 attach partition ptab01_default default ; ---重新挂载会报错
ERROR: partition constraint of relation "ptab01_default" is violated by some row
postgres=# insert into ptab01_202006 select * from ptab01_default ; ---需要先将数据挪到新的子表
INSERT 0 1
postgres=# delete from ptab01_default ; ---删除默认分区里的值
DELETE 1
postgres=# alter table ptab01 attach partition ptab01_default default ;
ALTER TABLE
postgres=# \d+ ptab01
Partitioned table "public.ptab01"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+--------------------------+-----------+----------+---------+---------+-------------+--------------+-------------
id | integer | | not null | | plain | | |
tm | timestamp with time zone | | not null | | plain | | |
Partition key: RANGE (tm)
Partitions: ptab01_202001 FOR VALUES FROM ('2020-01-01 00:00:00+08') TO ('2020-02-01 00:00:00+08'),
ptab01_202002 FOR VALUES FROM ('2020-02-01 00:00:00+08') TO ('2020-03-01 00:00:00+08'),
ptab01_202003 FOR VALUES FROM ('2020-03-01 00:00:00+08') TO ('2020-04-01 00:00:00+08'),
ptab01_202004 FOR VALUES FROM ('2020-04-01 00:00:00+08') TO ('2020-05-01 00:00:00+08'),
ptab01_202005 FOR VALUES FROM ('2020-05-01 00:00:00+08') TO ('2020-06-01 00:00:00+08'),
ptab01_202006 FOR VALUES FROM ('2020-06-01 00:00:00+08') TO ('2020-07-01 00:00:00+08'),
ptab01_default DEFAULT
pathman分区表转化为原生分区表
https://github.com/digoal/blog/blob/master/201911/20191113_01.md
insert into select
pg_pathman有一个很方便的点,就是支持迁移数据,partition_table_concurrently
postgres=# create table test_partition(id int,info text,t_time timestamp not null);
CREATE TABLE
postgres=# insert into test_partition SELECT random()*(25-10)+10,'test',generate_series('2020-01-01','2021-12-01',interval '60 min') ;
INSERT 0 16801
postgres=# \dt+ test_partition
List of relations
Schema | Name | Type | Owner | Persistence | Size | Description
--------+----------------+-------+----------+-------------+--------+-------------
public | test_partition | table | postgres | permanent | 896 kB |
(1 row)
postgres=# select
create_range_partitions('test_partition'::regclass,
't_time',
'2020-01-01'::timestamp,
interval '1 month',
24,
false) ;
create_range_partitions
-------------------------
24
(1 row)
迁移数据
postgres=# select partition_table_concurrently('test_partition'::regclass,
postgres(# 10000,
postgres(# 1.0);
NOTICE: worker started, you can stop it with the following command: select public.stop_concurrent_part_task('test_partition');
partition_table_concurrently
------------------------------
(1 row)
postgres=# select count(*) from only test_partition_1;
count
-------
744
(1 row)
postgres=# select count(*) from only test_partition; ---主表中数据已经没有
count
-------
0
(1 row)
至此,pathman的分区建完了,再建一个原生分区,使用拼接SQL的方式创建原生分区
postgres=# create table native_partition (like test_partition including all) partition by range(t_time);
CREATE TABLE
postgres=# \d+ native_partition
Partitioned table "public.native_partition"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
info | text | | | | extended | |
t_time | timestamp without time zone | | not null | | plain | |
Partition key: RANGE (t_time)
Number of partitions: 0
postgres=# SELECT 'CREATE TABLE native_partition_' || p_month || ' PARTITION of native_partition FOR VALUES FROM (''2020-'||p_month||'-01'') TO (''2020-'||p_month||'-01''::timestamp + interval ''1 month'');' FROM generate_series(1,12) as p_month ;
?column?
--------------------------------------------------------------------------------------------------------------------------------------------------
CREATE TABLE native_partition_1 PARTITION of native_partition FOR VALUES FROM ('2020-1-01') TO ('2020-1-01'::timestamp + interval '1 month');
CREATE TABLE native_partition_2 PARTITION of native_partition FOR VALUES FROM ('2020-2-01') TO ('2020-2-01'::timestamp + interval '1 month');
CREATE TABLE native_partition_3 PARTITION of native_partition FOR VALUES FROM ('2020-3-01') TO ('2020-3-01'::timestamp + interval '1 month');
CREATE TABLE native_partition_4 PARTITION of native_partition FOR VALUES FROM ('2020-4-01') TO ('2020-4-01'::timestamp + interval '1 month');
CREATE TABLE native_partition_5 PARTITION of native_partition FOR VALUES FROM ('2020-5-01') TO ('2020-5-01'::timestamp + interval '1 month');
CREATE TABLE native_partition_6 PARTITION of native_partition FOR VALUES FROM ('2020-6-01') TO ('2020-6-01'::timestamp + interval '1 month');
CREATE TABLE native_partition_7 PARTITION of native_partition FOR VALUES FROM ('2020-7-01') TO ('2020-7-01'::timestamp + interval '1 month');
CREATE TABLE native_partition_8 PARTITION of native_partition FOR VALUES FROM ('2020-8-01') TO ('2020-8-01'::timestamp + interval '1 month');
CREATE TABLE native_partition_9 PARTITION of native_partition FOR VALUES FROM ('2020-9-01') TO ('2020-9-01'::timestamp + interval '1 month');
CREATE TABLE native_partition_10 PARTITION of native_partition FOR VALUES FROM ('2020-10-01') TO ('2020-10-01'::timestamp + interval '1 month');
CREATE TABLE native_partition_11 PARTITION of native_partition FOR VALUES FROM ('2020-11-01') TO ('2020-11-01'::timestamp + interval '1 month');
CREATE TABLE native_partition_12 PARTITION of native_partition FOR VALUES FROM ('2020-12-01') TO ('2020-12-01'::timestamp + interval '1 month');
(12 rows)
postgres=# SELECT 'CREATE TABLE native_partition_' || p_month || ' PARTITION of native_partition FOR VALUES FROM (''2021-'||p_month-12||'-01'') TO (''2021-'||p_month-12||'-01''::timestamp + interval ''1 month'');' FROM generate_series(13,24) as p_month ;
?column?
--------------------------------------------------------------------------------------------------------------------------------------------------
CREATE TABLE native_partition_13 PARTITION of native_partition FOR VALUES FROM ('2021-1-01') TO ('2021-1-01'::timestamp + interval '1 month');
CREATE TABLE native_partition_14 PARTITION of native_partition FOR VALUES FROM ('2021-2-01') TO ('2021-2-01'::timestamp + interval '1 month');
CREATE TABLE native_partition_15 PARTITION of native_partition FOR VALUES FROM ('2021-3-01') TO ('2021-3-01'::timestamp + interval '1 month');
CREATE TABLE native_partition_16 PARTITION of native_partition FOR VALUES FROM ('2021-4-01') TO ('2021-4-01'::timestamp + interval '1 month');
CREATE TABLE native_partition_17 PARTITION of native_partition FOR VALUES FROM ('2021-5-01') TO ('2021-5-01'::timestamp + interval '1 month');
CREATE TABLE native_partition_18 PARTITION of native_partition FOR VALUES FROM ('2021-6-01') TO ('2021-6-01'::timestamp + interval '1 month');
CREATE TABLE native_partition_19 PARTITION of native_partition FOR VALUES FROM ('2021-7-01') TO ('2021-7-01'::timestamp + interval '1 month');
CREATE TABLE native_partition_20 PARTITION of native_partition FOR VALUES FROM ('2021-8-01') TO ('2021-8-01'::timestamp + interval '1 month');
CREATE TABLE native_partition_21 PARTITION of native_partition FOR VALUES FROM ('2021-9-01') TO ('2021-9-01'::timestamp + interval '1 month');
CREATE TABLE native_partition_22 PARTITION of native_partition FOR VALUES FROM ('2021-10-01') TO ('2021-10-01'::timestamp + interval '1 month');
CREATE TABLE native_partition_23 PARTITION of native_partition FOR VALUES FROM ('2021-11-01') TO ('2021-11-01'::timestamp + interval '1 month');
CREATE TABLE native_partition_24 PARTITION of native_partition FOR VALUES FROM ('2021-12-01') TO ('2021-12-01'::timestamp + interval '1 month');
(12 rows)
postgres=# insert into native_partition select * from test_partition;
INSERT 0 16801
postgres=# select count(*) from only native_partition;
count
-------
0
(1 row)
postgres=# select count(*) from only test_partition;
count
-------
0
(1 row)
postgres=# select count(*) from test_partition_1;
count
-------
744
(1 row)
postgres=# select count(*) from native_partition_1;
count
-------
744
(1 row)
postgres=# select count(*) from test_partition_10;
count
-------
744
(1 row)
postgres=# select count(*) from native_partition_10;
count
-------
744
(1 row)
detach + attach
postgres=# create table nat_partition(like test_partition including all) partition by range(t_time);
CREATE TABLE
postgres=# \d+ nat_partition
Partitioned table "public.nat_partition"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
info | text | | | | extended | |
t_time | timestamp without time zone | | not null | | plain | |
Partition key: RANGE (t_time)
Number of partitions: 0
postgres=# select 'select detach_range_partition(''test_partition_' || p_month || ''')'||';' from generate_series(1,12) as p_month; ---将分区从主表的继承关系中删除,不删数据,只删除继承关系和约束
?column?
-----------------------------------------------------
select detach_range_partition('test_partition_1');
select detach_range_partition('test_partition_2');
select detach_range_partition('test_partition_3');
select detach_range_partition('test_partition_4');
select detach_range_partition('test_partition_5');
select detach_range_partition('test_partition_6');
select detach_range_partition('test_partition_7');
select detach_range_partition('test_partition_8');
select detach_range_partition('test_partition_9');
select detach_range_partition('test_partition_10');
select detach_range_partition('test_partition_11');
select detach_range_partition('test_partition_12');
(12 rows)
...
postgres=# select detach_range_partition('test_partition_12');
detach_range_partition
------------------------
test_partition_12
(1 row)
postgres=# select 'select detach_range_partition(''test_partition_' || p_month || ''')'||';' from generate_series(13,24) as p_month;
?column?
-----------------------------------------------------
select detach_range_partition('test_partition_13');
select detach_range_partition('test_partition_14');
select detach_range_partition('test_partition_15');
select detach_range_partition('test_partition_16');
select detach_range_partition('test_partition_17');
select detach_range_partition('test_partition_18');
select detach_range_partition('test_partition_19');
select detach_range_partition('test_partition_20');
select detach_range_partition('test_partition_21');
select detach_range_partition('test_partition_22');
select detach_range_partition('test_partition_23');
select detach_range_partition('test_partition_24');
(12 rows)
postgres=# select disable_pathman_for('test_partition'); ---针对单个分区主表禁用pg_pathman
disable_pathman_for
---------------------
(1 row)
给原来的pathman分区表添加约束
postgres=# SELECT 'alter table test_partition_' || p_month || ' add constraint test_partition_con' || p_month || ' check (t_time >= ''2020-'||p_month||'-01'' and t_time < ''2021-'||p_month||'-01''::timestamp + interval ''1 month'');' FROM generate_series(1,12) as p_month;
?column?
--------------------------------------------------------------------------------------------------------------------------------------------------------------
alter table test_partition_1 add constraint test_partition_con1 check (t_time >= '2020-1-01' and t_time < '2021-1-01'::timestamp + interval '1 month');
alter table test_partition_2 add constraint test_partition_con2 check (t_time >= '2020-2-01' and t_time < '2021-2-01'::timestamp + interval '1 month');
alter table test_partition_3 add constraint test_partition_con3 check (t_time >= '2020-3-01' and t_time < '2021-3-01'::timestamp + interval '1 month');
alter table test_partition_4 add constraint test_partition_con4 check (t_time >= '2020-4-01' and t_time < '2021-4-01'::timestamp + interval '1 month');
alter table test_partition_5 add constraint test_partition_con5 check (t_time >= '2020-5-01' and t_time < '2021-5-01'::timestamp + interval '1 month');
alter table test_partition_6 add constraint test_partition_con6 check (t_time >= '2020-6-01' and t_time < '2021-6-01'::timestamp + interval '1 month');
alter table test_partition_7 add constraint test_partition_con7 check (t_time >= '2020-7-01' and t_time < '2021-7-01'::timestamp + interval '1 month');
alter table test_partition_8 add constraint test_partition_con8 check (t_time >= '2020-8-01' and t_time < '2021-8-01'::timestamp + interval '1 month');
alter table test_partition_9 add constraint test_partition_con9 check (t_time >= '2020-9-01' and t_time < '2021-9-01'::timestamp + interval '1 month');
alter table test_partition_10 add constraint test_partition_con10 check (t_time >= '2020-10-01' and t_time < '2021-10-01'::timestamp + interval '1 month');
alter table test_partition_11 add constraint test_partition_con11 check (t_time >= '2020-11-01' and t_time < '2021-11-01'::timestamp + interval '1 month');
alter table test_partition_12 add constraint test_partition_con12 check (t_time >= '2020-12-01' and t_time < '2021-12-01'::timestamp + interval '1 month');
(12 rows)
postgres=# SELECT 'alter table test_partition_' || p_month || ' add constraint test_partition_con' || p_month || ' check (t_time >= ''2021-'||p_month-12||'-01'' and t_time < ''2021-'||p_month-12||'-01''::timestamp + interval ''1 month'');' FROM generate_series(13,24) as p_month;
?column?
--------------------------------------------------------------------------------------------------------------------------------------------------------------
alter table test_partition_13 add constraint test_partition_con13 check (t_time >= '2021-1-01' and t_time < '2021-1-01'::timestamp + interval '1 month');
alter table test_partition_14 add constraint test_partition_con14 check (t_time >= '2021-2-01' and t_time < '2021-2-01'::timestamp + interval '1 month');
alter table test_partition_15 add constraint test_partition_con15 check (t_time >= '2021-3-01' and t_time < '2021-3-01'::timestamp + interval '1 month');
alter table test_partition_16 add constraint test_partition_con16 check (t_time >= '2021-4-01' and t_time < '2021-4-01'::timestamp + interval '1 month');
alter table test_partition_17 add constraint test_partition_con17 check (t_time >= '2021-5-01' and t_time < '2021-5-01'::timestamp + interval '1 month');
alter table test_partition_18 add constraint test_partition_con18 check (t_time >= '2021-6-01' and t_time < '2021-6-01'::timestamp + interval '1 month');
alter table test_partition_19 add constraint test_partition_con19 check (t_time >= '2021-7-01' and t_time < '2021-7-01'::timestamp + interval '1 month');
alter table test_partition_20 add constraint test_partition_con20 check (t_time >= '2021-8-01' and t_time < '2021-8-01'::timestamp + interval '1 month');
alter table test_partition_21 add constraint test_partition_con21 check (t_time >= '2021-9-01' and t_time < '2021-9-01'::timestamp + interval '1 month');
alter table test_partition_22 add constraint test_partition_con22 check (t_time >= '2021-10-01' and t_time < '2021-10-01'::timestamp + interval '1 month');
alter table test_partition_23 add constraint test_partition_con23 check (t_time >= '2021-11-01' and t_time < '2021-11-01'::timestamp + interval '1 month');
alter table test_partition_24 add constraint test_partition_con24 check (t_time >= '2021-12-01' and t_time < '2021-12-01'::timestamp + interval '1 month');
(12 rows)
参考官方手册
•
ADD table_constraint
[ NOT VALID ]这种形式使用和CREATE TABLE相同的语法外加
NOT VALID
选项为一个表增加一个新的约束,该选项 当前只被允许用于外键和 CHECK 约束。通常,此窗体将导致对表进行扫描,以验证表中的所有现有行是否满足新约束。但是如果使用了NOT VALID
选项 ,则跳过此可能很漫长的扫描。该约束仍将被强制到后续的插入和删除上(也就是说,在外键的情况下如果在被引用表中没有一个匹配的行,操作会失败;或者如果新行不匹配指定的检查条件,操作也会失败)。但是数据库不会假定约束对该表中的所有行都成立,直到通过使用VALIDATE CONSTRAINT
选项对它进行验证。参见下述 Notes 以了解关于使用NOT VALID
选项的更多信息。除了接收约束的表上的锁外,外键约束的增加要求在被引用表上的一个SHARE ROW EXCLUSIVE
锁。当唯一或者主键约束被添加到分区表时,会有额外的限制,请参考CREATE TABLE。此外,当前分区表上的外键约束不能被声明为NOT VALID
。在使用
ADD COLUMN
增加一列并且指定了一个非易失性DEFAULT
时,默认值会在该语句执行时计算并且结果会被保存在表的元数据中。这个值将被用于所有现有行的该列。如果没有指定DEFAULT
,则使用NULL。在两种情况下都不需要重写表。增加一个带有非易失性
DEFAULT
子句的列或者更改一个现有列的类型将 要求重写整个表及其索引。在更改一个现有列的类型时有一种例外:如果USING
子句不更改列的内容并且旧类型在二进制上与新类型可 强制转换或者是新类型上的一个未约束域,则不需要重写表。但是受影响列上 的任何索引仍必须被重建。对于一个大型表,表和/或索引重建可能会消耗相当多的时间, 并且会临时要求差不多两倍的磁盘空间。增加一个
CHECK
或者NOT NULL
约束要求扫描表以验证现有行符合该约束,但是不要求一次表重写。类似地,在挂接一个新分区时,它需要被扫描以验证现有行满足该分区约束。
提供在一个
ALTER TABLE
中指定多个更改的选项的主要原因就是多次表扫描或者重写可以因此被整合成一次。扫描大型表以验证新的外键或检查约束可能需要很长时间,并且对表的其他更新将锁定,直到
ALTER TABLE ADD CONSTRAINT
命令被提交。NOT VALID
约束选项的主要目的是减少对并发更新添加约束的影响。使用NOT VALID
,ADD CONSTRAINT
命令不扫描表,可以立即提交。在之后,VALIDATE CONSTRAINT
命令会被发出以验证现有行是否满足约束。验证步骤不需要锁定并发更新,因为它知道其他事务将强制执行它们插入或更新的行的约束,只有预先存在的行需要检查。因此,验证在被更改的表上仅获得一个SHARE UPDATE EXCLUSIVE
锁。(如果约束是外键,则ROW SHARE
锁也需要约束的表引用。) 除了改进并发性外,在已知该表包含预先存在的违规行为的情况下使用NOT VALID
和VALIDATE CONSTRAINT
也能有作用。一旦约束就位,就不能再插入新的违规,并且现有问题可以在空闲时纠正,直到VALIDATE CONSTRAINT
最终完成。
直接给表添加约束会添加AccessExclusiveLock锁,假如使用not valid,可以无需很长时间的AccessExclusiveLock锁(扫描所有的元组),后续再执行VALIDATE CONSTRAINT只需要ShareUpdateExclusive锁。
postgres=# SELECT 'alter table test_partition_' || p_month || ' add constraint test_partition_con' || p_month || ' check (t_time >= ''2020-'||p_month||'-01'' and t_time < ''2021-'||p_month||'-01''::timestamp + interval ''1 month'') not valid; ' FROM generate_series(1,12) as p_month ;
?column?
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
alter table test_partition_1 add constraint test_partition_con1 check (t_time >= '2020-1-01' and t_time < '2021-1-01'::timestamp + interval '1 month') not valid;
alter table test_partition_2 add constraint test_partition_con2 check (t_time >= '2020-2-01' and t_time < '2021-2-01'::timestamp + interval '1 month') not valid;
alter table test_partition_3 add constraint test_partition_con3 check (t_time >= '2020-3-01' and t_time < '2021-3-01'::timestamp + interval '1 month') not valid;
alter table test_partition_4 add constraint test_partition_con4 check (t_time >= '2020-4-01' and t_time < '2021-4-01'::timestamp + interval '1 month') not valid;
alter table test_partition_5 add constraint test_partition_con5 check (t_time >= '2020-5-01' and t_time < '2021-5-01'::timestamp + interval '1 month') not valid;
alter table test_partition_6 add constraint test_partition_con6 check (t_time >= '2020-6-01' and t_time < '2021-6-01'::timestamp + interval '1 month') not valid;
alter table test_partition_7 add constraint test_partition_con7 check (t_time >= '2020-7-01' and t_time < '2021-7-01'::timestamp + interval '1 month') not valid;
alter table test_partition_8 add constraint test_partition_con8 check (t_time >= '2020-8-01' and t_time < '2021-8-01'::timestamp + interval '1 month') not valid;
alter table test_partition_9 add constraint test_partition_con9 check (t_time >= '2020-9-01' and t_time < '2021-9-01'::timestamp + interval '1 month') not valid;
alter table test_partition_10 add constraint test_partition_con10 check (t_time >= '2020-10-01' and t_time < '2021-10-01'::timestamp + interval '1 month') not valid;
alter table test_partition_11 add constraint test_partition_con11 check (t_time >= '2020-11-01' and t_time < '2021-11-01'::timestamp + interval '1 month') not valid;
alter table test_partition_12 add constraint test_partition_con12 check (t_time >= '2020-12-01' and t_time < '2021-12-01'::timestamp + interval '1 month') not valid;
(12 rows)
postgres=# SELECT 'alter table test_partition_' || p_month || ' add constraint test_partition_con' || p_month || ' check (t_time >= ''2021-'||p_month-12||'-01'' and t_time < ''2021-'||p_month-12||'-01''::timestamp + interval ''1 month'') not valid; ' FROM generate_series(13,24) as p_month ;
?column?
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
alter table test_partition_13 add constraint test_partition_con13 check (t_time >= '2021-1-01' and t_time < '2021-1-01'::timestamp + interval '1 month') not valid;
alter table test_partition_14 add constraint test_partition_con14 check (t_time >= '2021-2-01' and t_time < '2021-2-01'::timestamp + interval '1 month') not valid;
alter table test_partition_15 add constraint test_partition_con15 check (t_time >= '2021-3-01' and t_time < '2021-3-01'::timestamp + interval '1 month') not valid;
alter table test_partition_16 add constraint test_partition_con16 check (t_time >= '2021-4-01' and t_time < '2021-4-01'::timestamp + interval '1 month') not valid;
alter table test_partition_17 add constraint test_partition_con17 check (t_time >= '2021-5-01' and t_time < '2021-5-01'::timestamp + interval '1 month') not valid;
alter table test_partition_18 add constraint test_partition_con18 check (t_time >= '2021-6-01' and t_time < '2021-6-01'::timestamp + interval '1 month') not valid;
alter table test_partition_19 add constraint test_partition_con19 check (t_time >= '2021-7-01' and t_time < '2021-7-01'::timestamp + interval '1 month') not valid;
alter table test_partition_20 add constraint test_partition_con20 check (t_time >= '2021-8-01' and t_time < '2021-8-01'::timestamp + interval '1 month') not valid;
alter table test_partition_21 add constraint test_partition_con21 check (t_time >= '2021-9-01' and t_time < '2021-9-01'::timestamp + interval '1 month') not valid;
alter table test_partition_22 add constraint test_partition_con22 check (t_time >= '2021-10-01' and t_time < '2021-10-01'::timestamp + interval '1 month') not valid;
alter table test_partition_23 add constraint test_partition_con23 check (t_time >= '2021-11-01' and t_time < '2021-11-01'::timestamp + interval '1 month') not valid;
alter table test_partition_24 add constraint test_partition_con24 check (t_time >= '2021-12-01' and t_time < '2021-12-01'::timestamp + interval '1 month') not valid;
(12 rows)
最后执行attach的动作
postgres=# select 'alter table test_partition_' || p_month || ' validate constraint test_partition_con'||p_month||';' from generate_series(1,12) p_month ;
?column?
-------------------------------------------------------------------------
alter table test_partition_1 validate constraint test_partition_con1;
alter table test_partition_2 validate constraint test_partition_con2;
alter table test_partition_3 validate constraint test_partition_con3;
alter table test_partition_4 validate constraint test_partition_con4;
alter table test_partition_5 validate constraint test_partition_con5;
alter table test_partition_6 validate constraint test_partition_con6;
alter table test_partition_7 validate constraint test_partition_con7;
alter table test_partition_8 validate constraint test_partition_con8;
alter table test_partition_9 validate constraint test_partition_con9;
alter table test_partition_10 validate constraint test_partition_con10;
alter table test_partition_11 validate constraint test_partition_con11;
alter table test_partition_12 validate constraint test_partition_con12;
(12 rows)
postgres=# SELECT 'alter table nat_partition attach partition test_partition_' || p_month || ' FOR VALUES FROM (''2020-'||p_month||'-01'') TO (''2020-'||p_month||'-01''::timestamp + interval ''1 month'');' FROM generate_series(1,12) as p_month ;
?column?
------------------------------------------------------------------------------------------------------------------------------------------------
alter table nat_partition attach partition test_partition_1 FOR VALUES FROM ('2020-1-01') TO ('2020-1-01'::timestamp + interval '1 month');
alter table nat_partition attach partition test_partition_2 FOR VALUES FROM ('2020-2-01') TO ('2020-2-01'::timestamp + interval '1 month');
alter table nat_partition attach partition test_partition_3 FOR VALUES FROM ('2020-3-01') TO ('2020-3-01'::timestamp + interval '1 month');
alter table nat_partition attach partition test_partition_4 FOR VALUES FROM ('2020-4-01') TO ('2020-4-01'::timestamp + interval '1 month');
alter table nat_partition attach partition test_partition_5 FOR VALUES FROM ('2020-5-01') TO ('2020-5-01'::timestamp + interval '1 month');
alter table nat_partition attach partition test_partition_6 FOR VALUES FROM ('2020-6-01') TO ('2020-6-01'::timestamp + interval '1 month');
alter table nat_partition attach partition test_partition_7 FOR VALUES FROM ('2020-7-01') TO ('2020-7-01'::timestamp + interval '1 month');
alter table nat_partition attach partition test_partition_8 FOR VALUES FROM ('2020-8-01') TO ('2020-8-01'::timestamp + interval '1 month');
alter table nat_partition attach partition test_partition_9 FOR VALUES FROM ('2020-9-01') TO ('2020-9-01'::timestamp + interval '1 month');
alter table nat_partition attach partition test_partition_10 FOR VALUES FROM ('2020-10-01') TO ('2020-10-01'::timestamp + interval '1 month');
alter table nat_partition attach partition test_partition_11 FOR VALUES FROM ('2020-11-01') TO ('2020-11-01'::timestamp + interval '1 month');
alter table nat_partition attach partition test_partition_12 FOR VALUES FROM ('2020-12-01') TO ('2020-12-01'::timestamp + interval '1 month');
(12 rows)
postgres=# alter table nat_partition attach partition test_partition_1 FOR VALUES FROM ('2020-1-01') TO ('2020-1-01'::timestamp + interval '1 month');
ALTER TABLE
postgres=# alter table nat_partition attach partition test_partition_2 FOR VALUES FROM ('2020-2-01') TO ('2020-2-01'::timestamp + interval '1 month');
ALTER TABLE
postgres=# alter table nat_partition attach partition test_partition_3 FOR VALUES FROM ('2020-3-01') TO ('2020-3-01'::timestamp + interval '1 month');
ALTER TABLE
postgres=# alter table nat_partition attach partition test_partition_4 FOR VALUES FROM ('2020-4-01') TO ('2020-4-01'::timestamp + interval '1 month');
ALTER TABLE
postgres=# alter table nat_partition attach partition test_partition_5 FOR VALUES FROM ('2020-5-01') TO ('2020-5-01'::timestamp + interval '1 month');
ALTER TABLE
postgres=# alter table nat_partition attach partition test_partition_6 FOR VALUES FROM ('2020-6-01') TO ('2020-6-01'::timestamp + interval '1 month');
ALTER TABLE
postgres=# alter table nat_partition attach partition test_partition_7 FOR VALUES FROM ('2020-7-01') TO ('2020-7-01'::timestamp + interval '1 month');
ALTER TABLE
postgres=# alter table nat_partition attach partition test_partition_8 FOR VALUES FROM ('2020-8-01') TO ('2020-8-01'::timestamp + interval '1 month');
ALTER TABLE
postgres=# alter table nat_partition attach partition test_partition_9 FOR VALUES FROM ('2020-9-01') TO ('2020-9-01'::timestamp + interval '1 month');
ALTER TABLE
postgres=# alter table nat_partition attach partition test_partition_10 FOR VALUES FROM ('2020-10-01') TO ('2020-10-01'::timestamp + interval '1 month');
ALTER TABLE
postgres=# alter table nat_partition attach partition test_partition_11 FOR VALUES FROM ('2020-11-01') TO ('2020-11-01'::timestamp + interval '1 month');
ALTER TABLE
postgres=# alter table nat_partition attach partition test_partition_12 FOR VALUES FROM ('2020-12-01') TO ('2020-12-01'::timestamp + interval '1 month');
ALTER TABLE
postgres=# SELECT 'alter table nat_partition attach partition test_partition_' || p_month || ' FOR VALUES FROM (''2021-'||p_month-12||'-01'') TO (''2021-'||p_month-12||'-01''::timestamp + interval ''1 month'');' FROM generate_series(13,24) as p_month ;
?column?
------------------------------------------------------------------------------------------------------------------------------------------------
alter table nat_partition attach partition test_partition_13 FOR VALUES FROM ('2021-1-01') TO ('2021-1-01'::timestamp + interval '1 month');
alter table nat_partition attach partition test_partition_14 FOR VALUES FROM ('2021-2-01') TO ('2021-2-01'::timestamp + interval '1 month');
alter table nat_partition attach partition test_partition_15 FOR VALUES FROM ('2021-3-01') TO ('2021-3-01'::timestamp + interval '1 month');
alter table nat_partition attach partition test_partition_16 FOR VALUES FROM ('2021-4-01') TO ('2021-4-01'::timestamp + interval '1 month');
alter table nat_partition attach partition test_partition_17 FOR VALUES FROM ('2021-5-01') TO ('2021-5-01'::timestamp + interval '1 month');
alter table nat_partition attach partition test_partition_18 FOR VALUES FROM ('2021-6-01') TO ('2021-6-01'::timestamp + interval '1 month');
alter table nat_partition attach partition test_partition_19 FOR VALUES FROM ('2021-7-01') TO ('2021-7-01'::timestamp + interval '1 month');
alter table nat_partition attach partition test_partition_20 FOR VALUES FROM ('2021-8-01') TO ('2021-8-01'::timestamp + interval '1 month');
alter table nat_partition attach partition test_partition_21 FOR VALUES FROM ('2021-9-01') TO ('2021-9-01'::timestamp + interval '1 month');
alter table nat_partition attach partition test_partition_22 FOR VALUES FROM ('2021-10-01') TO ('2021-10-01'::timestamp + interval '1 month');
alter table nat_partition attach partition test_partition_23 FOR VALUES FROM ('2021-11-01') TO ('2021-11-01'::timestamp + interval '1 month');
alter table nat_partition attach partition test_partition_24 FOR VALUES FROM ('2021-12-01') TO ('2021-12-01'::timestamp + interval '1 month');
(12 rows)
postgres=# alter table nat_partition attach partition test_partition_13 FOR VALUES FROM ('2021-1-01') TO ('2021-1-01'::timestamp + interval '1 month');
ALTER TABLE
postgres=# alter table nat_partition attach partition test_partition_14 FOR VALUES FROM ('2021-2-01') TO ('2021-2-01'::timestamp + interval '1 month');
ALTER TABLE
postgres=# alter table nat_partition attach partition test_partition_15 FOR VALUES FROM ('2021-3-01') TO ('2021-3-01'::timestamp + interval '1 month');
ALTER TABLE
postgres=# alter table nat_partition attach partition test_partition_16 FOR VALUES FROM ('2021-4-01') TO ('2021-4-01'::timestamp + interval '1 month');
ALTER TABLE
postgres=# alter table nat_partition attach partition test_partition_17 FOR VALUES FROM ('2021-5-01') TO ('2021-5-01'::timestamp + interval '1 month');
ALTER TABLE
postgres=# alter table nat_partition attach partition test_partition_18 FOR VALUES FROM ('2021-6-01') TO ('2021-6-01'::timestamp + interval '1 month');
ALTER TABLE
postgres=# alter table nat_partition attach partition test_partition_19 FOR VALUES FROM ('2021-7-01') TO ('2021-7-01'::timestamp + interval '1 month');
ALTER TABLE
postgres=# alter table nat_partition attach partition test_partition_20 FOR VALUES FROM ('2021-8-01') TO ('2021-8-01'::timestamp + interval '1 month');
ALTER TABLE
postgres=# alter table nat_partition attach partition test_partition_21 FOR VALUES FROM ('2021-9-01') TO ('2021-9-01'::timestamp + interval '1 month');
ALTER TABLE
postgres=# alter table nat_partition attach partition test_partition_22 FOR VALUES FROM ('2021-10-01') TO ('2021-10-01'::timestamp + interval '1 month');
ALTER TABLE
postgres=# alter table nat_partition attach partition test_partition_23 FOR VALUES FROM ('2021-11-01') TO ('2021-11-01'::timestamp + interval '1 month');
ALTER TABLE
postgres=# alter table nat_partition attach partition test_partition_24 FOR VALUES FROM ('2021-12-01') TO ('2021-12-01'::timestamp + interval '1 month');
ALTER TABLE
至此,分区表改造完成
postgres=# \d+ nat_partition
Partitioned table "public.nat_partition"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
info | text | | | | extended | |
t_time | timestamp without time zone | | not null | | plain | |
Partition key: RANGE (t_time)
Partitions: test_partition_1 FOR VALUES FROM ('2020-01-01 00:00:00') TO ('2020-02-01 00:00:00'),
test_partition_10 FOR VALUES FROM ('2020-10-01 00:00:00') TO ('2020-11-01 00:00:00'),
test_partition_11 FOR VALUES FROM ('2020-11-01 00:00:00') TO ('2020-12-01 00:00:00'),
test_partition_12 FOR VALUES FROM ('2020-12-01 00:00:00') TO ('2021-01-01 00:00:00'),
test_partition_13 FOR VALUES FROM ('2021-01-01 00:00:00') TO ('2021-02-01 00:00:00'),
test_partition_14 FOR VALUES FROM ('2021-02-01 00:00:00') TO ('2021-03-01 00:00:00'),
test_partition_15 FOR VALUES FROM ('2021-03-01 00:00:00') TO ('2021-04-01 00:00:00'),
test_partition_16 FOR VALUES FROM ('2021-04-01 00:00:00') TO ('2021-05-01 00:00:00'),
test_partition_17 FOR VALUES FROM ('2021-05-01 00:00:00') TO ('2021-06-01 00:00:00'),
test_partition_18 FOR VALUES FROM ('2021-06-01 00:00:00') TO ('2021-07-01 00:00:00'),
test_partition_19 FOR VALUES FROM ('2021-07-01 00:00:00') TO ('2021-08-01 00:00:00'),
test_partition_2 FOR VALUES FROM ('2020-02-01 00:00:00') TO ('2020-03-01 00:00:00'),
test_partition_20 FOR VALUES FROM ('2021-08-01 00:00:00') TO ('2021-09-01 00:00:00'),
test_partition_21 FOR VALUES FROM ('2021-09-01 00:00:00') TO ('2021-10-01 00:00:00'),
test_partition_22 FOR VALUES FROM ('2021-10-01 00:00:00') TO ('2021-11-01 00:00:00'),
test_partition_23 FOR VALUES FROM ('2021-11-01 00:00:00') TO ('2021-12-01 00:00:00'),
test_partition_24 FOR VALUES FROM ('2021-12-01 00:00:00') TO ('2022-01-01 00:00:00'),
test_partition_3 FOR VALUES FROM ('2020-03-01 00:00:00') TO ('2020-04-01 00:00:00'),
test_partition_4 FOR VALUES FROM ('2020-04-01 00:00:00') TO ('2020-05-01 00:00:00'),
test_partition_5 FOR VALUES FROM ('2020-05-01 00:00:00') TO ('2020-06-01 00:00:00'),
test_partition_6 FOR VALUES FROM ('2020-06-01 00:00:00') TO ('2020-07-01 00:00:00'),
test_partition_7 FOR VALUES FROM ('2020-07-01 00:00:00') TO ('2020-08-01 00:00:00'),
test_partition_8 FOR VALUES FROM ('2020-08-01 00:00:00') TO ('2020-09-01 00:00:00'),
test_partition_9 FOR VALUES FROM ('2020-09-01 00:00:00') TO ('2020-10-01 00:00:00')
实用SQL
统计父表和子表
postgres=# SELECT
nmsp_parent.nspname AS parent_schema ,
parent.relname AS parent ,
nmsp_child.nspname AS child ,
child.relname AS child_schema
FROM
pg_inherits JOIN pg_class parent
ON pg_inherits.inhparent = parent.oid JOIN pg_class child
ON pg_inherits.inhrelid = child.oid JOIN pg_namespace nmsp_parent
ON nmsp_parent.oid = parent.relnamespace JOIN pg_namespace nmsp_child
ON nmsp_child.oid = child.relnamespace
WHERE
parent.relname = 'nat_partition';
parent_schema | parent | child | child_schema
---------------+---------------+--------+-------------------
public | nat_partition | public | test_partition_1
public | nat_partition | public | test_partition_2
public | nat_partition | public | test_partition_3
public | nat_partition | public | test_partition_4
public | nat_partition | public | test_partition_5
public | nat_partition | public | test_partition_6
public | nat_partition | public | test_partition_7
public | nat_partition | public | test_partition_8
public | nat_partition | public | test_partition_9
public | nat_partition | public | test_partition_10
public | nat_partition | public | test_partition_11
public | nat_partition | public | test_partition_12
public | nat_partition | public | test_partition_13
public | nat_partition | public | test_partition_14
public | nat_partition | public | test_partition_15
public | nat_partition | public | test_partition_16
public | nat_partition | public | test_partition_17
public | nat_partition | public | test_partition_18
public | nat_partition | public | test_partition_19
public | nat_partition | public | test_partition_20
public | nat_partition | public | test_partition_21
public | nat_partition | public | test_partition_22
public | nat_partition | public | test_partition_23
public | nat_partition | public | test_partition_24
(24 rows)
统计子表行数
postgres=# select tableoid::regclass as relname,count(*) from nat_partition group by 1 order by relname;
relname | count
-------------------+-------
test_partition_1 | 744
test_partition_2 | 696
test_partition_3 | 744
test_partition_4 | 720
test_partition_5 | 744
test_partition_6 | 720
test_partition_7 | 744
test_partition_8 | 744
test_partition_9 | 720
test_partition_10 | 744
test_partition_11 | 720
test_partition_12 | 744
test_partition_13 | 744
test_partition_14 | 672
test_partition_15 | 744
test_partition_16 | 720
test_partition_17 | 744
test_partition_18 | 720
test_partition_19 | 744
test_partition_20 | 744
test_partition_21 | 720
test_partition_22 | 744
test_partition_23 | 720
test_partition_24 | 1
(24 rows)
postgres=# select tableoid::regclass as relname,count(*) from native_partition group by 1 order by relname;
relname | count
---------------------+-------
native_partition_1 | 744
native_partition_2 | 696
native_partition_3 | 744
native_partition_4 | 720
native_partition_5 | 744
native_partition_6 | 720
native_partition_7 | 744
native_partition_8 | 744
native_partition_9 | 720
native_partition_10 | 744
native_partition_11 | 720
native_partition_12 | 744
native_partition_13 | 744
native_partition_14 | 672
native_partition_15 | 744
native_partition_16 | 720
native_partition_17 | 744
native_partition_18 | 720
native_partition_19 | 744
native_partition_20 | 744
native_partition_21 | 720
native_partition_22 | 744
native_partition_23 | 720
native_partition_24 | 1
(24 rows)
获取分区表大小
postgres=# SELECT
pi.inhparent::regclass AS parent_table_name,
pg_size_pretty(sum(pg_total_relation_size(psu.relid))) AS total,
pg_size_pretty(sum(pg_relation_size(psu.relid))) AS internal,
pg_size_pretty(sum(pg_table_size(psu.relid) - pg_relation_size(psu.relid))) AS external, -- toast
pg_size_pretty(sum(pg_indexes_size(psu.relid))) AS indexes
FROM pg_catalog.pg_statio_user_tables psu
JOIN pg_class pc ON psu.relname = pc.relname
JOIN pg_database pd ON pc.relowner = pd.datdba
JOIN pg_inherits pi ON pi.inhrelid = pc.oid
WHERE pd.datname = 'postgres'
GROUP BY pi.inhparent
ORDER BY sum(pg_total_relation_size(psu.relid)) DESC;
parent_table_name | total | internal | external | indexes
-------------------+-------+----------+----------+---------
ptab01 | 56 MB | 55 MB | 160 kB | 0 bytes
(1 row)
WITH RECURSIVE tables AS (
SELECT
c.oid AS parent,
c.oid AS relid,
1 AS level
FROM
pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_inherits AS i ON c.oid = i.inhrelid
-- p = partitioned table, r = normal table
WHERE
c.relkind IN ('p', 'r')
-- not having a parent table -> we only get the partition heads
AND i.inhrelid IS NULL
UNION ALL
SELECT
p.parent AS parent,
c.oid AS relid,
p.level + 1 AS level
FROM
tables AS p
LEFT JOIN pg_catalog.pg_inherits AS i ON p.relid = i.inhparent
LEFT JOIN pg_catalog.pg_class AS c ON c.oid = i.inhrelid
AND c.relispartition
WHERE
c.oid IS NOT NULL
)
SELECT
parent::regclass AS table_name,
array_agg(relid::REGCLASS) AS all_partitions,
pg_size_pretty(sum(pg_total_relation_size(relid))) AS pretty_total_size,
sum(pg_total_relation_size(relid)) AS total_size
FROM
tables
GROUP BY
parent
相关函数
参考 https://postgres.fun/20190716101600.html
•pg_partition_tree(regclass):返回分区表详细信息,例如分区名称、上一级分区名称、是否叶子结点、层级,层级 0 表示顶层父表。•pg_partition_ancestors(regclass):返回上层分区名称,包括本层分区名称。•pg_partition_root(regclass):返回顶层父表名称。•pg_get_partkeydef:获取分区类型•pg_get_partition_constraintdef:根据分区OID获取分区约束条件
postgres=# select pg_get_partkeydef('test'::regclass);
pg_get_partkeydef
-------------------
RANGE (n)
(1 row)
postgres=# select pg_get_partition_constraintdef('test_1'::regclass);
pg_get_partition_constraintdef
--------------------------------
((n IS NOT NULL) AND (n < 10))
(1 row)
如何降低锁的粒度
attach partition时的锁要求
1.PostgreSQL12之前的版本:在父表和被连接分区上都要加上Access Exclusive 排它锁2.PostgreSQL12以后的版本:在被连接分区和默认分区(如果存在)上加Access Exclusive排它锁,在父表只需加Share Update Exclusive共享锁
试想这么一个场景,假如没有默认分区,插入一条不满足现有分区策略的数据时,会报错,那么就得创建子分区,但是需要Access Exclusive的8级锁,会阻塞一切访问,假如表上有插入和查询就得等待。那么我们就可以这么操作:
1.使用CREATE TABLE ... PARTITION OF ...
创建分区需要对分区表使用ACCESS EXCLUSIVE
锁,这与对分区表的所有访问冲突。2.分两步创建新分区:CREATE TABLE inventory_items_42 ( LIKE inventory_items INCLUDING DEFAULTS INCLUDING CONSTRAINTS ); ALTER TABLE inventory_items ATTACH PARTITION inventory_items_42 FOR VALUES IN (42);
,它只需要父表上的SHARE UPDATE EXCLUSIVE
锁,不阻塞读写
小结
推荐阅读:https://github.com/digoal/blog/blob/master/202104/20210415_04.md
1.PostgreSQL v10使分区表的创建不再困难,由0到1的跃进,但功能只是一个Demo2.PostgreSQL v11使分区表的可管理性大大增强,大大方便运维和DBA3.PostgreSQL v12使分区表的性能大大增强,媲美pg_pathman,大于v12可以考虑使用原生分区,并且降低了attach partition时的锁要求,极大方便了开发4.PostgreSQL v13使分区表的Logical Decoding功能大大增强,方便逻辑订阅发布5.PostgreSQL v14小修小补
对于v12以前的版本,建议使用pg_pathman,12以后的版本,可以使用原生分区了。另外需要谨记的是,坏的分区方法和选择,比没有分区更加糟糕。
附一些性能测试Case:
参考
https://www.2ndquadrant.com/en/blog/partition-elimination-postgresql-11/
https://github.com/digoal/blog
https://www.slideshare.net/meMissEmerson/postgresql-decoding-partitions
https://postgres.fun
https://www.postgresql.org/docs/13/ddl-partitioning.html
http://www.postgres.cn/docs/13/ddl-partitioning.html
https://www.postgresql.org/about/featurematrix/
https://www.cnblogs.com/abclife/p/11648006.html
https://stackoverflow.com/questions/64307782/any-issue-with-enable-partitionwise-join-at-instance-level
https://www.cybertec-postgresql.com/en/pg_rewrite-postgresql-table-partitioning/
各位可以添加我个人微信,进群唠嗑~