查看原文
其他

PostgreSQL VACUUM 之深入浅出

The following article is from DBADaily Author Alvin


作者介绍 PROFILE

Alvin

2020年8月加入去哪儿网,主要负责公司 PostgreSQL 和 Oracle 的运维管理工作。

一、前言

VACUUM  是 PostgreSQL MVCC (Multiversion concurrency control) 实现的核心机制之一,是 PostgreSQL 正常运行的重要保证。

本文将通过实例演示 PostgreSQL 为什么需要做 VACUUM,及一步一步精准触发 AUTOVACUUM (理论与实践相互印证),最后会给出 VACUUM 相关参数简要分析及优化实战。

二、测试环境准备

以下测试是在 PostgreSQL 11 中进行。
通过以下 SQL 创建:
测试用户: alvin,普通用户,非 superuser
测试数据库: alvindb,owner 是 alvin
测试 schema: alvin,owner 也是 alvin
这里采用的是 user 与 schema 同名,结合默认的 search_path("$user", public),这样操作对象(table, sequence, etc.)时就不需要加 schema 前缀了。
postgres=# CREATE USER alvin WITH PASSWORD 'alvin';CREATE ROLEpostgres=# CREATE DATABASE alvindb OWNER alvin;CREATE DATABASEpostgres=# \c alvindbYou are now connected to database "alvindb" as user "postgres".alvindb=# CREATE SCHEMA alvin AUTHORIZATION alvin;CREATE SCHEMAalvindb=# \c alvindb alvinYou are now connected to database "alvindb" as user "alvin".alvindb=> SHOW search_path; search_path ----------------- "$user", public(1 row)

三、PostgreSQL 为什么需要做 VACUUM

这要从 PostgreSQL MVCC UPDATE/DELETE 实现讲起。

下面通过简单演示 PostgreSQL 中 UPDATE/DELETE 时底层数据变化,揭秘其 MVCC 设计的艺术。

为了方便看其底层数据,通过 superuser postgres 创建 extension pageinspect:

$ psql -d alvindb -U postgresalvindb=# CREATE EXTENSION IF NOT EXISTS pageinspect;CREATE EXTENSIONalvindb=# \dx pageinspect List of installed extensions Name | Version | Schema | Description -------------+---------+--------+------------------------------------------------------- pageinspect | 1.7 | public | inspect the contents of database pages at a low level(1 row)

首先,创建测试表

$ psql -d alvindb -U alvinalvindb=> CREATE TABLE tb_test_vacuum ( test_id BIGSERIAL PRIMARY KEY, test_num BIGINT);CREATE TABLE

插入 3 条测试数据

alvindb=> INSERT INTO tb_test_vacuum(test_num) SELECT gid FROM generate_series(1,3,1) gid;INSERT 0 3alvindb=> SELECT * FROM tb_test_vacuum ORDER BY 1 DESC LIMIT 5; test_id | test_num ---------+---------- 3 | 3 2 | 2 1 | 1(3 rows)

查看其底层数据。

alvindb=> SELECT * FROM heap_page_items(get_raw_page('alvin.tb_test_vacuum', 0)) LIMIT 10;ERROR: must be superuser to use raw functions

可以看到底层数据只有 superuser 才可以查看,这里另打开一个窗口,用 superuser 用户 postgres 查看。

psql -d alvindb -U postgresalvindb=# SELECT * FROM heap_page_items(get_raw_page('alvin.tb_test_vacuum', 0)) LIMIT 10;

这里 t_xmin 为其插入时 transaction id。

下面删除 2 条数据:

alvindb=> DELETE FROM tb_test_vacuum WHERE test_id = 2;DELETE 1alvindb=> DELETE FROM tb_test_vacuum WHERE test_id = 3;DELETE 1alvindb=> SELECT * FROM tb_test_vacuum ORDER BY 1 DESC LIMIT 5; test_id | test_num ---------+---------- 1 | 1(1 row)

此时在第二个窗口再看其底层数据

alvindb=> SELECT * FROM heap_page_items(get_raw_page('alvin.tb_test_vacuum', 0)) LIMIT 10;

这时你会发现,实际数据并未被删除。只是修改了 t_xmaxt_infomask2 和 t_infomaskt_xmax 为删除时的 transaction id,t_infomask2 和 t_infomask 为各种标志位,这里显示的是其二进制转换后的十进制。

为什么不直接物理删除数据呢?

主要是出于以下考虑:

这些被删除的数据可能还在被其他事务访问,所以不能直接物理删除。这就是所谓的 MVCC 中的 multi version,即多版本,不同事务访问的可能是不同版本的数据。transaction id 可以理解为版本号。其他事务可能还在访问 t_xmin 为 15400739 且 t_xmax 为 15400741 或 15400742 的数据。

为什么有的其他数据库 MVCC 实现底层数据就不是这样呢?

Oracle 中将要删除数据转移到了 UNDO tablespace 中,供其他事务访问,以实现 MVCC。

PostgreSQL 为什么这么实现呢?

大家可以想一下,“转移数据” 与 “改标志位”,哪个 cost 高呢?当然是 “改标志位” 既简单又高效了!可见 PostgreSQL 设计之巧妙。

另外,PostgreSQL 这样做还有一个好处。

Oracle DBA 都非常熟悉 ORA-01555: snapshot too old,其原因是 UNDO tablespace 大小毕竟是有限的,存储的老版本数据也是有限的,Oracle 中解决 snapshot too old 一个办法就是增大 UNDO tablespace。PostgreSQL 中这样保留老版本数据,可以说磁盘有多大,“UNDO tablespace” 就有多大,就不会出现类似 snapshot too old 这样的问题。

但凡事都有两面性。

PostgreSQL 中这样保留老版本数据有什么弊端呢?

老版本的数据是可能有其他事务需要访问,但随着时间的推移,这些事务终将结束,对应老版本的数据终将不被需要。这些无用的数据(即 dead tuple)会导致正常数据的访问变得很慢,且不断占用甚至耗尽磁盘空间,这就是 PostgreSQL 中的 Bloat ,即膨胀。

PostgreSQL 中的 bloat 问题如何解决呢?

就是 VACUUM。可以理解为“回收空间”。

现在对表 alvin.tb_test_vacuum 进行 VACUUM 操作。
alvindb=> VACUUM VERBOSE tb_test_vacuum;INFO: vacuuming "alvin.tb_test_vacuum"INFO: scanned index "tb_test_vacuum_pkey" to remove 2 row versionsDETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 sINFO: "tb_test_vacuum": removed 2 row versions in 1 pagesDETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 sINFO: index "tb_test_vacuum_pkey" now contains 1 row versions in 2 pagesDETAIL: 2 index row versions were removed.0 index pages have been deleted, 0 are currently reusable.CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.INFO: "tb_test_vacuum": found 2 removable, 1 nonremovable row versions in 1 out of 1 pagesDETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 15400744There were 0 unused item pointers.Skipped 0 pages due to buffer pins, 0 frozen pages.0 pages are entirely empty.CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.VACUUM

可以看到,VACUUM 不仅针对表数据,还包括索引。即不仅表数据可造成 Bloat (膨胀),索引也会。

pageinspect extension 除了可以用 heap_page_items 看底层数据,也可以通过 bt_page_items 看其索引底层数据。在此不再查看索引底层数据,感兴趣可以执行如下 function 自行测试。
SELECT * FROM bt_page_items('index_name', 1);

在第二个窗口重新查看表底层数据:

psql -d alvindb -U postgresalvindb=# SELECT * FROM heap_page_items(get_raw_page('alvin.tb_test_vacuum', 0)) LIMIT 10;

可以看到,老版本数据已被清除。此时回收的空间新插入的数据使用,但并未返回给操作系统。

可以看到,老版本数据已被清除。此时回收的空间已可以被新插入 (当然更准确的说是包括插入和更新在内的所有写入) 的数据所复用,但并未返回给操作系统。(注:PostgreSQL 14 中此种场景已可直接将空间返还给操作系统)

如何将回收的空间真正返回给操作系统呢?

就是 VACUUM FULL 操作:

alvindb=> VACUUM FULL VERBOSE tb_test_vacuum;INFO: vacuuming "alvin.tb_test_vacuum"INFO: "tb_test_vacuum": found 0 removable, 1 nonremovable row versions in 1 pagesDETAIL: 0 dead row versions cannot be removed yet.CPU: user: 0.01 s, system: 0.01 s, elapsed: 0.08 s.VACUUM

在第二个窗口查看表底层数据:

psql -d alvindb -U postgresalvindb=# SELECT * FROM heap_page_items(get_raw_page('alvin.tb_test_vacuum', 0)) LIMIT 10;

可以看到,老版本数据已彻底回收了。

但要注意,生产环境需要谨慎使用 VACUUM FULL,因为它将在表上加 ACCESS EXCLUSIVE 锁,即连 SELECT 也不可以。除非应用端可以计划不访问该表。

上面通过 DELETE 演示了为什么需要做 VACUUM。

那么 UPDATE 在 PostgreSQL 中是如何实现的呢?它会不会产生 Bloat (膨胀) 呢?

执行 UPDATE 操作如下:

alvindb=> UPDATE tb_test_vacuum SET test_num = 1 WHERE test_id = 1;UPDATE 1

在第二个窗口查看表底层数据:

psql -d alvindb -U postgresalvindb=# SELECT * FROM heap_page_items(get_raw_page('alvin.tb_test_vacuum', 0)) LIMIT 10;

可以看到,UPDATE 其实是 DELETE + INSERT。

为什么 PostgreSQL 如此实现 UPDATE 呢?

是因为 DELETE + INSERT 执行效率高?直接修改原数据不可以么?

因为老版本数据有可能还被其他事务需要!这是 MVCC 实现所需要的。

当然,相比 Oracle 中将老版本数据转移到 UNDO tablespace, DELETE + INSERT 中的  DELETE 减少了 I/O,因为其只修改了标志位而已。

那么只有 UPDATE 和 DELETE 会产生 Bloat (膨胀) 吗?INSERT 会吗?

INSERT 不是只插入数据吗?它怎么会产生 Bloat (膨胀) 呢?

接下来看下面的 case。

在事务中,ROLLBACK INSERT 的数据:

alvindb=> TRUNCATE tb_test_vacuum;TRUNCATE TABLEalvindb=> INSERT INTO tb_test_vacuum(test_num) SELECT gid FROM generate_series(1,1,1) gid;INSERT 0 1alvindb=> BEGIN;BEGINalvindb=> INSERT INTO tb_test_vacuum(test_num) SELECT gid FROM generate_series(2,3,1) gid;INSERT 0 2alvindb=> ROLLBACK;ROLLBACKalvindb=> SELECT * FROM tb_test_vacuum ORDER BY 1 DESC LIMIT 5; test_id | test_num ---------+---------- 8 | 1(1 row)

在第二个窗口查看表底层数据:

psql -d alvindb -U postgresalvindb=# SELECT * FROM heap_page_items(get_raw_page('alvin.tb_test_vacuum', 0)) LIMIT 10;

可以看到,在事务中,PostgreSQL 中 ROLLBACK 时并未删除已 INSERT 的数据。

进一步测试 ROLLBACK UPDATE。

alvindb=> TRUNCATE tb_test_vacuum;TRUNCATE TABLEalvindb=> INSERT INTO tb_test_vacuum(test_num) SELECT gid FROM generate_series(1,1,1) gid;INSERT 0 1alvindb=> BEGIN;BEGINalvindb=> SELECT * FROM tb_test_vacuum ORDER BY 1 DESC LIMIT 5; test_id | test_num ---------+---------- 12 | 1(1 row)alvindb=> UPDATE tb_test_vacuum SET test_num = test_num + 1 WHERE test_id = 12; UPDATE 1alvindb=> SELECT clock_timestamp(); clock_timestamp ------------------------------- 2021-11-14 18:25:11.651518+08(1 row)

此时在第二个窗口查看表底层数据:

接下来在第一个窗口 ROLLBACK:

alvindb=> ROLLBACK;ROLLBACKalvindb=> SELECT clock_timestamp(); clock_timestamp ------------------------------- 2021-11-14 18:25:35.948455+08(1 row)alvindb=> SELECT * FROM tb_test_vacuum ORDER BY 1 DESC LIMIT 5; test_id | test_num ---------+---------- 12 | 1(1 row)

再在第二个窗口查看表底层数据:

如果 ROLLBACK ,PostgreSQL 不会在底层数据上做特殊处理,t_xmax 仍然为 0,但是由于事务本身是回滚的,其对外仍不可见。

相比 Oracle 中的 UPDATE 先将老版本中数据转移到 UNDO,ROLLBACK 再利用 UNDO 恢复原数据,PostgreSQL 中的 ROLLBACK 避免了两次不必要的 IO,既提高了性能,又节省了时间。

根据上面实验,可以看到 UPDATE/DELETE/ROLLBACK 都有可能造成 Bloat (膨胀)。如果频繁更新的表长时间未做 VACUUM,VACUUM 完之后仍会占用很大空间,Bloat (膨胀) 仍然存在。生产又不能随便做 VACUUM FULL 回收空间 。

那么如何有效减少 Bloat (膨胀)?

在计划内大量更新数据等情况,可以根据需要手动 VACUUM,这样回收的空间可供下次大量更新 (当然更准确的说是包括插入和更新在内的所有写入) 数据使用,这样可以有效减少 Bloat (膨胀)。

VACUUM 除了回收空间,还有其他作用吗?

截至 PostgreSQL 14,transaction id (事务 id) 是 32 位的,即最多有 2 的 32 次方,即 4294967296 个事务 id。中国人口按 14 亿算,一人也就能分配 3 个事务 id。所以 transaction id 范围是非常有限的,那么 PostgreSQL 是如何解决这个问题的呢?

从下图可以看出,PostgreSQL 是循环利用 transaction id 的,这样,transaction id 就无穷无尽的了。

以当前 transaction id 是 100 为例,大于 100 的约 21 亿 个事务对事务 100 不可见,小于 100 的约 21 亿 个事务对事务 100 可见。如果 transaction id 一直没有回收,直至 transaction id 耗尽(因为即使 transaction id 可以循环复用,但是实际分配的 transaction id 数量不会超过 2 的 31 次方,即一半),就会产生 wraparound (回卷) 问题,原来可见的突然变得不可见了,数据就“凭空消失”了。

那么 VACUUM 是如何回收 transaction id 的?是通过 FREEZE 对所有事务可见的数据。由于篇幅有限,且实际工作中基本不需要对 FREEZE 相关参数进行优化,FREEZE 将通过另外一篇文章单独讲述,本文不对 FREEZE 展开。

应用程序一般会有频繁的更新,不断造成 Bloat (膨胀) 及消耗 transaction id,总不能都手动 VACUUM 吧?

有没有自动的方式呢?当然!

四、AUTOVACUUM

1、AUTOVACUUM 简介

PostgreSQL 提供了 AUTOVACUUM 的机制。

autovacuum 不仅会自动进行 VACUUM,也会自动进行 ANALYZE,以获得最新统计信息用于生成更为准确的执行计划。

在 postgresql.conf 中,autovacuum 参数已默认打开。

autovacuum = on

autovacuum 打开后,会有一个 autovacuum launcher 进程

$ ps -ef|grep postgres|grep autovacuum|grep -v greppostgres 28398 28392 0 Nov13 ? 00:00:19 postgres: autovacuum launcher

 pg_stat_activity 也可以看到 backend_type 为 autovacuum launcher 的连接:

psql -d alvindb -U postgresalvindb=# \xExpanded display is on.alvindb=# SELECT * FROM pg_stat_activity WHERE backend_type = 'autovacuum launcher';-[ RECORD 1 ]----+------------------------------datid | datname | pid | 28398usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | 2021-11-13 23:18:00.406618+08xact_start | query_start | state_change | wait_event_type | Activitywait_event | AutoVacuumMainstate | backend_xid | backend_xmin | query | backend_type | autovacuum launcher

那么 AUTOVACUUM 多久运行一次?

autovacuum launcher 会每隔 autovacuum_naptime,创建 autovacuum worker,检查是否需要做 autovacuum。

psql -d alvindb -U postgresalvindb=# SELECT * FROM pg_stat_activity WHERE backend_type = 'autovacuum worker';-[ RECORD 1 ]----+------------------------------datid | 13220datname | postgrespid | 32457usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | 2021-11-06 23:32:53.880281+08xact_start | query_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type | autovacuum worker

autovacuum_naptime 默认为 1min:

#autovacuum_naptime = 1min # time between autovacuum runs

autovacuum 又是根据什么标准决定是否进行 VACUUM 和 ANALYZE 呢?

当 autovacuum worker 检查到,

dead tuples 大于 vacuum threshold 时,会自动进行 VACUUM。

vacuum threshold 公式如下:

vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples

增删改的行数大于 analyze threshold 时,会自动进行 ANALYZE。

analyze threshold 公式如下:

analyze threshold = analyze base threshold + analyze scale factor * number of tuples

对应 postgresql.conf 中相关参数如下:

#autovacuum_vacuum_threshold = 50 # min number of row updates before vacuum#autovacuum_analyze_threshold = 50 # min number of row updates before analyze#autovacuum_vacuum_scale_factor = 0.2 # fraction of table size before vacuum#autovacuum_analyze_scale_factor = 0.1 # fraction of table size before analyze

dead tuples 为 pg_stat_user_tables.n_dead_tup(Estimated number of dead rows)

alvindb=> SELECT * FROM pg_stat_user_tables WHERE schemaname = 'alvin' AND relname = 'tb_test_vacuum';-[ RECORD 1 ]-------+---------------relid | 37409schemaname | alvinrelname | tb_test_vacuumseq_scan | 2seq_tup_read | 0idx_scan | 0idx_tup_fetch | 0n_tup_ins | 0n_tup_upd | 0n_tup_del | 0n_tup_hot_upd | 0n_live_tup | 0n_dead_tup | 0n_mod_since_analyze | 0last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | 0autovacuum_count | 0analyze_count | 0autoanalyze_count | 0

那么 number of tuples 是哪个列的值?是 pg_stat_user_tables.n_dead_tup(Estimate number of live rows)?还是实际的 count 值?

其实是 pg_class.reltuples (Estimate number of live rows in the table used by the planner)。
alvindb=> SELECT u.schemaname,u.relname,c.reltuples,u.n_live_tup,u.n_mod_since_analyze,u.n_dead_tup,u.last_autoanalyze,u.last_autovacuumFROM pg_stat_user_tables u, pg_class c, pg_namespace nWHERE n.oid = c.relnamespace AND c.relname = u.relname AND n.nspname = u.schemaname AND u.schemaname = 'alvin' AND u.relname = 'tb_test_vacuum'-[ RECORD 1 ]-------+---------------schemaname | alvinrelname | tb_test_vacuumreltuples | 0n_live_tup | 0n_mod_since_analyze | 0n_dead_tup | 0last_autoanalyze | last_autovacuum |

所以 AUTO VACUUM 具体公式如下:

pg_stat_user_tables.n_dead_tup > autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * pg_class.reltuples

同理,AUTO ANALYZE 具体公式如下:

pg_stat_user_tables.n_mod_since_analyze > autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * pg_class.reltuples

2、精准触发 AUTOVACUUM

下面实测一下 autovacuum。为了测试方便,autovacuum_naptime 临时修改为 5s,这样触发了临界条件,只需要等 5s 就能看到效果,而不是等 1min。

修改参数如下:

autovacuum_naptime = 5sautovacuum_vacuum_threshold = 100 # min number of row updates before vacuumautovacuum_analyze_threshold = 100 # min number of row updates before analyzeautovacuum_vacuum_scale_factor = 0.2 # fraction of table size before vacuumautovacuum_analyze_scale_factor = 0.1 # fraction of table size before analyze
接下来通过一步一步测试,精准触发 autovacuum。

为了方便测试,通过如下 AUTOVACUUM 计算 SQL 计算需要删除或修改的数据行数。

alvindb=> WITH v AS ( SELECT * FROM (SELECT setting AS autovacuum_vacuum_scale_factor FROM pg_settings WHERE name = 'autovacuum_vacuum_scale_factor') vsf, (SELECT setting AS autovacuum_vacuum_threshold FROM pg_settings WHERE name = 'autovacuum_vacuum_threshold') vth, (SELECT setting AS autovacuum_analyze_scale_factor FROM pg_settings WHERE name = 'autovacuum_analyze_scale_factor') asf, (SELECT setting AS autovacuum_analyze_threshold FROM pg_settings WHERE name = 'autovacuum_analyze_threshold') ath),t AS ( SELECT c.reltuples,u.* FROM pg_stat_user_tables u, pg_class c, pg_namespace n WHERE n.oid = c.relnamespace AND c.relname = u.relname AND n.nspname = u.schemaname AND u.schemaname = 'alvin' AND u.relname = 'tb_test_vacuum')SELECT schemaname, relname, autovacuum_vacuum_scale_factor, autovacuum_vacuum_threshold, autovacuum_analyze_scale_factor, autovacuum_analyze_threshold, n_live_tup, reltuples, autovacuum_analyze_trigger, n_mod_since_analyze, autovacuum_analyze_trigger - n_mod_since_analyze AS rows_to_mod_before_auto_analyze, last_autoanalyze, autovacuum_vacuum_trigger, n_dead_tup, autovacuum_vacuum_trigger - n_dead_tup AS rows_to_delete_before_auto_vacuum, last_autovacuumFROM ( SELECT schemaname, relname, autovacuum_vacuum_scale_factor, autovacuum_vacuum_threshold, autovacuum_analyze_scale_factor, autovacuum_analyze_threshold, floor(autovacuum_analyze_scale_factor::numeric * reltuples) + 1 + autovacuum_analyze_threshold::int AS autovacuum_analyze_trigger, floor(autovacuum_vacuum_scale_factor::numeric * reltuples) + 1 + autovacuum_vacuum_threshold::int AS autovacuum_vacuum_trigger, reltuples, n_live_tup, n_dead_tup, n_mod_since_analyze, last_autoanalyze, last_autovacuum FROM v, t) a;-[ RECORD 1 ]---------------------+---------------schemaname | alvinrelname | tb_test_vacuumautovacuum_vacuum_scale_factor | 0.2autovacuum_vacuum_threshold | 100autovacuum_analyze_scale_factor | 0.1autovacuum_analyze_threshold | 100n_live_tup | 0reltuples | 0autovacuum_analyze_trigger | 101n_mod_since_analyze | 0rows_to_mod_before_auto_analyze | 101last_autoanalyze | autovacuum_vacuum_trigger | 101n_dead_tup | 0rows_to_delete_before_auto_vacuum | 101last_autovacuum |
根据计算公式,
pg_stat_user_tables.n_mod_since_analyze > 100 + 0.1 * 0
即当修改的行数大于 100,即为 101 时,将触发 AUTO ANALYZE。

先插入 100 行数据,

alvindb=> SELECT clock_timestamp(); clock_timestamp ------------------------------- 2021-11-06 20:45:57.669183+08(1 row)alvindb=> INSERT INTO tb_test_vacuum(test_num) SELECT gid FROM generate_series(1,100,1) gid;INSERT 0 100
此时,通过如下计算可以看到,再更新 1 行,将触发 AUTO ANALYZE。
schemaname | alvinrelname | tb_test_vacuumautovacuum_vacuum_scale_factor | 0.2autovacuum_vacuum_threshold | 100autovacuum_analyze_scale_factor | 0.1autovacuum_analyze_threshold | 100n_live_tup | 100reltuples | 0autovacuum_analyze_trigger | 101n_mod_since_analyze | 100rows_to_mod_before_auto_analyze | 1last_autoanalyze | autovacuum_vacuum_trigger | 101n_dead_tup | 0rows_to_delete_before_auto_vacuum | 101last_autovacuum |
此时,统计信息为空:
alvindb=> SELECT * FROM pg_stats WHERE schemaname = 'alvin' AND tablename = 'tb_test_vacuum';(0 rows)

现在插入最后一条数据,

alvindb=> SELECT clock_timestamp(); clock_timestamp ------------------------------- 2021-11-06 20:46:31.034422+08(1 row)alvindb=> INSERT INTO tb_test_vacuum(test_num) SELECT gid FROM generate_series(101,101,1) gid;INSERT 0 1
执行 AUTOVACUUM 计算 SQL, 可以看到,已触发 AUTO ANALYZE:
schemaname | alvinrelname | tb_test_vacuumautovacuum_vacuum_scale_factor | 0.2autovacuum_vacuum_threshold | 100autovacuum_analyze_scale_factor | 0.1autovacuum_analyze_threshold | 100n_live_tup | 101reltuples | 101autovacuum_analyze_trigger | 111n_mod_since_analyze | 0rows_to_mod_before_auto_analyze | 111last_autoanalyze | 2021-11-06 20:46:39.88796+08autovacuum_vacuum_trigger | 121n_dead_tup | 0rows_to_delete_before_auto_vacuum | 121last_autovacuum |

可以看到表 tb_test_vacuum 统计信息已更新:

alvindb=> SELECT * FROM pg_stats WHERE schemaname = 'alvin' AND tablename = 'tb_test_vacuum';

查看 PostgreSQL 日志,可以看到

[ 2021-11-06 20:46:39.887 CST 6816 6186792f.1aa0 1 3/173948 13179359]LOG: automatic analyze of table "alvindb.alvin.tb_test_vacuum" system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s

PostgreSQL 日志中是否记录 AUTOVACUUM 由参数 log_autovacuum_min_duration 控制,默认关闭。

#log_autovacuum_min_duration = -1 # -1 disables, 0 logs all actions and # their durations, > 0 logs only # actions running at least this number # of milliseconds.

可将该参数改为 0,即记录所有的 AUTOVACUUM 操作。

log_autovacuum_min_duration = 0

 AUTOVACUUM 计算 SQL 的执行结果得知,再修改 111 行将触发 AUTO ANALYZE。

rows_to_mod_before_auto_analyze | 111rows_to_delete_before_auto_vacuum | 121

先修改 110 行,并 sleep 6s。

alvindb=> SELECT clock_timestamp(); clock_timestamp ------------------------------ 2021-11-06 20:47:30.75553+08(1 row)alvindb=> INSERT INTO tb_test_vacuum(test_num) SELECT gid FROM generate_series(102,111,1) gid;INSERT 0 10alvindb=> UPDATE tb_test_vacuum SET test_num = test_num WHERE test_num <= 100;UPDATE 100alvindb=> SELECT pg_sleep(6); pg_sleep ----------
(1 row)alvindb=> SELECT clock_timestamp(); clock_timestamp ------------------------------- 2021-11-06 20:47:43.465651+08(1 row)
从 AUTOVACUUM 计算 SQL 的执行结果得知,修改后 110 行并 sleep 6s (前面已将 autovacuum_naptime 设置成了 5s)后,AUTO ANALYZE 并未触发。
schemaname | alvinrelname | tb_test_vacuumautovacuum_vacuum_scale_factor | 0.2autovacuum_vacuum_threshold | 100autovacuum_analyze_scale_factor | 0.1autovacuum_analyze_threshold | 100n_live_tup | 111reltuples | 101autovacuum_analyze_trigger | 111n_mod_since_analyze | 110rows_to_mod_before_auto_analyze | 1last_autoanalyze | 2021-11-06 20:46:39.88796+08autovacuum_vacuum_trigger | 121n_dead_tup | 100rows_to_delete_before_auto_vacuum | 21last_autovacuum |

再修改 1 行预计将触发 AUTO ANALYZE。此时删除一行:

alvindb=> SELECT clock_timestamp(); clock_timestamp ------------------------------- 2021-11-06 20:47:55.746411+08(1 row)alvindb=> DELETE FROM tb_test_vacuum WHERE test_id = 111;DELETE 1alvindb=> SELECT pg_sleep(6); pg_sleep ----------
(1 row)alvindb=> SELECT clock_timestamp(); clock_timestamp ------------------------------- 2021-11-06 20:48:01.796389+08(1 row)

从 AUTOVACUUM 计算 SQL 的查询结果中的 last_autoanalyze 得知,已精准触发 AUTO ANALYZE。

并且从 rows_to_delete_before_auto_vacuum 得知,预计删除 22 行后,将触发 AUTO VACUUM。

schemaname | alvinrelname | tb_test_vacuumautovacuum_vacuum_scale_factor | 0.2autovacuum_vacuum_threshold | 100autovacuum_analyze_scale_factor | 0.1autovacuum_analyze_threshold | 100n_live_tup | 110reltuples | 110autovacuum_analyze_trigger | 112n_mod_since_analyze | 0rows_to_mod_before_auto_analyze | 112last_autoanalyze | 2021-11-06 20:48:04.928899+08autovacuum_vacuum_trigger | 123n_dead_tup | 101rows_to_delete_before_auto_vacuum | 22last_autovacuum |

先删除 (UPDATE = DELETE + INSERT)  21 行:

alvindb=> SELECT clock_timestamp(); clock_timestamp ------------------------------- 2021-11-06 20:48:32.313706+08(1 row)
alvindb=> UPDATE tb_test_vacuum SET test_num = test_num WHERE test_num <= 21;UPDATE 21alvindb=> SELECT pg_sleep(6); pg_sleep ----------
(1 row)alvindb=> SELECT clock_timestamp(); clock_timestamp ------------------------------- 2021-11-06 20:48:38.454997+08(1 row)
从 AUTOVACUUM 计算 SQL 的查询结果中的 last_autovacuum 得知,还未触发 AUTO VACUUM。
并且从 rows_to_delete_before_auto_vacuum 得知,预计删除 1 行后,将触发 AUTO VACUUM。
schemaname | alvinrelname | tb_test_vacuumautovacuum_vacuum_scale_factor | 0.2autovacuum_vacuum_threshold | 100autovacuum_analyze_scale_factor | 0.1autovacuum_analyze_threshold | 100n_live_tup | 110reltuples | 110autovacuum_analyze_trigger | 112n_mod_since_analyze | 21rows_to_mod_before_auto_analyze | 91last_autoanalyze | 2021-11-06 20:48:04.928899+08autovacuum_vacuum_trigger | 123n_dead_tup | 122rows_to_delete_before_auto_vacuum | 1last_autovacuum |

此时删除一行

alvindb=> SELECT clock_timestamp(); clock_timestamp ------------------------------- 2021-11-06 20:48:39.174009+08(1 row)
alvindb=> DELETE FROM tb_test_vacuum WHERE test_id = 110;DELETE 1alvindb=> SELECT pg_sleep(6); pg_sleep ----------
(1 row)alvindb=> SELECT clock_timestamp(); clock_timestamp ------------------------------- 2021-11-06 20:48:45.213537+08(1 row)

从 AUTOVACUUM 计算 SQL 的查询结果中的 last_autovacuum 得知,已精准触发 AUTO VACUUM!

schemaname | alvinrelname | tb_test_vacuumautovacuum_vacuum_scale_factor | 0.2autovacuum_vacuum_threshold | 100autovacuum_analyze_scale_factor | 0.1autovacuum_analyze_threshold | 100n_live_tup | 109reltuples | 109autovacuum_analyze_trigger | 111n_mod_since_analyze | 22rows_to_mod_before_auto_analyze | 89last_autoanalyze | 2021-11-06 20:48:04.928899+08autovacuum_vacuum_trigger | 122n_dead_tup | 0rows_to_delete_before_auto_vacuum | 122last_autovacuum | 2021-11-06 20:48:49.914345+08

查看 PostgreSQL 日志,可以看到

[ 2021-11-06 20:48:49.914 CST 7207 618679b1.1c27 1 3/174162 0]LOG: automatic vacuum of table "alvindb.alvin.tb_test_vacuum": index scans: 1 pages: 0 removed, 1 remain, 0 skipped due to pins, 0 skipped frozen tuples: 123 removed, 109 remain, 0 are dead but not yet removable, oldest xmin: 13179371 buffer usage: 59 hits, 4 misses, 4 dirtied avg read rate: 121.832 MB/s, avg write rate: 121.832 MB/s system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s buffer usage: 59 hits, 4 misses, 4 dirtied avg read rate: 121.832 MB/s, avg write rate: 121.832 MB/s system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s

那么问题来了,autovacuum_vacuum_scale_factor 为 0.2 对于所有的表都合适吗?1 亿数据量的表有 2000 万 dead tuples 以上才会触发 AUTO VACUUM,这意味着表越大越不容易触发 AUTO VACUUM。怎么可以解决这个问题呢?

3、精准触发表级 AUTOVACUUM

可以根据需要,在表上设置合理的 autovacuum_vacuum_scale_factor 。对于大表,可以设置小点的 autovacuum_vacuum_scale_factor ,如 0.1。

下面带你一步一步设置并精确触发表级的 AUTO ANALYZE 和 AUTO VACUUM。

这次将采用大一点的数据量进行测试。考虑到手动创建表,插入数据等比较麻烦,接下来测试利用 PostgreSQL 自带的工具 pgbench。

使用 pgbench 创建 10 万行数据的测试表:

$ pgbench -i alvindbdropping old tables...creating tables...generating data...100000 of 100000 tuples (100%) done (elapsed 0.38 s, remaining 0.00 s)vacuuming...creating primary keys...done.

修改表级参数:

alvindb=> ALTER TABLE pgbench_accounts SET (autovacuum_vacuum_scale_factor = 0.1, autovacuum_vacuum_threshold = 2000);ALTER TABLEalvindb=> ALTER TABLE pgbench_accounts SET (autovacuum_analyze_scale_factor = 0.05, autovacuum_analyze_threshold = 2000);ALTER TABLE

按照之前 AUTOVACUUM 计算 SQL ,可知要修改 11001 行才会触发 AUTO ANALYZE,  要有约 21001 个 dead tuples 才会触发 AUTO VACUUM。

schemaname | publicrelname | pgbench_accountsautovacuum_vacuum_scale_factor | 0.2autovacuum_vacuum_threshold | 1000autovacuum_analyze_scale_factor | 0.1autovacuum_analyze_threshold | 1000n_live_tup | 100000reltuples | 100000autovacuum_analyze_trigger | 11001n_mod_since_analyze | 0rows_to_mod_before_auto_analyze | 11001last_autoanalyze | autovacuum_vacuum_trigger | 21001n_dead_tup | 0rows_to_delete_before_auto_vacuum | 21001last_autovacuum |

现在设置了表级的参数以后,从如下 表级 AUTOVACUUM 计算 SQL ,可知修改 7001 行就可以触发 AUTO  ANALYZE,有约 12001 个 dead tuples 就可以触发 AUTO   VACUUM。更重要的是,表级的 AUTOVACUUM 参数不会对其他表产生影响,只对已设置的表有效,也可以对不同大小的表设置不同的参数,还可以随时调整!

表级 AUTOVACUUM 计算 SQL

alvindb=> WITH v AS (SELECT (SELECT split_part(x, '=', 2) FROM unnest(c.reloptions) q (x) WHERE x ~ '^autovacuum_vacuum_scale_factor=' ) as autovacuum_vacuum_scale_factor, (SELECT split_part(x, '=', 2) FROM unnest(c.reloptions) q (x) WHERE x ~ '^autovacuum_vacuum_threshold=' ) as autovacuum_vacuum_threshold, (SELECT split_part(x, '=', 2) FROM unnest(c.reloptions) q (x) WHERE x ~ '^autovacuum_analyze_scale_factor=' ) as autovacuum_analyze_scale_factor, (SELECT split_part(x, '=', 2) FROM unnest(c.reloptions) q (x) WHERE x ~ '^autovacuum_analyze_threshold=' ) as autovacuum_analyze_thresholdFROM pg_class cLEFT JOIN pg_namespace n ON n.oid = c.relnamespaceWHERE n.nspname IN ('public')AND c.relname = 'pgbench_accounts'),t AS ( SELECT c.reltuples,u.* FROM pg_stat_user_tables u, pg_class c, pg_namespace n WHERE n.oid = c.relnamespace AND c.relname = u.relname AND n.nspname = u.schemaname AND u.schemaname = 'public' AND u.relname = 'pgbench_accounts')SELECT schemaname, relname, autovacuum_vacuum_scale_factor, autovacuum_vacuum_threshold, autovacuum_analyze_scale_factor, autovacuum_analyze_threshold, n_live_tup, reltuples, autovacuum_analyze_trigger, n_mod_since_analyze, autovacuum_analyze_trigger - n_mod_since_analyze AS rows_to_mod_before_analyze, last_autoanalyze, autovacuum_vacuum_trigger, n_dead_tup, autovacuum_vacuum_trigger - n_dead_tup AS rows_to_delete_before_vacuum, last_autovacuumFROM ( SELECT schemaname, relname, autovacuum_vacuum_scale_factor, autovacuum_vacuum_threshold, autovacuum_analyze_scale_factor, autovacuum_analyze_threshold, floor(autovacuum_analyze_scale_factor::numeric * reltuples) + 1 + autovacuum_analyze_threshold::int AS autovacuum_analyze_trigger, floor(autovacuum_vacuum_scale_factor::numeric * reltuples) + 1 + autovacuum_vacuum_threshold::int AS autovacuum_vacuum_trigger, reltuples, n_live_tup, n_dead_tup, n_mod_since_analyze, last_autoanalyze, last_autovacuum FROM v, t) a;-[ RECORD 1 ]-------------------+-----------------schemaname | publicrelname | pgbench_accountsautovacuum_vacuum_scale_factor | 0.1autovacuum_vacuum_threshold | 2000autovacuum_analyze_scale_factor | 0.05autovacuum_analyze_threshold | 2000n_live_tup | 100000reltuples | 100000autovacuum_analyze_trigger | 7001n_mod_since_analyze | 0rows_to_mod_before_analyze | 7001last_autoanalyze | autovacuum_vacuum_trigger | 12001n_dead_tup | 0rows_to_delete_before_vacuum | 12001last_autovacuum |

现在已预测到要修改的行数,接下来一步一步来触发一下表级的 AUTO ANALYZE 和 AUTO  VACUUM。

先删除 7000 行数据:

alvindb=> SELECT clock_timestamp(); clock_timestamp ------------------------------- 2021-11-06 23:33:03.252622+08(1 row)alvindb=> DELETE FROM pgbench_accounts WHERE aid<=7000;DELETE 7000alvindb=> SELECT pg_sleep(6); pg_sleep ----------
(1 row)alvindb=> SELECT clock_timestamp(); clock_timestamp ------------------------------- 2021-11-06 23:33:09.363536+08(1 row)

根据表级 AUTOVACUUM 计算 SQL 执行结果的 rows_to_mod_before_analyze 得知,再修改 1 行将触发 AUTO ANALYZE:

schemaname | publicrelname | pgbench_accountsautovacuum_vacuum_scale_factor | 0.1autovacuum_vacuum_threshold | 2000autovacuum_analyze_scale_factor | 0.05autovacuum_analyze_threshold | 2000n_live_tup | 93000reltuples | 100000autovacuum_analyze_trigger | 7001n_mod_since_analyze | 7000rows_to_mod_before_analyze | 1last_autoanalyze | autovacuum_vacuum_trigger | 12001n_dead_tup | 7000rows_to_delete_before_vacuum | 5001last_autovacuum |

再修改 1 行:

alvindb=> SELECT clock_timestamp(); clock_timestamp ------------------------------- 2021-11-06 23:33:30.649717+08(1 row)alvindb=> UPDATE pgbench_accounts SET bid = bid WHERE aid=7001;UPDATE 1alvindb=> SELECT pg_sleep(6); pg_sleep ----------
(1 row)alvindb=> SELECT clock_timestamp(); clock_timestamp ------------------------------- 2021-11-06 23:33:36.705928+08(1 row)

根据表级 AUTOVACUUM 计算 SQL 执行结果的 last_autoanalyze 得知,已精准触发 AUTO ANALYZE!

schemaname | publicrelname | pgbench_accountsautovacuum_vacuum_scale_factor | 0.1autovacuum_vacuum_threshold | 2000autovacuum_analyze_scale_factor | 0.05autovacuum_analyze_threshold | 2000n_live_tup | 93000reltuples | 93000autovacuum_analyze_trigger | 6651n_mod_since_analyze | 0rows_to_mod_before_analyze | 6651last_autoanalyze | 2021-11-06 23:33:40.87317+08autovacuum_vacuum_trigger | 11301n_dead_tup | 7001rows_to_delete_before_vacuum | 4300last_autovacuum |

从 PostgreSQL 日志中也可以看到 AUTO  ANALYZE 被触发了:

[ 2021-11-06 23:33:40.873 CST 32646 6186a054.7f86 1 6/1393 13179750]LOG: automatic analyze of table "alvindb.public.pgbench_accounts" system usage: CPU: user: 0.04 s, system: 0.03 s, elapsed: 0.11 s

并且,根据 rows_to_delete_before_vacuum 得知,再删除 4300 行就可以触发 AUTO  VACUUM。

接下来先删除 4299 行,以测试临界值:

alvindb=> SELECT clock_timestamp(); clock_timestamp ------------------------------- 2021-11-06 23:33:43.867176+08(1 row)alvindb=> UPDATE pgbench_accounts SET bid = bid WHERE aid>=95702;UPDATE 4299alvindb=> SELECT pg_sleep(6); pg_sleep ----------
(1 row)alvindb=> SELECT clock_timestamp(); clock_timestamp ------------------------------- 2021-11-06 23:33:50.016447+08(1 row)

autovacuum_naptime 为 5s,此时并未触发 AUTO  VACUUM。

schemaname | publicrelname | pgbench_accountsautovacuum_vacuum_scale_factor | 0.1autovacuum_vacuum_threshold | 2000autovacuum_analyze_scale_factor | 0.05autovacuum_analyze_threshold | 2000n_live_tup | 93000reltuples | 93000autovacuum_analyze_trigger | 6651n_mod_since_analyze | 4299rows_to_mod_before_analyze | 2352last_autoanalyze | 2021-11-06 23:33:40.87317+08autovacuum_vacuum_trigger | 11301n_dead_tup | 11300rows_to_delete_before_vacuum | 1last_autovacuum |

再删除 (UPDATE = DELETE + INSERT)  1 行 : 

alvindb=> SELECT clock_timestamp(); clock_timestamp ------------------------------- 2021-11-06 23:33:53.326483+08(1 row)alvindb=> UPDATE pgbench_accounts SET bid = bid WHERE aid=7002;UPDATE 1alvindb=> SELECT pg_sleep(6); pg_sleep ----------
(1 row)alvindb=> SELECT clock_timestamp(); clock_timestamp ------------------------------- 2021-11-06 23:33:59.439375+08(1 row)

从如下结果中的last_autovacuum 得知,此时已精确触发 AUTO VACUUM!

schemaname | publicrelname | pgbench_accountsautovacuum_vacuum_scale_factor | 0.1autovacuum_vacuum_threshold | 2000autovacuum_analyze_scale_factor | 0.05autovacuum_analyze_threshold | 2000n_live_tup | 93000reltuples | 93000autovacuum_analyze_trigger | 6651n_mod_since_analyze | 4300rows_to_mod_before_analyze | 2351last_autoanalyze | 2021-11-06 23:33:40.87317+08autovacuum_vacuum_trigger | 11301n_dead_tup | 0rows_to_delete_before_vacuum | 11301last_autovacuum | 2021-11-06 23:34:00.956936+08

从 PostgreSQL 日志中也可以看到 AUTO  VACUUM 被触发了:

[ 2021-11-06 23:34:00.956 CST 32710 6186a068.7fc6 1 6/1455 0]LOG: automatic vacuum of table "alvindb.public.pgbench_accounts": index scans: 1 pages: 0 removed, 421 remain, 0 skipped due to pins, 0 skipped frozen tuples: 2 removed, 93000 remain, 0 are dead but not yet removable, oldest xmin: 13179755 buffer usage: 967 hits, 60 misses, 7 dirtied avg read rate: 10.067 MB/s, avg write rate: 1.174 MB/s system usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.18 s

五、VACUUM 相关参数

对 VACUUM 有了一定的了解之后,下面系统介绍下 VACUUM 相关参数。

VACUUM 相关参数主要分为三大类。

1、第一类 与资源相关参数

#-----------------------------# RESOURCE USAGE (except WAL)#-----------------------------# - Memory -#maintenance_work_mem = 64MB # min 1MB#autovacuum_work_mem = -1 # min 1MB, or -1 to use maintenance_work_mem# - Cost-Based Vacuum Delay -#vacuum_cost_delay = 0 # 0-100 milliseconds (0 disables)#vacuum_cost_page_hit = 1 # 0-10000 credits#vacuum_cost_page_miss = 10 # 0-10000 credits#vacuum_cost_page_dirty = 20 # 0-10000 credits#vacuum_cost_limit = 200 # 1-10000 credits

这里有两部分。

第一部分是内存相关。主要是 autovacuum_work_mem,默认值为 -1,即同 maintenance_work_memmaintenance_work_mem 默认值为 64MB。

第二部分是 Cost-Based Vacuum Delay。

当 VACUUM 工作超出一定量之后,会 sleep 一段时间。

一定量是多少呢?是 vacuum_cost_limit。默认值为 200。

sleep 多长时间呢?是 vacuum_cost_delay 。默认值是 0,即不 sleep。

工作量又是怎么算出来的?根据要 VACUUM 的 page 的不同,其 cost 是不一样的。

以下是三种不同 page 的 cost,默认值分别为 1、10、20,基本不用调整。

vacuum_cost_page_hit - The estimated cost for vacuuming a buffer found in the shared buffer cache.

vacuum_cost_page_miss - The estimated cost for vacuuming a buffer that has to be read from disk.

vacuum_cost_page_dirty - The estimated cost charged when vacuum modifies a block that was previously clean.

日常工作中手动 VACUUM 时主要调整 vacuum_cost_limit 和 vacuum_cost_delay 。如调整为:
vacuum_cost_delay = 2vacuum_cost_limit = 2000

即当 VACUUM 工作量超出 2000 之后,sleep 2ms。

需要注意,手动 VACUUM 和 AUTOVACUUM 的参数是不一样的。当 AUTOVACUUM 参数为  -1 时,则同手动 VACUUM 参数。

手动 VACUUM 对应的参数 maintenance_work_memvacuum_cost_delay  vacuum_cost_limit

AUTOVACUUM 对应的参数是 autovacuum_work_memautovacuum_vacuum_cost_delayautovacuum_vacuum_cost_limit

可以从下面 AUTOVACUUM 参数中可以看到autovacuum_vacuum_cost_delay 默认值为 20ms,这样的话,AUTOVACUUM 运行时其对数据库影响较小。PostgreSQL 12 开始,其默认值调整为了 2ms。

#autovacuum_vacuum_cost_delay = 20ms # default vacuum cost delay for # autovacuum, in milliseconds; # -1 means use vacuum_cost_delay#autovacuum_vacuum_cost_limit = -1 # default vacuum cost limit for # autovacuum, -1 means use # vacuum_cost_limit

vacuum_cost_delay 的设置有一点需要注意:在 PostgreSQL 11 及之前的版本,文档中的原文是:Note that on many systems, the effective resolution of sleep delays is 10 milliseconds; setting vacuum_cost_delay to a value that is not a multiple of 10 might have the same results as setting it to the next higher multiple of 10.

2、第二类 AUTOVACUUM 相关参数

#------------------------------------------------------------------------------# AUTOVACUUM#------------------------------------------------------------------------------
#autovacuum = on # Enable autovacuum subprocess? 'on' # requires track_counts to also be on.#log_autovacuum_min_duration = -1 # -1 disables, 0 logs all actions and # their durations, > 0 logs only # actions running at least this number # of milliseconds.#autovacuum_max_workers = 3 # max number of autovacuum subprocesses # (change requires restart)#autovacuum_naptime = 1min # time between autovacuum runs#autovacuum_vacuum_threshold = 50 # min number of row updates before # vacuum#autovacuum_analyze_threshold = 50 # min number of row updates before # analyze#autovacuum_vacuum_scale_factor = 0.2 # fraction of table size before vacuum#autovacuum_analyze_scale_factor = 0.1 # fraction of table size before analyze#autovacuum_freeze_max_age = 200000000 # maximum XID age before forced vacuum # (change requires restart)#autovacuum_multixact_freeze_max_age = 400000000 # maximum multixact age # before forced vacuum # (change requires restart)#autovacuum_vacuum_cost_delay = 20ms # default vacuum cost delay for # autovacuum, in milliseconds; # -1 means use vacuum_cost_delay#autovacuum_vacuum_cost_limit = -1 # default vacuum cost limit for # autovacuum, -1 means use # vacuum_cost_limit

以下参数前面已有提到,一般调整为通用配置后基本不调整,调整的话多是调整表级的参数,即根据不同的表设置不同的参数。

#autovacuum = on#log_autovacuum_min_duration = -1#autovacuum_naptime = 1min#autovacuum_vacuum_threshold = 50#autovacuum_analyze_threshold = 50#autovacuum_vacuum_scale_factor = 0.2#autovacuum_analyze_scale_factor = 0.1#autovacuum_vacuum_cost_delay = 20ms#autovacuum_vacuum_cost_limit = -1

下面两个参数是当某个表的 age 达到一定阈值后,AUTOVACUUM 会对整个数据库实例进行 aggressive vacuum 以避免 wraparound,即使表没有 dead tuple。数据库运行良好的话,很少会触发。

#autovacuum_freeze_max_age = 200000000#autovacuum_multixact_freeze_max_age = 400000000

当数据库中表比较多,甚至一个实例中数据库也比较多的情况,可适当增大 autovacuum_max_workers

#autovacuum_max_workers = 3 # max number of autovacuum subprocesses # (change requires restart)

问题来了,增大 autovacuum_max_workers 后,一定会提高 AUTOVACUUM 速度吗?

里需要注意,autovacuum_vacuum_cost_limit 是所有 autovacuum worker 所用 cost 之和达到 limit 之后 sleep,增大autovacuum_max_workers 之后,每个 worker 平均的 cost limit 就小了,即就相对更容易达到 limit,这样做同样的工作,就会 sleep 更多的时间,反而就更慢了。

所以,在增大 autovacuum_max_workers 之后,可以相应比例增大autovacuum_vacuum_cost_limit

3、第三类 FREEZE 相关参数

以下是 FREEZE 相关参数,以后将系统介绍 FREEZE,本文不再展开讨论。

#------------------------------------------# CLIENT CONNECTION DEFAULTS#------------------------------------------#vacuum_freeze_min_age = 50000000#vacuum_freeze_table_age = 150000000#vacuum_multixact_freeze_min_age = 5000000#vacuum_multixact_freeze_table_age = 150000000

六、VACUUM 参数优化

上面已经介绍过了以下设置表级 AUTOVACUUM 相关参数和 autovacuum_max_workers

ALTER TABLE pgbench_accounts SET (autovacuum_vacuum_scale_factor = 0.1, autovacuum_vacuum_threshold = 2000);ALTER TABLE pgbench_accounts SET (autovacuum_analyze_scale_factor = 0.05, autovacuum_analyze_threshold = 2000);

下面就以下常用 VACUUM 参数详细介绍如何进行调优。

maintenance_work_mem 参数
#maintenance_work_mem = 64MB # min 1MB#autovacuum_work_mem = -1 # min 1MB, or -1 to use maintenance_work_mem

vacuum_cost_delay 参数

#vacuum_cost_delay = 0#autovacuum_vacuum_cost_delay = 20ms

vacuum_cost_limit 参数

#vacuum_cost_limit = 200#autovacuum_vacuum_cost_limit = -1

1、参数优化测试用例

使用 pgbench 生成 5000 万测试数据。

pgbench -i -s 500 alvindb

设置表级 AUTOVACUUM 相关参数:

ALTER TABLE pgbench_accounts SET (autovacuum_vacuum_scale_factor = 0.1, autovacuum_vacuum_threshold = 2000);ALTER TABLE pgbench_accounts SET (autovacuum_analyze_scale_factor = 0.05, autovacuum_analyze_threshold = 2000);

删除 2000 万数据:

DELETE FROM pgbench_accounts WHERE aid>=1500001 AND aid <=3500000;DELETE FROM pgbench_accounts WHERE aid>=15000001 AND aid <=17000000;DELETE FROM pgbench_accounts WHERE aid>=25000001 AND aid <=28000000;DELETE FROM pgbench_accounts WHERE aid>=35000001 AND aid <=38000000;DELETE FROM pgbench_accounts WHERE aid>=40000001;

另外,将 autovacuum_naptime  设置为 3min 以给删除数据留够时间。

alvindb=> SHOW autovacuum_naptime; autovacuum_naptime -------------------- 3min(1 row)

2、AUTOVACUUM 测试

首先采用默认参数:

SHOW maintenance_work_mem; maintenance_work_mem ---------------------- 64MB(1 row)SHOW autovacuum_work_mem; autovacuum_work_mem --------------------- -1(1 row)SHOW autovacuum_vacuum_cost_delay; autovacuum_vacuum_cost_delay ------------------------------ 20ms(1 row)SHOW autovacuum_vacuum_cost_limit; autovacuum_vacuum_cost_limit ------------------------------ -1(1 row)SHOW vacuum_cost_limit; vacuum_cost_limit ------------------- 200(1 row)

执行测试用例的同时,通过如下  SQL 查询

SELECT * FROM pg_stat_activity WHERE backend_type ~ 'autovacuum worker' AND pid <> pg_backend_pid();\watch 1

可以看到, AUTOVACUUM 已触发,并且 autovacuum worker 已启动,先是 query 为空,而后 query 为 VACUUM ANALYZE public.pgbench_accounts。

2021年11月07日 星期日 23时17分09秒 (every 1s)
-[ RECORD 1 ]----+------------------------------datid | 37509datname | alvindbpid | 16660usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | 2021-11-07 23:17:09.427627+08xact_start | 2021-11-07 23:17:09.426378+08query_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | 13180269query | backend_type | autovacuum worker
2021年11月07日 星期日 23时17分10秒 (every 1s)
-[ RECORD 1 ]----+---------------------------------------------------datid | 37509datname | alvindbpid | 16660usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | 2021-11-07 23:17:09.427627+08xact_start | 2021-11-07 23:17:09.459083+08query_start | 2021-11-07 23:17:09.459083+08state_change | 2021-11-07 23:17:09.459084+08wait_event_type | wait_event | state | activebackend_xid | backend_xmin | 13180269query | autovacuum: VACUUM ANALYZE public.pgbench_accountsbackend_type | autovacuum worker

成后,根据 last_autoanalyze 和 last_autovacuum 得知是先做了 VACUUM,然后做 ANALYZE:

schemaname | publicrelname | pgbench_accountsautovacuum_vacuum_scale_factor | 0.1autovacuum_vacuum_threshold | 2000autovacuum_analyze_scale_factor | 0.05autovacuum_analyze_threshold | 2000n_live_tup | 30000000reltuples | 30000000autovacuum_analyze_trigger | 1502001n_mod_since_analyze | 0rows_to_mod_before_analyze | 1502001last_autoanalyze | 2021-11-07 23:22:41.640812+08autovacuum_vacuum_trigger | 3002001n_dead_tup | 142848rows_to_delete_before_vacuum | 2859153last_autovacuum | 2021-11-07 23:22:14.06792+08

查看 PostgreSQL 日志,得知 VACUUM 用时 304.60 s,ANALYZE 用时 27.56 s。

[ 2021-11-07 23:22:14.067 CST 16660 6187edf5.4114 1 3/238595 13180270]LOG: automatic vacuum of table "alvindb.public.pgbench_accounts": index scans: 2 pages: 35699 removed, 165914 remain, 576 skipped due to pins, 0 skipped frozen tuples: 6190880 removed, 30142834 remain, 0 are dead but not yet removable, oldest xmin: 13180269 buffer usage: 419596 hits, 40795 misses, 100492 dirtied avg read rate: 4.185 MB/s, avg write rate: 10.310 MB/s system usage: CPU: user: 14.02 s, system: 1.41 s, elapsed: 304.60 s
[ 2021-11-07 23:22:41.640 CST 16660 6187edf5.4114 2 3/238596 13180271]LOG: automatic analyze of table "alvindb.public.pgbench_accounts" system usage: CPU: user: 3.25 s, system: 7.58 s, elapsed: 27.56 s

通过调整参数,逐步并反复测试,结果如下:

maintenance_work_memautovacuum_vacuum_cost_delayvacuum_cost_limitAUTOVACUUM 用时
64MB20ms200304.60 s + 27.56 s
64MB2ms20039.45 s +8.73 s
64MB2ms200018.79 s + 5.50 s
64MB2ms20042.04 s + 8.00 s
64MB20ms200329.72 s + 22.82 s
64MB0ms200017.72 s + 3.45 s
512MB0ms200012.75 s + 3.35 s
64MB0ms200015.13 s + 5.45 s

根据如上试,可以验证,适当增大autovacuum_work_mem 和 autovacuum_vacuum_cost_limit、减少autovacuum_vacuum_cost_delay 可提高 AUTOVACUUM 性能。

3、手动 VACUUM ANALYZE 测试

下面测试手动 VACUUM ANALYZE。测试方法基本与 AUTOVACUUM 一样。

这里主要单测试下 maintenance_work_mem,其他不再重复测试。

闭 autovacuum 并将 vacuum_cost_delay 设置为 0,并手动执行如下 SQL:

VACUUM ANALYZE pgbench_accounts;

用时统计如下:

maintenance_work_memvacuum_cost_delayvacuum_cost_limitVACUUM 用时
64MB0ms200023.137 s
128MB0ms200018.284 s
64MB0ms200024.144 s

根据如上试,可以验证,适当增大maintenance_work_mem 和 vacuum_cost_limit、减少vacuum_cost_delay 可提高 VACUUM 性能。

与 AUTOVACUUM 不同的是,手动 VACUUM 可以通过如下更改 session 级别的方式设置参数。这样,在实际工作中,就可以灵活调整参数而不需要改配置文件了。

SET vacuum_cost_delay = 10;VACUUM ANALYZE pgbench_accounts;

注意:实际生产环境中,尤其是现在 SSD 等高端存储已被广泛应用于 DB Server,autovacuum_vacuum_cost/autovacuum_vacuum_cost_limit (次因) 调大且 vacuum_cost_delay/autovacuum_cost_delay (主因) 调小后,在 VACUUM 性能得到提升的同时,可能会对 DB 集群产生较大冲击,常见的副作用是产生较大的 IO/网络流量峰值或集群主从延迟。

七、AUTOVACUUM to prevent wraparound

autovacuum_freeze_max_age 是 AUTOVACUUM 最不常用的参数,也基本不需要优化,但却是 AUTOVACUUM 最重要的一个参数,因为它与 wraparound 有关,即使 AUTOVACUUM 关闭,达到这个阈值,也会强制触发 AUTOVACUUM ,可见它有多重要。

#autovacuum_freeze_max_age = 200000000 # maximum XID age before forced vacuum # (change requires restart)

首先打开 AUTOVACUUM。为了方便测试,将 autovacuum_freeze_max_age 设置为其最小值。

autovacuum_freeze_max_age = 100000

正常情况是达不到 autovacuum_freeze_max_age 阈值的,因为还没有达到这个阈值,就触发了 AUTOVACUUM。为了能测试到这个阈值,人工创建一条 idle in transaction,以阻塞正常的 AUTOVACUUM。

alvindb=# BEGIN;BEGINalvindb=# SELECT txid_current(); txid_current-------------- 15200695(1 row)
alvindb=# UPDATE alvin.tb_test_vacuum SET test_num = test_num WHERE test_num = 1;UPDATE 1

查到 idle in transaction 如下,可以看到其 backend_xid 为 15200695:

postgres=# SELECT * FROM pg_stat_activity WHERE state = 'idle in transaction';-[ RECORD 1 ]----+------------------------------datid | 37509datname | alvindbpid | 28497usesysid | 10usename | postgresapplication_name | psqlclient_addr | client_hostname | client_port | -1backend_start | 2021-11-13 23:18:33.480814+08xact_start | 2021-11-13 23:19:45.288275+08query_start | 2021-11-13 23:19:58.133248+08state_change | 2021-11-13 23:19:58.133412+08wait_event_type | Clientwait_event | ClientReadstate | idle in transactionbackend_xid | 15200695backend_xmin | query | SELECT txid_current();backend_type | client backend

不断更新表,使其达到阈值。

alvindb=# SELECT c.relname, c.relfrozenxid,age(c.relfrozenxid) FROM pg_class c WHERE relname = 'pgbench_accounts'; relname | relfrozenxid | age ------------------+--------------+-------- pgbench_accounts | 15250696 | 100000(1 row)

此时修改最后一条数据,使表的 age 超过阈值。

alvindb=# SELECT c.relname, age(c.relfrozenxid) FROM pg_class c WHERE relname = 'pgbench_accounts'; relname | age ------------------+-------- pgbench_accounts | 100001(1 row)

可以看到,此时已触发 aggressive vacuum to prevent wraparound。

其 query 为 VACUUM public.pgbench_accounts (to prevent wraparound),并且 backend_xmin 为 15200695。

-[ RECORD 1 ]----+-------------------------------------------------------------------datid | 37509datname | alvindbpid | 30288usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | 2021-11-13 23:30:39.767723+08xact_start | 2021-11-13 23:30:39.788839+08query_start | 2021-11-13 23:30:39.788839+08state_change | 2021-11-13 23:30:39.788839+08wait_event_type | wait_event | state | activebackend_xid | backend_xmin | 15200695query | autovacuum: VACUUM public.pgbench_accounts (to prevent wraparound)backend_type | autovacuum worker

从 PostgreSQL 日志中可以看到触发的 automatic aggressive vacuum。

[ 2021-11-13 23:30:40323 CST 30288 618fda1f.7650 3 6/72694 0]LOG: automatic aggressive vacuum of table "alvindb.public.pgbench_accounts": index scans: 1 pages: 0 removed, 404 remain, 0 skipped due to pins, 0 skipped frozen tuples: 3 removed, 100000 remain, 0 are dead but not yet removable, oldest xmin: 15200695 buffer usage: 438 hits, 480 misses, 412 dirtied avg read rate: 28.065 MB/s, avg write rate: 24.089 MB/s system usage: CPU: user: 0.02 s, system: 0.03 s, elapsed: 0.53 s[ 2021-11-13 23:30:40.334 CST 30288 618fda1f.7650 4 6/72696 0]WARNING: oldest xmin is far in the past

所以,如果一旦发现 aggressive vacuum (to prevent wraparound),需要格外重视,检查是否出现类似 idle in transaction 等异常情况。

尤其是 PostgreSQL 9.6 之前,还没有参数 idle_in_transaction_session_timeout (since PostgreSQL 9.6) 的版本,需要手动 terminate idle in transaction。

这就需要 DBA 对 idle in transaction 的这种 slow query (包括增删查改的广义 query) 重点做监控,防患于未然。

当然也有可能是真的是写入过多过快,对 transaction id 的消耗太多,AUTOVACUUM 不及时,导致出现了 wraparound,同样也可以对如下系统表的相关运行状态进行监控来预防。
--表级SELECT c.oid::regclass as table_name, greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as ageFROM pg_class c LEFT OUTER JOIN pg_class t ON c.reltoastrelid = t.oidWHERE c.relkind IN ('r', 'm')ORDER BY 2 desc ;
--DB级SELECT datname, age(datfrozenxid) FROM pg_database;

实际生产环境中,比较有效的做法是:夜间每天定时执行对实例中所有的 DB 的 VACUUM ANALYZE,避免出现过度膨胀,保证 DB 实例的健康运行。

当然要调优好 VACUUM 相关参数,避免对 DB 集群产生较大冲击,产生其他隐患而报警。同时也要避免和其他夜间的任务 (比如备份任务) 叠加执行。

八、总结

本文理论结合实践,力求深入浅出的阐述 VACCUM 的来龙去脉。同时系统的分析了 VACUUM 的相关参数,且通过细致的实际的测试演示了如何调优 (及注意事项)。最后针对 prevent wraparound 这一常见实际场景做了简要的分析。

希望此文对广大 PGer 能有一点点帮助,不足之处,望不吝指出

End

您可能也对以下帖子感兴趣

文章有问题?点此查看未经处理的缓存