备库是否有自己的统计信息？

Original xiongcc PostgreSQL学徒

2024-09-30

1前言

今天在 2 群里一位筒子提了这样一个问题

我们最近很多报表在读复制从节点的数据，但从节点的 pg_stat_all_tables 里面好像最后vacuum 和 analyze 的时间都没有，这个统计数据或者 vacuum 操作会在从节点也重放吗

首先可以肯定的是，vacuum 操作肯定会在备库回放，不然何来 snapshot conflict呢。所以 2 群某位小伙伴的如下观点是错误的

从库不会vacuum，只会同步主库的元组变更

问题很明了，备库的 pg_stat_all_tables 里面的 vacuum / analyze 相关字段没有数据，导致读取出来的数据没有意义。正好昨天把狂飙电视剧追完了，也需要重拾一下烂笔头了。

闲话少叙，让我们一起分析一下这个问题！

2复现

首先简单搭建个主从，然后主库上模拟几条操作

postgres=# create table test(id int);
CREATE TABLE
postgres=# insert into test values(generate_series(1,100000));
INSERT 0 100000
postgres=# delete from test where id < 20000;
DELETE 19999
postgres=# vacuum analyze test;
VACUUM
postgres=# select relname,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze from pg_stat_all_tables where relname = 'test';
-[ RECORD 1 ]----+------------------------------
relname          | test
last_vacuum      | 2023-02-02 20:04:17.648657+08
last_autovacuum  | 2023-02-02 20:03:33.133273+08
last_analyze     | 2023-02-02 20:04:17.686317+08
last_autoanalyze | 2023-02-02 20:03:33.197652+08

可以看到自动触发了 autovacuum，手动执行 vacuum 也有记录。那去备库上瞅瞅

postgres=# select relname,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze from pg_stat_all_tables where relname = 'test';
-[ RECORD 1 ]----+-----
relname          | test
last_vacuum      | 
last_autovacuum  | 
last_analyze     | 
last_autoanalyze | 

postgres=# select * from pg_stat_all_tables where relname = 'test';
-[ RECORD 1 ]-------+-------
relid               | 16961
schemaname          | public
relname             | test
seq_scan            | 0
seq_tup_read        | 0
idx_scan            | 
idx_tup_fetch       | 
n_tup_ins           | 0
n_tup_upd           | 0
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 0
n_dead_tup          | 0
n_mod_since_analyze | 0
n_ins_since_vacuum  | 0
last_vacuum         | 
last_autovacuum     | 
last_analyze        | 
last_autoanalyze    | 
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0

postgres=# vacuum analyze test;
ERROR:  cannot execute VACUUM during recovery

果然，不仅 vacuum 相关的字段没有数据，其他字段也没有数据，而且备库也无法执行这类写操作。那为什么会产生的差异呢？

3分析

先从源头出发便可发现端倪，pg_stat_all_tables 原来仅仅是一个视图，just a view！并非表对象，是一个关于表上各个维度的计数器 👇🏻，这个统计的过程由 stats collector 进程来做。

SELECT c.oid AS relid,
    n.nspname AS schemaname,
    c.relname,
    pg_stat_get_numscans(c.oid) AS seq_scan,
    pg_stat_get_tuples_returned(c.oid) AS seq_tup_read,
    sum(pg_stat_get_numscans(i.indexrelid))::bigint AS idx_scan,
    sum(pg_stat_get_tuples_fetched(i.indexrelid))::bigint + pg_stat_get_tuples_fetched(c.oid) AS idx_tup_fetch,
    pg_stat_get_tuples_inserted(c.oid) AS n_tup_ins,
    pg_stat_get_tuples_updated(c.oid) AS n_tup_upd,
    pg_stat_get_tuples_deleted(c.oid) AS n_tup_del,
    pg_stat_get_tuples_hot_updated(c.oid) AS n_tup_hot_upd,
    pg_stat_get_live_tuples(c.oid) AS n_live_tup,
    pg_stat_get_dead_tuples(c.oid) AS n_dead_tup,
    pg_stat_get_mod_since_analyze(c.oid) AS n_mod_since_analyze,
    pg_stat_get_ins_since_vacuum(c.oid) AS n_ins_since_vacuum,
    pg_stat_get_last_vacuum_time(c.oid) AS last_vacuum,
    pg_stat_get_last_autovacuum_time(c.oid) AS last_autovacuum,
    pg_stat_get_last_analyze_time(c.oid) AS last_analyze,
    pg_stat_get_last_autoanalyze_time(c.oid) AS last_autoanalyze,
    pg_stat_get_vacuum_count(c.oid) AS vacuum_count,
    pg_stat_get_autovacuum_count(c.oid) AS autovacuum_count,
    pg_stat_get_analyze_count(c.oid) AS analyze_count,
    pg_stat_get_autoanalyze_count(c.oid) AS autoanalyze_count
   FROM pg_class c
     LEFT JOIN pg_index i ON c.oid = i.indrelid
     LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
  WHERE c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char", 'p'::"char"])
  GROUP BY c.oid, n.nspname, c.relname;

再回想一下我之前写过的关于统计信息收集的基础原理 👇🏻

每个进程执行 DML 会去调用 pgstat_report_stat 向 stats collector 进程反馈统计信息，后面会通过 pgstat_send_tabstat 发送表的统计信息，其他的诸如函数/WAL/SLRU的统计信息，则会有对应的三个函数 pgstat_send_funcstats/pgstat_send_wal/pgstat_send_slru 来完成。

每个后端进程在事务提交/回滚时会发消息给进程 stats collector，stats collector 会汇总这份信息并记录到文件中，然后 autovacuum launcher 定期读取文件以决定何时触发 autovacuum vacuum进行工作。而 pg_stat_all_tables 所涉及到的 pg_stat_get_tuples_* 这类相关函数正是与 stats collector 进程打交道的，但是统计信息所涉及到的相关文件（pg_stat_tmp / pg_stat 目录下）又不会被复制到备库。纳尼？不会复制到备库，为什么不会复制到备库？

卖个关子，各位思考五分钟...

回顾一下流复制的原理，流复制本质上就是基于 WAL record 的流式传输，主库 walsender 进程解析 WAL record 发送到备库，备库 startup 进程再进行回放，而 pg_stat_tmp 这类文件非普通对象，都不会记录 WAL record，那么自然不会复制到备库。

除此之外，其实还有个细节可以侧面证明，一般我们搭建流复制都是使用 pg_basebackup，而 pg_basebackup 在备份的过程中是会跳过一些文件的

/*
 * The contents of these directories are removed or recreated during server
 * start so they are not included in backups.  The directories themselves are
 * kept and included as empty to preserve access permissions.
 *
 * Note: this list should be kept in sync with the filter lists in pg_rewind's
 * filemap.c.
 */
static const char *const excludeDirContents[] =
{
 /*
  * Skip temporary statistics files. PG_STAT_TMP_DIR must be skipped even
  * when stats_temp_directory is set because PGSS_TEXT_FILE is always
  * created there.
  */
 PG_STAT_TMP_DIR,

 /*
  * It is generally not useful to backup the contents of this directory
  * even if the intention is to restore to another primary. See backup.sgml
  * for a more detailed description.
  */
 "pg_replslot",

 /* Contents removed on startup, see dsm_cleanup_for_mmap(). */
 PG_DYNSHMEM_DIR,

 /* Contents removed on startup, see AsyncShmemInit(). */
 "pg_notify",

 /*
  * Old contents are loaded for possible debugging but are not required for
  * normal operation, see SerialInit().
  */
 "pg_serial",

 /* Contents removed on startup, see DeleteAllExportedSnapshotFiles(). */
 "pg_snapshots",

 /* Contents zeroed on startup, see StartupSUBTRANS(). */
 "pg_subtrans",

 /* end of list */
 NULL
};

  /* Scan for directories whose contents should be excluded */
  excludeFound = false;
  for (excludeIdx = 0; excludeDirContents[excludeIdx] != NULL; excludeIdx++)
  {
   if (strcmp(de->d_name, excludeDirContents[excludeIdx]) == 0)
   {
    elog(DEBUG1, "contents of directory \"%s\" excluded from backup", de->d_name);
    size += _tarWriteDir(pathbuf, basepathlen, &statbuf, sizeonly);
    excludeFound = true;
    break;
   }
  }

可以看到 PG_STAT_TMP_DIR便被跳过了。所以此时备库没这个文件（注意我这里用了此时）

[postgres@xiongcc ~]$ ll 14data/pg_stat_tmp/
total 20
-rw------- 1 postgres postgres  2225 Feb  2 21:08 db_0.stat
-rw------- 1 postgres postgres 11290 Feb  2 21:08 db_13892.stat
-rw------- 1 postgres postgres  1335 Feb  2 21:08 global.stat
[postgres@xiongcc ~]$ ll 14data_bak/pg_stat_tmp/
total 0

让我们验证一下，这里为了演示，使用 14 版本，因为在 15 的版本以后移除了 stats collector 进程，pg_stat_tmp 目录继续保留主要为了一些插件，比如 pg_stat_statements。

postgres=# select pg_backend_pid();
 pg_backend_pid 
----------------
          19224
(1 row)

postgres=# select relname,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze from pg_stat_all_tables where relname = 'test';  ---查询一下
 relname |          last_vacuum          |       last_autovacuum        |         last_analyze          |       last_autoanalyze        
---------+-------------------------------+------------------------------+-------------------------------+-------------------------------
 test    | 2023-02-02 21:05:11.320255+08 | 2023-01-16 15:32:56.88557+08 | 2023-02-02 21:05:28.816788+08 | 2023-01-16 15:32:58.050305+08
(1 row)

另一个终端用 strace 抓一下

[postgres@xiongcc ~]$ strace -p 19224
strace: Process 19224 attached
epoll_wait(4, [{EPOLLIN, {u32=42797616, u64=42797616}}], 1, -1) = 1
recvfrom(10, "Q\0\0\0}select relname,last_vacuum,"..., 8192, 0, NULL, NULL) = 126
lseek(17, 0, SEEK_END)                  = 106496
lseek(18, 0, SEEK_END)                  = 32768
lseek(19, 0, SEEK_END)                  = 49152
lseek(20, 0, SEEK_END)                  = 40960
lseek(21, 0, SEEK_END)                  = 32768
lseek(22, 0, SEEK_END)                  = 16384
lseek(23, 0, SEEK_END)                  = 16384
lseek(24, 0, SEEK_END)                  = 8192
lseek(25, 0, SEEK_END)                  = 16384
lseek(26, 0, SEEK_END)                  = 16384
lseek(24, 0, SEEK_END)                  = 8192
open("pg_stat_tmp/global.stat", O_RDONLY) = 33
fstat(33, {st_mode=S_IFREG|0600, st_size=1335, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb56f356000
read(33, "\242\274\245\1\335|m\374\265\226\2\0o\r\0\0\0\0\0\0\3\0\0\0\0\0\0\0\244(\4\0"..., 4096) = 1335
close(33)                               = 0
munmap(0x7fb56f356000, 4096)            = 0
sendto(9, "\1\0\0\0 \0\0\0X>\234\375\265\226\2\08\235\224\375\265\226\2\0D6\0\0\265\226\2\0", 32, 0, NULL, 0) = 32
select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=10000}) = 0 (Timeout)
open("pg_stat_tmp/global.stat", O_RDONLY) = 33
fstat(33, {st_mode=S_IFREG|0600, st_size=1335, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb56f356000
read(33, "\242\274\245\1\344>\234\375\265\226\2\0o\r\0\0\0\0\0\0\3\0\0\0\0\0\0\0\244(\4\0"..., 4096) = 1335
close(33)                               = 0
munmap(0x7fb56f356000, 4096)            = 0
open("pg_stat_tmp/global.stat", O_RDONLY) = 33
fstat(33, {st_mode=S_IFREG|0600, st_size=1335, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb56f356000
read(33, "\242\274\245\1\344>\234\375\265\226\2\0o\r\0\0\0\0\0\0\3\0\0\0\0\0\0\0\244(\4\0"..., 4096) = 1335
open("pg_stat_tmp/db_13892.stat", O_RDONLY) = 41   ---👈🏻这里
fstat(41, {st_mode=S_IFREG|0600, st_size=11290, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb56f355000
read(41, "\242\274\245\1T[\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
read(41, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
read(41, "\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\2\0\0\0\0\0\0"..., 4096) = 3098
close(41)                               = 0
munmap(0x7fb56f355000, 4096)            = 0
open("pg_stat_tmp/db_0.stat", O_RDONLY) = 41
fstat(41, {st_mode=S_IFREG|0600, st_size=2225, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb56f355000
read(41, "\242\274\245\1T\275\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 2225
close(41)                               = 0
munmap(0x7fb56f355000, 4096)            = 0
close(33)                               = 0
munmap(0x7fb56f356000, 4096)            = 0
sendto(9, "\2\0\0\0\300\3\0\0D6\0\0\10\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 960, 0, NULL, 0) = 960
sendto(10, "T\0\0\0\242\0\5relname\0\0\0000C\0\3\0\0\0\23\0@\377\377\377\377\0"..., 329, 0, NULL, 0) = 329
recvfrom(10, 0x1066700, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)

很清晰了，该进程去读了 pg_stat_tmp 目录下的 db_13892.stat。

不过值得注意的是，备库是可以自己做checkpoint的，目的是避免每次重启备库都需要从上一个检查点（由主库产生，在WAL中回放出来的）回放后面所有的WAL，但是注意并不会去更新控制文件相关的位点，因为是从主库同步过来的。所以，备库执行了checkpoint，会生成相关文件。不过还是那个问题，刷到存储上的是备库"此时"的状态，所以还是0。

[postgres@xiongcc ~]$ ll 14data_bak/pg_stat_tmp/
total 12
-rw------- 1 postgres postgres  745 Feb  2 21:46 db_0.stat
-rw------- 1 postgres postgres 2040 Feb  2 21:46 db_13892.stat
-rw------- 1 postgres postgres 1335 Feb  2 21:46 global.stat

4小结

通过这个小案例，其实我们也能发散一下思维，思考的更深一点：

autovacuum launcher 定期读取 stats collector 进程收集的统计信息文件以决定何时触发 autovacuum，而备库是不包含此类统计信息文件的，因此假如有个表在主库上即将达到了触发阈值就快要进行 vacuum/analyze 了，结果这个时候做了一个 switchover 或者 failover，新的主库由于没有统计数据，需要新的活动才可以触发收集，因此这个表就会被"延迟"清理，时间可能就增加了一倍。怎么样，是不是 get 到了一波奇怪的知识。虽然大多数时候不会引起实质性的问题，但是假如系统在处于那种极端情况下，刚好这个表统计信息收集不及时，导致了性能问题，那么买个彩票吧。
同理，vm文件也不会进行复制，假如有个表要插入新行了，switchover 或者 failover 后第一次 vacuum 之前都不会复用这些空间，所以极端情况下可以做个 vacuum。

另外我发现最近很多人在问 PostgreSQL 巡检的事，正好我也有这个打算，使用 go 开发一个巡检工具，一来拾起生疏的 go，二来想着自己撸一个巡检工具出来，我行我上，拭目以待吧。

真是一个有趣的案例啊！好了，That's all，我又要去追三体了。

继续滑动看下一个

PostgreSQL学徒

向上滑动看下一个

《鱿鱼游戏2》今天下午四点开播，网友无心上班了，导演悄悄剧透

人民日报征集“中美友好合作故事”，令人感奋

刘恺威近况曝光，父亲刘丹证实已分手，目前失业在家，没有资源

紧急通告！三高的“克星”终于被找到了！！不是吃素和控糖,而是多喝它....

话费充值活动来了：95元充值100元电话费！

备库是否有自己的统计信息？

1前言

2复现

3分析

4小结

您可能也对以下帖子感兴趣

《鱿鱼游戏2》今天下午四点开播，网友无心上班了，导演悄悄剧透

人民日报征集“中美友好合作故事”，令人感奋

刘恺威近况曝光，父亲刘丹证实已分手，目前失业在家，没有资源

紧急通告！三高的“克星”终于被找到了！！不是吃素和控糖,而是多喝它....

话费充值活动来了：95元充值100元电话费！

生成图片，分享到微信朋友圈

备库是否有自己的统计信息？

1前言

2复现

3分析

4小结

您可能也对以下帖子感兴趣