
High Availability Series, Part 1: repmgr

xiongcc PostgreSQL学徒
2024-09-29

Introduction

repmgr is an open-source tool for replication management and failover of PostgreSQL clusters. Developed by 2ndQuadrant, it builds on PostgreSQL's built-in hot-standby capability and supports setting up hot standby servers, monitoring streaming replication, failover and switchover, and more.

The compatibility matrix is as follows:

repmgr version | Supported? | Latest release     | Supported PostgreSQL versions
repmgr 5.3     | YES        | 5.3.2 (2022-05-25) | 9.4, 9.5, 9.6, 10, 11, 12, 13, 14
repmgr 5.2     | NO         | 5.2.1 (2020-12-07) | 9.4, 9.5, 9.6, 10, 11, 12, 13
repmgr 5.1     | NO         | 5.1.0 (2020-04-13) | 9.3, 9.4, 9.5, 9.6, 10, 11, 12
repmgr 5.0     | NO         | 5.0 (2019-10-15)   | 9.3, 9.4, 9.5, 9.6, 10, 11, 12
repmgr 4.x     | NO         | 4.4 (2019-06-27)   | 9.3, 9.4, 9.5, 9.6, 10, 11
repmgr 3.x     | NO         | 3.3.2 (2017-05-30) | 9.3, 9.4, 9.5, 9.6
repmgr 2.x     | NO         | 2.0.3 (2015-04-16) | 9.0, 9.1, 9.2, 9.3, 9.4

Features


repmgr manages the nodes of a cluster in a decentralized fashion: every node has its own repmgr.conf file, which records the node's ID, node name, connection string, database configuration parameters and management commands. Once the parameters are configured, the repmgr command provides "one-command" deployment of cluster nodes. Unlike Patroni, repmgr stores its information, such as operation logs and metadata, in a local database, which makes repmgr somewhat more lightweight than Patroni.

  • Implemented as a PostgreSQL extension
  • Replication cluster monitoring
  • Standby cloning with pg_basebackup or Barman
  • Timeline following:
    • a standby that can be promoted to a primary without requiring a restart
    • other standbys that can connect to the new master without being resynced
  • Cascading standby support
    • Standbys not directly connected to the master node are not affected during failover of the primary to another standby node
  • Replication slot support, simplifying WAL retention management
  • Switchover support for role-switching between primary and standby

Election mechanism

The relevant code path is in repmgrd-physical.c:

  /* don't check 0-priority nodes */
  if (cell->node_info->priority <= 0)
  {
   log_info(_("node \"%s\" (ID: %i) has priority of %i, skipping"),
        cell->node_info->node_name,
        cell->node_info->node_id,
        cell->node_info->priority);
   continue;
  }


  /* get node's last receive LSN - if "higher" than current winner, current node is candidate */
  cell->node_info->last_wal_receive_lsn = sibling_replication_info.last_wal_receive_lsn;

  log_info(_("last receive LSN for sibling node \"%s\" (ID: %i) is: %X/%X"),
     cell->node_info->node_name,
     cell->node_info->node_id,
     format_lsn(cell->node_info->last_wal_receive_lsn));
 
  /* compare LSN */
  if (cell->node_info->last_wal_receive_lsn > candidate_node->last_wal_receive_lsn)   /* compare LSN first */
  {
   /* other node is ahead */
   log_info(_("node \"%s\" (ID: %i) is ahead of current candidate \"%s\" (ID: %i)"),
      cell->node_info->node_name,
      cell->node_info->node_id,
      candidate_node->node_name,
      candidate_node->node_id);

   candidate_node = cell->node_info;
  }
  /* LSN is same - tiebreak on priority, then node_id */
  else if (cell->node_info->last_wal_receive_lsn == candidate_node->last_wal_receive_lsn)
  {
   log_info(_("node \"%s\" (ID: %i) has same LSN as current candidate \"%s\" (ID: %i)"),
      cell->node_info->node_name,
      cell->node_info->node_id,
      candidate_node->node_name,
      candidate_node->node_id);

As the code shows, when a failover occurs repmgr elects the candidate standby in the order LSN > priority > node_id. If the LSNs are equal, the priority configured in repmgr.conf is compared; setting priority to 0 excludes a node from the election. If the priorities are also equal, the node IDs are compared and the node with the smaller node ID wins.
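
For example, to keep a node (say, a reporting replica) out of the election entirely, its repmgr.conf only needs priority=0; the node details below are hypothetical:

# repmgr.conf on a node that must never be promoted (illustrative values)
node_id=3
node_name='reporting_node'
conninfo='host=localhost port=5434 user=repmgr dbname=repmgr'
data_directory='/home/postgres/pgdata_reporting'
priority=0        # 0 = repmgrd skips this node when electing a promotion candidate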

Installation and deployment

Download the latest release from https://github.com/EnterpriseDB/repmgr

git clone https://github.com/EnterpriseDB/repmgr
cd repmgr
./configure && make install

After compiling and installing, two new binaries appear in the PostgreSQL bin directory: repmgr and repmgrd. repmgr is the command-line tool used for day-to-day operations, including viewing cluster status, switchover, cloning standbys, and rejoining failed nodes. repmgrd is a daemon that provides failure detection and failover, monitors and records cluster information, and can invoke custom scripts to receive cluster event notifications via event_notification_command, for example for email notifications and alerting.

[postgres@xiongcc ~]$ repmgr --help
repmgr: replication management tool for PostgreSQL

Usage:
    repmgr [OPTIONS] primary {register|unregister}
    repmgr [OPTIONS] standby {register|unregister|clone|promote|follow|switchover}
    repmgr [OPTIONS] node    {status|check|rejoin|service}
    repmgr [OPTIONS] cluster {show|event|matrix|crosscheck|cleanup}
    repmgr [OPTIONS] witness {register|unregister}
    repmgr [OPTIONS] service {status|pause|unpause}
    repmgr [OPTIONS] daemon  {start|stop}

Common commands

  1. repmgr primary register/unregister - register/unregister the primary
  2. repmgr standby clone - clone a standby
  3. repmgr standby promote - promote a standby to primary
  4. repmgr standby follow - repoint a standby at the new primary
  5. repmgr node status - show the status of a node
  6. repmgr node check - run checks against a node
  7. repmgr cluster show - show the status of the cluster
  8. repmgr cluster event - show events that have occurred in the cluster
  9. repmgr cluster crosscheck - cross-check connections between all nodes in the cluster
  10. repmgr cluster cleanup - purge historical cluster monitoring data

Primary node

Configure postgresql.conf on the primary

Edit the primary's postgresql.conf:

max_wal_senders = 10         # one walsender per standby; must be larger than the number of standbys
max_replication_slots = 10   # number of replication slots; slots prevent WAL still needed by a standby from being recycled
wal_level = 'hot_standby'    # WAL level; streaming replication requires at least 'replica' ('hot_standby' is a legacy alias for 'replica')
hot_standby = on             # whether the standby accepts read-only queries (default: on)
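
A quick way to double-check that these settings are active (a simple query against pg_settings, not a repmgr feature):

[postgres@xiongcc ~]$ psql -c "SELECT name, setting FROM pg_settings WHERE name IN ('wal_level', 'max_wal_senders', 'max_replication_slots', 'hot_standby');"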

Create the metadata user and database

Create a repmgr user, which serves as the default user for the repmgr tool, along with a repmgr database:

postgres=# create user repmgr;
CREATE ROLE
postgres=# alter user repmgr with password 'test1234';
ALTER ROLE
postgres=# alter user repmgr superuser ;
ALTER ROLE
postgres=# create database repmgr owner repmgr;
CREATE DATABASE
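
Depending on your authentication setup, pg_hba.conf also has to allow the repmgr user to connect both to the repmgr database and as a replication client. A minimal sketch for this single-host example (trust is used only for simplicity; use a proper auth method in production):

# pg_hba.conf
host    repmgr          repmgr      127.0.0.1/32    trust
host    replication     repmgr      127.0.0.1/32    trust

Reload the configuration afterwards, for example with pg_ctl reload or SELECT pg_reload_conf().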

Configure repmgr.conf

[postgres@xiongcc ~]$ cat repmgr_primary.conf 
# repmgr node information
node_id=1                                   # node ID
node_name='primary_node'                    # node name
conninfo='host=localhost port=5432 user=repmgr dbname=repmgr' # connection string
data_directory='/home/postgres/pgdata'      # data directory of the instance
replication_user=repmgr                     # streaming replication user
repmgr_bindir='/usr/pgsql-14/bin'           # directory containing the repmgr binaries
pg_bindir='/usr/pgsql-14/bin'               # directory containing the PostgreSQL binaries

# replication
monitoring_history=yes              # whether to write monitoring data to the "monitoring_history" table
# location='location1'              # arbitrary string defining the node's location, used during failover to check the visibility of the current primary
priority=100                        # node priority, used during promotion (lsn > priority > node_id)
                                    # 0 means the node will never be promoted to primary
reconnect_interval=10               # interval (in seconds) between reconnection attempts before failing over
reconnect_attempts=6                # number of reconnection attempts before failing over
connection_check_type=ping          # ping: repmgr tests the connection with PQping()
                                    # connection: attempt to open a new connection to the node
                                    # query: execute an SQL statement on the node over the existing connection
monitor_interval_secs=5             # interval at which monitoring data is written
use_replication_slots=true          # whether to use replication slots

# log configuration
log_level='INFO'                 # Log level: possible values are DEBUG, INFO, NOTICE,
                                 # WARNING, ERROR, ALERT, CRIT or EMERG

log_facility='STDERR'            # Logging facility: possible values are STDERR, or for
                                 # syslog integration, one of LOCAL0, LOCAL1, ..., LOCAL7, USER

log_file='/tmp/repmgr.log'       # STDERR can be redirected to an arbitrary file
log_status_interval=30           # interval (in seconds) for repmgrd to log a status message


# service management command
service_start_command='/usr/pgsql-14/bin/pg_ctl -D /home/postgres/pgdata start'              # start the database
service_stop_command='/usr/pgsql-14/bin/pg_ctl -D /home/postgres/pgdata stop'                # stop the database
service_restart_command='/usr/pgsql-14/bin/pg_ctl -D /home/postgres/pgdata restart'          # restart the database
service_reload_command='/usr/pgsql-14/bin/pg_ctl -D /home/postgres/pgdata reload'            # reload the database configuration
repmgrd_pid_file='/tmp/repmgrd.pid'                                                          # pid file of the repmgrd daemon
repmgrd_service_start_command='/usr/pgsql-14/bin/repmgrd -f /home/postgres/repmgr.conf start'  # start repmgrd
repmgrd_service_stop_command='kill -9 `cat /tmp/repmgrd.pid`'                                # stop repmgrd

# failover
failover='automatic'                    # one of 'automatic', 'manual'.
                                        # determines what action to take in the event of upstream failure
                                        #
                                        # 'automatic': repmgrd will automatically attempt to promote the
                                        #    node or follow the new upstream node
                                        # 'manual': repmgrd will take no action and the node will require
                                        #    manual attention to reattach it to replication

#
priority=100                           # indicates a preferred priority for promoting nodes;
                                        # a value of zero prevents the node being promoted to primary
                                        # (default: 100)

promote_command='repmgr -f /home/postgres/repmgr_primary.conf standby promote'                  
                    # command repmgrd executes when promoting a new primary; use something like:
                                        #
                                        #     repmgr standby promote -f /etc/repmgr.conf
                                        #
follow_command='repmgr standby follow -f /home/postgres/repmgr_primary.conf --upstream-node-id=%n'                      
                    # command repmgrd executes when instructing a standby to follow a new primary;
                                        # use something like:
                                        #
                                        #     repmgr standby follow -f /etc/repmgr.conf --upstream-node-id=%n
                                        #

#
primary_visibility_consensus=false     # If "true", only continue with failover if no standbys have seen
                                        # the primary node recently. *Must* be the same on all nodes.

Two parameters deserve a closer look here: location and primary_visibility_consensus.

location

location is used to guard against split-brain; see https://repmgr.org/docs/current/repmgrd-network-split.html for details.

repmgr4 introduces the concept of location: each node is associated with an arbitrary location string (default is default); this is set in repmgr.conf, e.g.:

node_id=1
node_name=node1
conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/data'
location='dc1'

In a failover situation, repmgrd will check if any servers in the same location as the current primary node are visible. If not, repmgrd will assume a network interruption and not promote any node in any other location (it will however enter degraded monitoring mode until a primary becomes visible).

degraded monitoring

So what is degraded monitoring? The official documentation explains it at https://repmgr.org/docs/4.0/repmgrd-degraded-monitoring.html and gives the following example:

    [2017-08-29 10:59:19] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state (automatic failover disabled)
    [2017-08-29 10:59:33] [WARNING] unable to connect to upstream node "node1" (node ID: 1)
    [2017-08-29 10:59:33] [INFO] checking state of node 1, 1 of 5 attempts
    [2017-08-29 10:59:33] [INFO] sleeping 1 seconds until next reconnection attempt
    (...)
    [2017-08-29 10:59:37] [INFO] checking state of node 1, 5 of 5 attempts
    [2017-08-29 10:59:37] [WARNING] unable to reconnect to node 1 after 5 attempts
    [2017-08-29 10:59:37] [NOTICE] this node is not configured for automatic failover so will not be considered as promotion candidate
    [2017-08-29 10:59:37] [NOTICE] no other nodes are available as promotion candidate
    [2017-08-29 10:59:37] [HINT] use "repmgr standby promote" to manually promote this node
    [2017-08-29 10:59:37] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in degraded state (automatic failover disabled)
    [2017-08-29 10:59:53] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in degraded state (automatic failover disabled)
    [2017-08-29 11:00:45] [NOTICE] reconnected to upstream node 1 after 68 seconds, resuming monitoring
    [2017-08-29 11:00:57] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state (automatic failover disabled)

repmgrd enters degraded monitoring mode, and can no longer monitor its upstream node, in scenarios such as the following:

  • a failover situation has occurred, no nodes in the primary node's location are visible
  • a failover situation has occurred, but no promotion candidate is available
  • a failover situation has occurred, but the promotion candidate could not be promoted
  • a failover situation has occurred, but the node was unable to follow the new primary
  • a failover situation has occurred, but no primary has become available
  • a failover situation has occurred, but automatic failover is not enabled for the node
  • repmgrd is monitoring the primary node, but it is not available (and no other node has been promoted as primary)

The degraded_monitoring_timeout parameter controls how long repmgrd will stay in degraded monitoring before terminating itself.
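
A minimal illustration in repmgr.conf (the value below is arbitrary; by default repmgrd stays in degraded monitoring indefinitely):

degraded_monitoring_timeout=300     # seconds to remain in degraded monitoring before repmgrd terminates; -1 disables the timeout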

primary_visibility_consensus

primary_visibility_consensus adds a consensus mechanism: starting with repmgr 4.4, the affected standbys can establish a consensus on whether the primary is still available to some of them ("primary visibility consensus"). Each standby (and the witness, if one exists) is polled for the time it last saw the primary; if any standby has seen the primary recently, it is reasonable to infer that the primary is still available and no failover should be started. See https://repmgr.org/docs/current/repmgrd-primary-visibility-consensus.html

$ repmgr -f /etc/repmgr.conf service status
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 27259 | no      | n/a
 2  | node2 | standby |   running | node1    | running | 27272 | no      | 1 second(s) ago
 3  | node3 | standby |   running | node1    | running | 27282 | no      | 0 second(s) ago
 4  | node4 | witness | * running | node1    | running | 27298 | no      | 1 second(s) ago
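
To enable the mechanism, set the same value in repmgr.conf on every node, including the witness:

primary_visibility_consensus=true   # must be identical on all nodes of the cluster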

The log below spells it out: "cancelling failover as some nodes can still see the primary", and the failover is aborted.

    [2019-05-17 05:36:12] [WARNING] unable to reconnect to node 1 after 3 attempts
    [2019-05-17 05:36:12] [INFO] 2 active sibling nodes registered
    [2019-05-17 05:36:12] [INFO] local node's last receive lsn: 0/7006E58
    [2019-05-17 05:36:12] [INFO] checking state of sibling node "node3" (ID: 3)
    [2019-05-17 05:36:12] [INFO] node "node3" (ID: 3) reports its upstream is node 1, last seen 1 second(s) ago
    [2019-05-17 05:36:12] [NOTICE] node 3 last saw primary node 1 second(s) ago, considering primary still visible
    [2019-05-17 05:36:12] [INFO] last receive LSN for sibling node "node3" (ID: 3) is: 0/7006E58
    [2019-05-17 05:36:12] [INFO] node "node3" (ID: 3) has same LSN as current candidate "node2" (ID: 2)
    [2019-05-17 05:36:12] [INFO] checking state of sibling node "node4" (ID: 4)
    [2019-05-17 05:36:12] [INFO] node "node4" (ID: 4) reports its upstream is node 1, last seen 0 second(s) ago
    [2019-05-17 05:36:12] [NOTICE] node 4 last saw primary node 0 second(s) ago, considering primary still visible
    [2019-05-17 05:36:12] [INFO] last receive LSN for sibling node "node4" (ID: 4) is: 0/7006E58
    [2019-05-17 05:36:12] [INFO] node "node4" (ID: 4) has same LSN as current candidate "node2" (ID: 2)
    [2019-05-17 05:36:12] [INFO] 2 nodes can see the primary
    [2019-05-17 05:36:12] [DETAIL] following nodes can see the primary:
     - node "node3" (ID: 3): 1 second(s) ago
     - node "node4" (ID: 4): 0 second(s) ago
    [2019-05-17 05:36:12] [NOTICE] cancelling failover as some nodes can still see the primary
    [2019-05-17 05:36:12] [NOTICE] election cancelled
    [2019-05-17 05:36:14] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in degraded state

Register the primary node

Now register the primary node:

[postgres@xiongcc ~]$ repmgr -f repmgr_primary.conf primary register
INFO: connecting to primary database...
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 1) registered

After successful registration, the repmgr extension is installed in the repmgr database, together with several metadata tables and views:

repmgr=# set search_path to 'repmgr';
SET
repmgr=# \d
              List of relations
 Schema |        Name        | Type  | Owner  
--------+--------------------+-------+--------
 repmgr | events             | table | repmgr
 repmgr | monitoring_history | table | repmgr
 repmgr | nodes              | table | repmgr
 repmgr | replication_status | view  | repmgr
 repmgr | show_nodes         | view  | repmgr
 repmgr | voting_term        | table | repmgr
(6 rows)
  • repmgr.events: records events of interest
  • repmgr.nodes: connection and status information for each node in the cluster
  • repmgr.monitoring_history: historical standby monitoring information written by repmgrd
  • repmgr.show_nodes: based on the table repmgr.nodes, additionally showing the name of the server's upstream node
  • repmgr.replication_status: when repmgrd's monitoring is enabled, shows current monitoring status for each standby

Note that repmgr's metadata cannot be placed on a database server that is not managed by repmgr:

Note that the repmgr metadata schema cannot reside on a database server which is not part of the replication cluster managed by repmgr.

Use cluster show to check the current state of the cluster:

[postgres@xiongcc ~]$ repmgr -f repmgr_primary.conf cluster show
 ID | Name         | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                 
----+--------------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------
 1  | primary_node | primary | * running |          | default  | 100      | 1        | host=localhost port=5432 user=repmgr dbname=repmgr

The database also records events and node information:

repmgr=# select * from events ;
 node_id |      event       | successful |        event_timestamp        | details 
---------+------------------+------------+-------------------------------+---------
       1 | cluster_created  | t          | 2022-07-20 15:14:09.950789+08 | 
       1 | primary_register | t          | 2022-07-20 15:14:09.9572+08   | 
(2 rows)

repmgr=# select * from nodes ;
 node_id | upstream_node_id | active |  node_name   |  type   | location | priority |                      conninfo                       | repluser | slot_name |            config_file             
---------+------------------+--------+--------------+---------+----------+----------+-----------------------------------------------------+----------+-----------+------------------------------------
       1 |                  | t      | primary_node | primary | default  |      100 | host=localhost port=5432 user=repmgr dbname=repmgr  | repmgr   |           | /home/postgres/repmgr_primary.conf
(1 row)

The role of repmgrd

As briefly mentioned above, repmgrd is a daemon that performs failure detection and failover, monitors and records cluster information, and can invoke custom scripts to receive cluster event notifications.

repmgrd ("replication manager daemon") is a management and monitoring daemon which runs on each node in a replication cluster. It can automate actions such as failover and updating standbys to follow the new primary, as well as providing monitoring information about the state of each standby.

repmgrd is designed to be straightforward to set up and does not require additional external infrastructure.

Functionality provided by repmgrd includes:

  • wide range of configuration options
  • option to execute custom scripts ("event notifications") at different points in the failover sequence
  • ability to pause repmgrd operation on all nodes with a single command
  • optional witness server
  • "location" configuration option to restrict potential promotion candidates to a single location (e.g. when nodes are spread over multiple data centres)
  • choice of method to determine node availability

Start repmgrd

Before starting repmgrd, shared_preload_libraries = 'repmgr' must be added to postgresql.conf.
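
A minimal sketch of the steps on the primary, assuming the paths used throughout this article and that shared_preload_libraries is not already set:

[postgres@xiongcc ~]$ echo "shared_preload_libraries = 'repmgr'" >> /home/postgres/pgdata/postgresql.conf
[postgres@xiongcc ~]$ pg_ctl -D /home/postgres/pgdata restart          # changing shared_preload_libraries requires a restart
[postgres@xiongcc ~]$ repmgrd -f repmgr_primary.conf start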

[postgres@xiongcc ~]$ tail -f /tmp/repmgr.log 
[2022-07-20 15:58:55] [NOTICE] repmgrd (repmgrd 5.4dev) starting up
[2022-07-20 15:58:55] [INFO] connecting to database "host=localhost port=5432 user=repmgr dbname=repmgr"
INFO:  set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid
[2022-07-20 15:58:55] [NOTICE] starting monitoring of node "primary_node" (ID: 1)
[2022-07-20 15:58:55] [INFO] "connection_check_type" set to "ping"
[2022-07-20 15:58:55] [NOTICE] monitoring cluster primary "primary_node" (ID: 1)
[2022-07-20 15:59:25] [INFO] monitoring primary node "primary_node" (ID: 1) in normal state

Standby node

Configure repmgr.conf

In this example the primary and the standby are on the same host, so the node information must not be identical:

[postgres@xiongcc ~]$ cat repmgr_standby.conf 
# repmgr node information
node_id=2
node_name='standby_node'
conninfo='host=localhost port=5433 user=repmgr dbname=repmgr'
data_directory='/home/postgres/pgdata_standby'
replication_user=repmgr
repmgr_bindir='/usr/pgsql-14/bin'
pg_bindir='/usr/pgsql-14/bin'


# log configuration
log_level='INFO'                 # Log level: possible values are DEBUG, INFO, NOTICE,
                                 # WARNING, ERROR, ALERT, CRIT or EMERG

log_facility='STDERR'            # Logging facility: possible values are STDERR, or for
                                 # syslog integration, one of LOCAL0, LOCAL1, ..., LOCAL7, USER

log_file='/tmp/repmgr_standby.log'       # STDERR can be redirected to an arbitrary file
log_status_interval=30           # interval (in seconds) for repmgrd to log a status message


# service management command
service_start_command='/usr/pgsql-14/bin/pg_ctl -D /home/postgres/pgdata_standby start'
service_stop_command='/usr/pgsql-14/bin/pg_ctl -D /home/postgres/pgdata_standby stop'
service_restart_command='/usr/pgsql-14/bin/pg_ctl -D /home/postgres/pgdata_standby restart'
service_reload_command='/usr/pgsql-14/bin/pg_ctl -D /home/postgres/pgdata_standby reload'
repmgrd_pid_file='/tmp/repmgrd_standby.pid'
repmgrd_service_start_command='/usr/pgsql-14/bin/repmgrd -f /home/postgres/repmgr.conf start'
repmgrd_service_stop_command='kill -9 `cat /tmp/repmgrd_standby.pid`'

# failover
failover='automatic'                    # one of 'automatic', 'manual'.
                                        # determines what action to take in the event of upstream failure
                                        #
                                        # 'automatic': repmgrd will automatically attempt to promote the
                                        #    node or follow the new upstream node
                                        # 'manual': repmgrd will take no action and the node will require
                                        #    manual attention to reattach it to replication

#
priority=100                           # indicates a preferred priority for promoting nodes;
                                        # a value of zero prevents the node being promoted to primary
                                        # (default: 100)

promote_command='repmgr -f /home/postgres/repmgr_standby.conf standby promote'                  # command repmgrd executes when promoting a new primary; use something like:
                                        #
                                        #     repmgr standby promote -f /etc/repmgr.conf
                                        #
follow_command='repmgr standby follow -f /home/postgres/repmgr_standby.conf --upstream-node-id=%n'                      # command repmgrd executes when instructing a standby to follow a new primary;
                                        # use something like:
                                        #
                                        #     repmgr standby follow -f /etc/repmgr.conf --upstream-node-id=%n
                                        #

#
primary_visibility_consensus=false     # If "true", only continue with failover if no standbys have seen
                                        # the primary node recently. *Must* be the same on all nodes.

Clone the standby

Before actually cloning, run the command with --dry-run first to surface potential problems. By default the standby is cloned with pg_basebackup.
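
For example, the clone command used below can first be run with --dry-run so that only the checks are performed:

[postgres@xiongcc ~]$ repmgr -h localhost -p 5432 -d repmgr -f repmgr_standby.conf standby clone --dry-run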

[postgres@xiongcc ~]$ repmgr -h localhost -p 5432 -d repmgr -f repmgr_standby.conf  standby clone 
NOTICE: destination directory "/home/postgres/pgdata_standby" provided
INFO: connecting to source node
DETAIL: connection string is: host=localhost port=5432 dbname=repmgr
DETAIL: current installation size is 883 MB
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
WARNING: data checksums are not enabled and "wal_log_hints" is "off"
DETAIL: pg_rewind requires "wal_log_hints" to be enabled
INFO: creating directory "/home/postgres/pgdata_standby"...
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  /usr/pgsql-14/bin/pg_basebackup -l "repmgr base backup"  -D /home/postgres/pgdata_standby -h localhost -p 5432 -U repmgr -X stream 
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: /usr/pgsql-14/bin/pg_ctl -D /home/postgres/pgdata_standby start
HINT: after starting the server, you need to register this standby with "repmgr standby register"

Register the standby node

Start the new standby and register it:

[postgres@xiongcc ~]$ pg_ctl -D pgdata_standby/ start
waiting for server to start....2022-07-20 16:39:01.574 CST [8066] LOG:  redirecting log output to logging collector process
2022-07-20 16:39:01.574 CST [8066] HINT:  Future log output will appear in directory "log".
 done
server started
[postgres@xiongcc ~]$ repmgr standby -f repmgr_standby.conf register
INFO: connecting to local node "standby_node" (ID: 2)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)
INFO: standby registration complete
NOTICE: standby node "standby_node" (ID: 2) successfully registered
[postgres@xiongcc ~]$ repmgr -f repmgr_primary.conf cluster show
 ID | Name         | Role    | Status    | Upstream     | Location | Priority | Timeline | Connection string                                 
----+--------------+---------+-----------+--------------+----------+----------+----------+----------------------------------------------------
 1  | primary_node | primary | * running |              | default  | 100      | 1        | host=localhost port=5432 user=repmgr dbname=repmgr
 2  | standby_node | standby |   running | primary_node | default  | 100      | 1        | host=localhost port=5433 user=repmgr dbname=repmgr
 
[postgres@xiongcc ~]$ psql -qx -c "select * from pg_stat_replication"
-[ RECORD 1 ]----+------------------------------
pid              | 8074
usesysid         | 16521
usename          | repmgr
application_name | standby_node
client_addr      | ::1
client_hostname  | 
client_port      | 41350
backend_start    | 2022-07-20 16:39:01.695684+08
backend_xmin     | 
state            | streaming
sent_lsn         | 0/6C000DE8
write_lsn        | 0/6C000DE8
flush_lsn        | 0/6C000DE8
replay_lsn       | 0/6C000DE8
write_lag        | 
flush_lag        | 
replay_lag       | 
sync_priority    | 0
sync_state       | async
reply_time       | 2022-07-21 10:05:57.93392+08

We now have a working one-primary, one-standby setup. Note that application_name corresponds to node_name in the configuration file.

Start repmgrd

[postgres@xiongcc ~]$ repmgrd -f repmgr_standby.conf start
[2022-07-20 16:56:39] [NOTICE] redirecting logging output to "/tmp/repmgr_standby.log"

[postgres@xiongcc ~]$ tail -f /tmp/repmgr_standby.log 
[2022-07-20 16:56:39] [NOTICE] repmgrd (repmgrd 5.4dev) starting up
[2022-07-20 16:56:39] [INFO] connecting to database "host=localhost port=5433 user=repmgr dbname=repmgr"
INFO:  set_repmgrd_pid(): provided pidfile is /tmp/repmgrd_standby.pid
[2022-07-20 16:56:39] [NOTICE] starting monitoring of node "standby_node" (ID: 2)
[2022-07-20 16:56:39] [INFO] "connection_check_type" set to "ping"
[2022-07-20 16:56:39] [INFO] monitoring connection to upstream node "primary_node" (ID: 1)

Failover

We currently have one primary and one standby: node 1 is the primary and node 2 is the standby.

[postgres@xiongcc ~]$ repmgr cluster show -f repmgr_primary.conf 
 ID | Name         | Role    | Status    | Upstream     | Location | Priority | Timeline | Connection string                                 
----+--------------+---------+-----------+--------------+----------+----------+----------+----------------------------------------------------
 1  | primary_node | primary | * running |              | default  | 100      | 1        | host=localhost port=5432 user=repmgr dbname=repmgr
 2  | standby_node | standby |   running | primary_node | default  | 100      | 1        | host=localhost port=5433 user=repmgr dbname=repmgr

Stop the primary:

[postgres@xiongcc ~]$ pg_ctl -D pgdata stop
waiting for server to shut down.... done
server stopped

When repmgrd detects that the primary is unreachable, it first keeps trying to reconnect, driven by the reconnect_attempts and reconnect_interval parameters configured earlier: six attempts at ten-second intervals, roughly one minute in total. If it still cannot connect, it performs the failover.

[2022-07-21 14:23:55] [WARNING] unable to ping "host=localhost port=5432 user=repmgr dbname=repmgr"
[2022-07-21 14:23:55] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2022-07-21 14:23:55] [WARNING] unable to connect to upstream node "primary_node" (ID: 1)
[2022-07-21 14:23:55] [INFO] checking state of node "primary_node" (ID: 1), 1 of 6 attempts
[2022-07-21 14:23:55] [WARNING] unable to ping "user=repmgr dbname=repmgr host=localhost port=5432 connect_timeout=2 fallback_application_name=repmgr"
[2022-07-21 14:23:55] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2022-07-21 14:23:55] [INFO] sleeping up to 10 seconds until next reconnection attempt
...
...
[2022-07-21 14:24:45] [INFO] checking state of node "primary_node" (ID: 1), 6 of 6 attempts
[2022-07-21 14:24:45] [WARNING] unable to ping "user=repmgr dbname=repmgr host=localhost port=5432 connect_timeout=2 fallback_application_name=repmgr"
[2022-07-21 14:24:45] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2022-07-21 14:24:45] [WARNING] unable to reconnect to node "primary_node" (ID: 1) after 6 attempts
[2022-07-21 14:24:45] [INFO] 0 active sibling nodes registered
[2022-07-21 14:24:45] [INFO] 2 total nodes registered
[2022-07-21 14:24:45] [INFO] primary node  "primary_node" (ID: 1) and this node have the same location ("default")
[2022-07-21 14:24:45] [INFO] no other sibling nodes - we win by default
[2022-07-21 14:24:45] [NOTICE] this node is the only available candidate and will now promote itself
[2022-07-21 14:24:45] [INFO] promote_command is:
  "repmgr -f /home/postgres/repmgr_standby.conf standby promote"
NOTICE: promoting standby to primary
DETAIL: promoting server "standby_node" (ID: 2) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "standby_node" (ID: 2) was successfully promoted to primary
[2022-07-21 14:24:46] [INFO] checking state of node 2, 1 of 6 attempts
[2022-07-21 14:24:46] [NOTICE] node 2 has recovered, reconnecting
[2022-07-21 14:24:46] [INFO] connection to node 2 succeeded
[2022-07-21 14:24:46] [INFO] original connection is still available
[2022-07-21 14:24:46] [INFO] 0 followers to notify
[2022-07-21 14:24:46] [INFO] switching to primary monitoring mode
[2022-07-21 14:24:46] [NOTICE] monitoring cluster primary "standby_node" (ID: 2)
[2022-07-21 14:25:16] [INFO] monitoring primary node "standby_node" (ID: 2) in normal state

Had the location parameter been configured (and no node in the primary's location been visible), the node would instead have entered the degraded monitoring state:

[postgres@xiongcc ~]$ cat /tmp/repmgr.log  | grep degrade
[2022-07-21 11:27:38] [NOTICE] unable to connect to local node, falling back to degraded monitoring
[2022-07-21 11:27:38] [INFO] monitoring primary node "primary_node" (ID: 1) in degraded state
[2022-07-21 11:28:08] [INFO] monitoring primary node "primary_node" (ID: 1) in degraded state
[2022-07-21 11:28:38] [INFO] monitoring primary node "primary_node" (ID: 1) in degraded state

Because repmgr stores its metadata in a local database, repmgr itself is not a distributed solution: once a node fails, the cluster can no longer be managed through that node. For example, the primary that has just gone down can no longer be operated on:

[postgres@xiongcc ~]$ repmgr -f repmgr_primary.conf cluster show
ERROR: connection to database failed
DETAIL: 
connection to server at "localhost" (::1), port 5432 failed: Connection refused
        Is the server running on that host and accepting TCP/IP connections?
connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
        Is the server running on that host and accepting TCP/IP connections?

DETAIL: attempted to connect using:
  user=repmgr dbname=repmgr host=localhost port=5432 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path=

Only one node is left, and the former standby has been successfully promoted to primary:

[postgres@xiongcc ~]$ repmgr cluster show -f repmgr_standby.conf 
 ID | Name         | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                 
----+--------------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------
 1  | primary_node | primary | - failed  | ?        | default  | 100      |          | host=localhost port=5432 user=repmgr dbname=repmgr
 2  | standby_node | primary | * running |          | default  | 100      | 2        | host=localhost port=5433 user=repmgr dbname=repmgr

WARNING: following issues were detected
  - unable to connect to node "primary_node" (ID: 1)

HINT: execute with --verbose option to see connection error messages

The old primary can rejoin the cluster as a standby:

[postgres@xiongcc ~]$ repmgr node rejoin -f repmgr_primary.conf -d 'host=localhost port=5433 user=repmgr' --dry-run
NOTICE: rejoin target is node "standby_node" (ID: 2)
INFO: replication slots in use, 2 free slots on node 10
INFO: replication connection to the rejoin target node was successful
INFO: local and rejoin target system identifiers match
DETAIL: system identifier is 7120054370300950293
INFO: local node 1 can attach to rejoin target node 2
DETAIL: local node's recovery point: 0/73000028; rejoin target node's fork point: 0/730000A0
INFO: prerequisites for executing NODE REJOIN are met
[postgres@xiongcc ~]$ repmgr node rejoin -f repmgr_primary.conf -d 'host=localhost port=5433 user=repmgr' 
NOTICE: rejoin target is node "standby_node" (ID: 2)
INFO: local node 1 can attach to rejoin target node 2
DETAIL: local node's recovery point: 0/73000028; rejoin target node's fork point: 0/730000A0
NOTICE: setting node 1's upstream to node 2
WARNING: unable to ping "host=localhost port=5432 user=repmgr dbname=repmgr"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "/usr/pgsql-14/bin/pg_ctl -D /home/postgres/pgdata start"
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2
[postgres@xiongcc ~]$ repmgr -f repmgr_primary.conf cluster show
 ID | Name         | Role    | Status    | Upstream     | Location | Priority | Timeline | Connection string                                 
----+--------------+---------+-----------+--------------+----------+----------+----------+----------------------------------------------------
 1  | primary_node | standby |   running | standby_node | default  | 100      | 1        | host=localhost port=5432 user=repmgr dbname=repmgr
 2  | standby_node | primary | * running |              | default  | 100      | 2        | host=localhost port=5433 user=repmgr dbname=repmgr

This step may require pg_rewind. The old primary can fail for all kinds of reasons, possibly even a false positive by the HA system, and it may have been started up as a primary again after the crash. In that case demoting it and rebuilding the replication relationship can fail, for example when the new standby is ahead of the new primary (as can happen with asynchronous streaming replication):

DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/6E000028

pg_rewind requires data checksums or wal_log_hints to be enabled, and the WAL on the target cluster must be continuous from the fork point onward (not yet removed), otherwise locating the fork point will fail. For details see the Alibaba kernel monthly report 《PgSQL · 特性分析 · 神奇的pg_rewind》. It is therefore advisable to configure WAL archiving in advance when using repmgr (repmgr itself does not require archiving), otherwise pg_rewind may fail:

pg_rewind: error: could not open file "/home/postgres/pgdata/pg_wal/00000001000000000000006D": No such file or directory
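
To satisfy these prerequisites ahead of time, something along the following lines can be set on the primary before any failover happens; this is only a sketch, and the archive_command shown is purely illustrative:

# postgresql.conf
wal_log_hints = on                                     # required by pg_rewind unless data checksums are enabled (initdb --data-checksums); needs a restart
archive_mode = on                                      # keep WAL available so the fork point can still be found
archive_command = 'cp %p /home/postgres/archive/%f'    # illustrative only; use a proper archiving tool in production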

We can also plug in a custom script to strengthen the failover decision: if the script returns a non-zero value, the promotion candidate election is rerun (see the sketch below and the log that follows).
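
repmgr exposes this hook as failover_validation_command in repmgr.conf. A minimal sketch, where the script body is hypothetical and %n is replaced with the candidate node's ID:

# repmgr.conf
failover_validation_command='/usr/local/bin/failover-validation.sh %n'

# /usr/local/bin/failover-validation.sh (hypothetical example)
#!/bin/bash
echo "Node ID: $1"
# real checks would go here, e.g. querying an external monitoring system;
# exit 0 to allow the promotion, non-zero to force the election to be rerun
exit 1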

[2019-03-13 21:01:30] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 4 seconds
[2019-03-13 21:01:30] [NOTICE] promotion candidate is "node2" (ID: 2)
[2019-03-13 21:01:30] [NOTICE] executing "failover_validation_command"
[2019-03-13 21:01:30] [DETAIL] /usr/local/bin/failover-validation.sh 2
[2019-03-13 21:01:30] [INFO] output returned by failover validation command:
Node ID: 2

[2019-03-13 21:01:30] [NOTICE] failover validation command returned a non-zero value: "1"
[2019-03-13 21:01:30] [NOTICE] promotion candidate election will be rerun
[2019-03-13 21:01:30] [INFO] 1 followers to notify
[2019-03-13 21:01:30] [NOTICE] notifying node "node3" (ID: 3) to rerun promotion candidate selection
INFO:  node 3 received notification to rerun promotion candidate election
[2019-03-13 21:01:30] [NOTICE] rerunning election after 15 seconds ("election_rerun_interval")

The witness server

A witness node handles the case where network congestion, latency or routing problems between the primary and the standbys mean that the primary is still working normally but a standby cannot reach it. A witness node adds an extra integrity check to primary/standby switchover, assisting the standbys' monitoring and avoiding split-brain caused by network problems. Note that a witness only makes sense together with repmgrd: "A witness server will only be useful if repmgrd is in use."

A typical use case for a witness server is a two-node streaming replication setup, where the primary and standby are in different locations (data centres). By creating a witness server in the same location (data centre) as the primary, if the primary becomes unavailable it's possible for the standby to decide whether it can promote itself without risking a "split brain" scenario: if it can't see either the witness or the primary server, it's likely there's a network-level interruption and it should not promote itself. If it can see the witness but not the primary, this proves there is no network interruption and the primary itself is unavailable, and it can therefore promote itself (and ideally take action to fence the former primary).

The configuration procedure is described at https://repmgr.org/docs/4.0/creating-witness-server.html
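
As a rough sketch, the witness runs its own PostgreSQL instance with its own repmgr.conf and is registered against the current primary; the host name below is a placeholder:

[postgres@witness ~]$ repmgr -f /etc/repmgr.conf witness register -h primary_host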

See also: PG Phriday: Isolating Postgres with repmgr

Cascading replication

Cascading replication - where a standby can connect to an upstream node and not the primary server itself - was introduced in PostgreSQL 9.2. repmgr and repmgrd support cascading replication by keeping track of the relationship between standby servers - each node record is stored with the node id of its upstream ("parent") server (except of course the primary server).

In a failover situation where the primary node fails and a top-level standby is promoted, a standby connected to another standby will not be affected and continue working as normal (even if the upstream standby it's connected to becomes the primary node). If however the node's direct upstream fails, the "cascaded standby" will attempt to reconnect to that node's parent.

Virtual IP

repmgr itself does not provide virtual IP functionality. We can use keepalived to implement a virtual IP, or set promote_command in the configuration file to a custom script that manages the virtual IP ourselves, for example:

ssh xxx@xxx "ifconfig xxx down"    # e.g. take the virtual IP interface down on the old primary
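
For instance, promote_command could point at a small wrapper that performs the promotion and then brings up the virtual IP on the new primary. Everything below (address, interface name, paths) is hypothetical and only sketches the idea:

# repmgr.conf
promote_command='/home/postgres/promote_with_vip.sh'

# /home/postgres/promote_with_vip.sh (hypothetical sketch)
#!/bin/bash
set -e
repmgr -f /home/postgres/repmgr_standby.conf standby promote    # perform the actual promotion first
sudo ip addr add 192.168.1.100/24 dev eth0                      # then acquire the virtual IP on this node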

hg_repmgr

hg_repmgr is an enhanced version of the open-source repmgr. As a streaming replication management tool, repmgr is somewhat lacking in cluster high-availability features, so hg_repmgr adds virtual IP support, automatic recovery of failed nodes, split-brain checks and more. Repository: https://github.com/HighgoSoftware/hg_repmgr

  1. Floating virtual IP management: the virtual IP is bound when the primary node is registered, and moves with the primary on failover or switchover.
  2. Automatic rejoin after a switch: after a primary/standby switch, the daemon on the former primary automatically attempts a rejoin to bring the node back into the cluster.
  3. A new node startup command: after a node has been power-cycled, this command starts both the database and the repmgrd service; it determines the primary/standby roles currently held in the cluster, preventing dual primaries and split-brain at startup.
  4. Split-brain (dual-primary) detection and automatic recovery: aimed at split-brain caused by mistaken operations on the cluster; it can check whether two primaries exist and, if so, picks one of them to rejoin the cluster as a standby.
  5. Disk-writability checks: the daemon monitors whether the partition holding the database data directory is writable.
  6. Flexible control over falling back from synchronous to asynchronous replication: under synchronous streaming replication, stopping the standby makes the primary wait and interrupts the service; a configuration option controls whether the primary is temporarily switched to asynchronous replication in this situation, switching back to synchronous replication once the standby recovers.
  7. Improved cluster show output: cluster show additionally displays the streaming replication LSNs and the LSN gap between primary and standby, indicating the current replication state.

Notes

It is recommended to set up passwordless SSH between the nodes, which makes day-to-day operations much easier.
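
A typical way to set this up for the postgres OS user (the remote host name is a placeholder):

[postgres@xiongcc ~]$ ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
[postgres@xiongcc ~]$ ssh-copy-id postgres@other_node
[postgres@xiongcc ~]$ ssh postgres@other_node hostname     # should succeed without a password prompt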

When repmgr is executed via SSH (e.g. when running repmgr standby switchover, repmgr cluster matrix or repmgr cluster crosscheck, or if it is executed as cronjob), a login shell will not be used and only the default system PATH will be set. Therefore it's recommended to set pg_bindir so repmgr can correctly invoke binaries on a remote system and avoid potential path issues.

Summary

That wraps up repmgr. Its flaws do not outweigh its strengths: repmgr is a full-featured yet lightweight open-source HA tool, and the enhanced hg_repmgr is also available.

References

https://repmgr.org/docs/5.2/

https://github.com/HighgoSoftware/hg_repmgr

PostgreSQL高可用套件-repmgr+pgpool (上篇)

https://blog.csdn.net/weixin_37692493/article/details/117032458

https://www.enterprisedb.com/blog/pg-phriday-isolating-postgres-repmgr

https://www.2ndquadrant.com/en/blog/how-to-automate-postgresql-12-replication-and-failover-with-repmgr-part-2/

