Rolling upgrade from MGR 5.7 to MGR 8.0
A few days ago I ran into a bug on MGR 5.7. The error was:
mysql> update nm2domain set status=1 where db_domain_info_id=1316542 and id=1210974;
ERROR 3101 (HY000): Plugin instructed the server to rollback the current transaction.
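My environment: MGR 5.7, multi-primary mode.
The error above appeared when I ran the UPDATE statement. I checked the row on every MGR node, and it existed on all of them.
After investigating and asking several experienced people in the industry, I narrowed the root cause down to two possibilities:
1. A certification failure in MGR multi-primary mode
2. Possibly a bug in MGR on MySQL 5.7
Solution:
Switching the production MGR cluster from multi-primary mode to single-primary mode and then back to multi-primary mode resolved the problem.
Because of this issue, I am planning to upgrade MGR from MySQL 5.7 to MySQL 8.0. Tonight I found time to test an MGR rolling upgrade in the production environment, and I record some of the conclusions and the process here.
Conclusion first: the test shows that a rolling upgrade is supported.
Upgrade plan:
If your MGR cluster is in single-primary mode:
1. Upgrade the Secondary nodes one at a time. I recommend adding a new MySQL 8.0 node and then removing a MySQL 5.7 node, repeating until the Primary is the only node still on MySQL 5.7.
2. Upgrade the Primary node. Run stop group_replication on the Primary to trigger MGR's automatic failover; a MySQL 8.0 node is then elected as the new Primary.
3. Upgrade the old Primary to MySQL 8.0 and rejoin it to the cluster.
If your MGR cluster is in multi-primary mode:
Replace the MySQL 5.7 nodes with MySQL 8.0 nodes one at a time.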
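The ordering in the plan above can be sketched as a small planning helper. This is only an illustration: the function name plan_rolling_upgrade, the dictionary keys, and the node-a to node-d host names are hypothetical, and the code only decides the order in which the 5.7 members get replaced; it does not talk to MySQL.

```python
from typing import Dict, List


def plan_rolling_upgrade(members: List[Dict[str, str]], single_primary: bool) -> List[str]:
    """Return the hosts of the 5.7 members in the order they should be replaced."""
    old = [m for m in members if m["version"].startswith("5.7")]
    if not single_primary:
        # Multi-primary: every member is a primary, so replace them one by one
        # in whatever order they are listed.
        return [m["host"] for m in old]
    # Single-primary: replace the secondaries first and leave the primary for last.
    secondaries = [m["host"] for m in old if m["role"] == "SECONDARY"]
    primaries = [m["host"] for m in old if m["role"] == "PRIMARY"]
    return secondaries + primaries


# Hypothetical group: two members already on 8.0, two still on 5.7.
group = [
    {"host": "node-a", "role": "SECONDARY", "version": "5.7.24"},
    {"host": "node-b", "role": "SECONDARY", "version": "8.0.20"},
    {"host": "node-c", "role": "SECONDARY", "version": "8.0.20"},
    {"host": "node-d", "role": "PRIMARY", "version": "5.7.24"},
]
print(plan_rolling_upgrade(group, single_primary=True))  # ['node-a', 'node-d']
```

In single-primary mode the primary always comes out last, regardless of where it appears in the input list.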
In MGR 5.7, use the following statement to view the MGR members:
localhost.(none)>SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 00a6598b-d80d-11eb-98e8-70e28405e238 | 10.xx.2.64 | 1234 | ONLINE |
| group_replication_applier | 0f5765ad-d80d-11eb-a3f7-70e28405e217 | 10.xx.2.66 | 1234 | ONLINE |
| group_replication_applier | 1a17875e-d815-11eb-81d8-6c92bf07cfcb | 10.xx.6.119 | 1234 | ONLINE |
| group_replication_applier | c3a8a71d-d80c-11eb-afdd-70e28405e29b | 10.xx.2.203 | 1234 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
4 rows in set (0.00 sec)
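In MySQL 8.0 the member information is richer. Below are the MGR members in multi-primary mode and in single-primary mode, respectively:
Multi-primary mode: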
localhost.(none)>SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 00a6598b-d80d-11eb-98e8-70e28405e238 | 10.xx.2.64 | 1234 | ONLINE | PRIMARY | 5.7.24 |
| group_replication_applier | 0f5765ad-d80d-11eb-a3f7-70e28405e217 | 10.xx.2.66 | 1234 | ONLINE | PRIMARY | 8.0.20 |
| group_replication_applier | 1a17875e-d815-11eb-81d8-6c92bf07cfcb | 10.xx.6.119 | 1234 | ONLINE | PRIMARY | 8.0.20 |
| group_replication_applier | c3a8a71d-d80c-11eb-afdd-70e28405e29b | 10.xx.2.203 | 1234 | ONLINE | PRIMARY | 5.7.24 |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
4 rows in set (0.00 sec)
Every member is listed as PRIMARY.
Single-primary mode:
localhost.(none)>SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 00a6598b-d80d-11eb-98e8-70e28405e238 | 10.xx.2.64 | 1234 | ONLINE | SECONDARY | 5.7.24 |
| group_replication_applier | 0f5765ad-d80d-11eb-a3f7-70e28405e217 | 10.xx.2.66 | 1234 | ONLINE | SECONDARY | 8.0.20 |
| group_replication_applier | 1a17875e-d815-11eb-81d8-6c92bf07cfcb | 10.xx.6.119 | 1234 | ONLINE | SECONDARY | 8.0.20 |
| group_replication_applier | c3a8a71d-d80c-11eb-afdd-70e28405e29b | 10.xx.2.203 | 1234 | ONLINE | PRIMARY | 5.7.24 |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
4 rows in set (0.00 sec)
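Members are listed as either PRIMARY or SECONDARY.
Compared with MGR 5.7, MGR 8.0 adds two key columns, member_role and member_version, which give the member's role in the cluster and the member's MySQL version.
The member_version column also makes it clear that MGR allows members running different versions to coexist in the same cluster.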
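During a rolling upgrade, the member_version column is a convenient way to see which members are still waiting to be replaced. A minimal sketch, assuming the rows have already been fetched into dictionaries keyed by column name (the helper names parse_version and members_to_upgrade are made up for this example):

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '8.0.20' into (8, 0, 20) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def members_to_upgrade(rows: List[Dict[str, str]], target: str) -> List[str]:
    """Return the hosts whose MEMBER_VERSION is lower than the target version."""
    target_version = parse_version(target)
    return [
        row["MEMBER_HOST"]
        for row in rows
        if parse_version(row["MEMBER_VERSION"]) < target_version
    ]


# Rows shaped like the output above, reduced to the two columns used here.
rows = [
    {"MEMBER_HOST": "10.xx.2.64", "MEMBER_VERSION": "5.7.24"},
    {"MEMBER_HOST": "10.xx.2.66", "MEMBER_VERSION": "8.0.20"},
    {"MEMBER_HOST": "10.xx.6.119", "MEMBER_VERSION": "8.0.20"},
    {"MEMBER_HOST": "10.xx.2.203", "MEMBER_VERSION": "5.7.24"},
]
print(members_to_upgrade(rows, "8.0.20"))  # ['10.xx.2.64', '10.xx.2.203']
```

Comparing integer tuples rather than raw strings avoids mistakes such as treating "8.0.9" as newer than "8.0.20".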
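During testing I noticed the following behavior:
If every member of the cluster is on MySQL 5.7, adding a MySQL 8.0 member works normally;
If the cluster contains both MySQL 5.7 and MySQL 8.0 members, adding a MySQL 8.0 member succeeds, but adding a MySQL 5.7 member fails with the following error: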
2021-06-28T23:43:42.487492+08:00 0 [Note] Plugin group_replication reported: 'XCom protocol version: 3'
2021-06-28T23:43:42.487550+08:00 0 [Note] Plugin group_replication reported: 'XCom initialized and ready to accept incoming connections on port 12340'
2021-06-28T23:43:45.939855+08:00 0 [ERROR] Plugin group_replication reported: 'Member version is incompatible with the group'
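The behavior observed in this test can be summarized as a toy rule: a joiner is rejected when the group already contains a member from a newer release series. The sketch below models only the cases seen in this post; it is not the plugin's actual compatibility check, and the names release_series and can_join are hypothetical.

```python
from typing import List, Tuple


def release_series(version: str) -> Tuple[int, int]:
    """Reduce '5.7.24' to (5, 7) and '8.0.20' to (8, 0)."""
    parts = version.split(".")
    return (int(parts[0]), int(parts[1]))


def can_join(joiner: str, group_versions: List[str]) -> bool:
    """Assumption drawn from this test only: a joiner must not be from an
    older release series than the newest member already in the group."""
    newest = max((release_series(v) for v in group_versions), default=(0, 0))
    return release_series(joiner) >= newest


print(can_join("8.0.20", ["5.7.24", "5.7.24"]))  # True: 8.0 joins an all-5.7 group
print(can_join("8.0.20", ["5.7.24", "8.0.20"]))  # True: 8.0 joins a mixed group
print(can_join("5.7.24", ["5.7.24", "8.0.20"]))  # False: 5.7 is rejected by a mixed group
```

The practical consequence for a rolling upgrade is that once the first 8.0 member is in the group, a 5.7 member that leaves cannot simply be added back.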
Configuration for the new MGR node and the commands used to join it to the cluster:
-----New node configuration file-----
# Group Replication
transaction-isolation = READ-COMMITTED
binlog_checksum = NONE
master_info_repository = TABLE
relay_log_info_repository = TABLE
plugin-load = group_replication.so
transaction_write_set_extraction = XXHASH64
loose-group_replication_group_name = "xxxx"
loose-group_replication_start_on_boot = OFF
loose-group_replication_local_address = 'xxx.xxx.xxx.xxx:12340'
loose-group_replication_group_seeds = ''
group_replication_ip_whitelist = '10.0.0.0/8'
loose-group_replication_bootstrap_group = OFF
loose-group_replication_single_primary_mode = OFF
loose-group_replication_enforce_update_everywhere_checks = ON
loose-group_replication_transaction_size_limit = 52428800
# loose-group_replication_unreachable_majority_timeout
report_host = 'xxx.xxx.xxx.xxx'
report_port = 1234
-----Commands to join the new node to the cluster-----
INSTALL PLUGIN group_replication SONAME 'group_replication.so';
change master to master_user='replica',master_password='xxxxxx' FOR CHANNEL 'group_replication_recovery';
start group_replication;