MultiXactID: A Deep Dive
Preface
This continues the previous post 👉🏻 "Subtransactions, Revisited", which was based on Rare but extremely challenging Postgres Performance Problems by Dilip Kumar (EDB). Let's analyze another case from that talk: MultiXact.
Analysis
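The second case is shown above. It is again a pgbench load test, but this time the workload adds a SELECT ... FOR UPDATE statement. The results on the right show that once a long-running transaction is opened, TPS keeps dropping, and the wait events in the lower-left corner are concentrated entirely on MultiXactOffsetSLRU and MultiXactOffsetBuffer.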
That is the symptom. To analyze this case we first need to review how MultiXact works. README.tuplock introduces it as follows:
“A tuple header provides very limited space for storing information about tuple locking and updates: there is room only for a single Xid and a small number of infomask bits. Whenever we need to store more than one lock, we replace the first locker's Xid with a new MultiXactId. Each MultiXact provides extended locking data; it comprises an array of Xids plus some flags bits for each one. The flags are currently used to store the locking strength of each member transaction. (The flags also distinguish a pure locker from an updater.)
In earlier PostgreSQL releases, a MultiXact always meant that the tuple was locked in shared mode by multiple transactions. This is no longer the case; a MultiXact may contain an update or delete Xid. (Keep in mind that tuple locks in a transaction do not conflict with other tuple locks in the same transaction, so it's possible to have otherwise conflicting locks in a MultiXact if they belong to the same transaction).
...
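PostgreSQL implements row locks with xmax plus infomask bits, stored on disk in the tuple header rather than in memory (imagine locking tens of millions of rows at once: memory would blow up). The infomask is also needed to tell a merely locked row apart from one that was deleted or updated (HEAP_XMAX_LOCK_ONLY).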
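According to the row-lock conflict matrix, SELECT ... FOR SHARE locks do not conflict with one another, so many transactions may lock the same row at the same time. Because xmax has a fixed size, and keeping track of that many transactions in the tuple would be expensive, PostgreSQL introduced the MultiXact mechanism: a group of transaction IDs is recorded under a single MultiXactID, and xmax stores that ID as a mapping.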
Because one MultiXactID maps to a variable number of members, two SLRU buffers are used, one recording Members and one recording Offsets. The postmaster allocates these areas in shared memory at startup.
postgres=# select * from pg_shmem_allocations where name ilike '%multi%';
name | off | size | allocated_size
------------------------+---------+--------+----------------
Shared MultiXact State | 5793792 | 1028 | 1152
MultiXactMember | 5660160 | 133568 | 133632
MultiXactOffset | 5593344 | 66816 | 66816
(3 rows)
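Offsets records, for each MultiXactID, the position where its member list starts in the members area; Members stores the member transaction IDs themselves (together with per-member lock-mode flags).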
On disk, the pg_multixact directory has two subdirectories, members and offsets, holding the corresponding data.
[postgres@xiongcc 15data]$ ll pg_multixact/
total 8
drwx------ 2 postgres postgres 4096 Jun 2 14:19 members
drwx------ 2 postgres postgres 4096 Jun 2 14:19 offsets
multixact.c also describes the structure of MultiXact:
“We use two SLRU areas, one for storing the offsets at which the data starts for each MultiXactId in the other one. This trick allows us to store variable length arrays of TransactionIds. (We could alternatively use one area containing counts and TransactionIds, with valid MultiXactId values pointing at slots containing counts; but that way seems less robust since it would get completely confused if someone inquired about a bogus MultiXactId that pointed to an intermediate slot containing an XID.)
In other words, the offsets area tells us where each MultiXactId's data starts in the members area, and that indirection is what allows member lists of variable length.
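To make the two-area trick concrete, here is a toy C model; it is not PostgreSQL code. Plain arrays stand in for the offsets and members SLRUs, and the names ToyMultiXact, toy_create, toy_nmembers and toy_member are hypothetical. It ignores SLRU pages, wraparound, the per-member flag bits, and the fact that real PostgreSQL never hands out offset 0. The point it illustrates: the length of a member list is never stored, it is derived from where the next MultiXactId starts.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_MULTIS  16
#define MAX_MEMBERS 64

typedef uint32_t TransactionId;
typedef uint32_t MultiXactId;

/* Hypothetical toy model: arrays stand in for the two SLRU areas. */
typedef struct ToyMultiXact
{
    uint32_t      offsets[MAX_MULTIS];  /* offsets[m] = start of m's members */
    TransactionId members[MAX_MEMBERS]; /* member xids stored back to back */
    MultiXactId   next_mxact;           /* next MultiXactId to hand out */
    uint32_t      next_offset;          /* next free slot in members[] */
} ToyMultiXact;

static void toy_init(ToyMultiXact *s)
{
    s->next_mxact = 1;   /* MultiXactId 0 is invalid, IDs start at 1 */
    s->next_offset = 0;  /* simplification: real PostgreSQL skips offset 0 */
}

/* Append a member list and return the newly assigned MultiXactId. */
static MultiXactId toy_create(ToyMultiXact *s, const TransactionId *xids, int n)
{
    MultiXactId m = s->next_mxact++;

    assert(m < MAX_MULTIS && s->next_offset + (uint32_t) n <= MAX_MEMBERS);
    s->offsets[m] = s->next_offset;
    for (int i = 0; i < n; i++)
        s->members[s->next_offset + i] = xids[i];
    s->next_offset += (uint32_t) n;
    return m;
}

/* Member count = where the next multi starts minus where this one starts. */
static int toy_nmembers(const ToyMultiXact *s, MultiXactId m)
{
    assert(m >= 1 && m < s->next_mxact);
    uint32_t end = (m + 1 < s->next_mxact) ? s->offsets[m + 1] : s->next_offset;
    return (int) (end - s->offsets[m]);
}

/* Return the i-th member xid of multi m. */
static TransactionId toy_member(const ToyMultiXact *s, MultiXactId m, int i)
{
    assert(i >= 0 && i < toy_nmembers(s, m));
    return s->members[s->offsets[m] + i];
}
```

For example, after toy_create() registers {123898, 123899} as multi 1 and a three-member list as multi 2, toy_nmembers(&s, 1) returns 2 because multi 2 starts at offset 2.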
Let's try it out and take a row lock:
postgres=# create table test_lock(id int);
CREATE TABLE
postgres=# insert into test_lock values(1),(2);
INSERT 0 2
postgres=# begin;
BEGIN
postgres=*# select pg_backend_pid(),txid_current();
pg_backend_pid | txid_current
----------------+--------------
3325 | 123898
(1 row)
postgres=*# select id from test_lock where id = 1 for key share;
id
----
1
(1 row)
Check the row locks with pgrowlocks:
postgres=# select * from test_lock as t,pgrowlocks('test_lock') as lc where t.ctid = lc.locked_row;
id | locked_row | locker | multi | xids | modes | pids
----+------------+--------+-------+----------+-------------------+--------
1 | (0,1) | 123898 | f | {123898} | {"For Key Share"} | {3325}
(1 row)
The multi column is still false. Now open another session and lock the same row:
postgres=# begin;
BEGIN
postgres=*# select pg_backend_pid(),txid_current();
pg_backend_pid | txid_current
----------------+--------------
2694 | 123899
(1 row)
postgres=*# select id from test_lock where id = 1 for key share;
id
----
1
(1 row)
This time a MultiXactID shows up (locker = 1, multi = t). pg_get_multixact_members returns its members and their lock modes:
postgres=# select * from test_lock as t,pgrowlocks('test_lock') as lc where t.ctid = lc.locked_row;
id | locked_row | locker | multi | xids | modes | pids
----+------------+--------+-------+-----------------+---------------------------+-------------
1 | (0,1) | 1 | t | {123898,123899} | {"Key Share","Key Share"} | {3325,2694}
(1 row)
postgres=# select * from pg_get_multixact_members('1');
xid | mode
--------+-------
123898 | keysh
123899 | keysh
(2 rows)
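The infomask of the first tuple is also marked XMAX_IS_MULTI, and its xmax is 1, far smaller than any ordinary transaction ID; together these show that xmax now holds a MultiXactID.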
postgres=# select lp, t_xmin, t_xmax, t_ctid,
infomask(t_infomask, 1) as infomask,
infomask(t_infomask2, 2) as infomask2
from heap_page_items(get_raw_page('test_lock', 0));
lp | t_xmin | t_xmax | t_ctid | infomask | infomask2
----+--------+--------+--------+--------------------------------------------------------------+-----------
1 | 123897 | 1 | (0,1) | XMAX_IS_MULTI|XMIN_COMMITTED|XMAX_LOCK_ONLY|XMAX_KEYSHR_LOCK |
2 | 123897 | 0 | (0,2) | XMAX_INVALID|XMIN_COMMITTED |
(2 rows)
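The infomask() function used in the query above is not a built-in; it appears to be a user-defined helper (since PostgreSQL 13, pageinspect ships heap_tuple_infomask_flags() for the same purpose). The sketch below shows what such a decoder does. The bit values are copied from src/include/access/htup_details.h; the output labels mimic the names printed above (for example EXCL_LOCK rather than HEAP_XMAX_EXCL_LOCK), and decode_infomask is a hypothetical name. The raw value 0x1190 used in the check is reconstructed from the flag names of the first row (0x1000 + 0x0100 + 0x0080 + 0x0010), not read from the page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* t_infomask bits, values as in src/include/access/htup_details.h */
#define HEAP_XMAX_KEYSHR_LOCK 0x0010
#define HEAP_XMAX_EXCL_LOCK   0x0040
#define HEAP_XMAX_LOCK_ONLY   0x0080
#define HEAP_XMIN_COMMITTED   0x0100
#define HEAP_XMIN_INVALID     0x0200
#define HEAP_XMAX_COMMITTED   0x0400
#define HEAP_XMAX_INVALID     0x0800
#define HEAP_XMAX_IS_MULTI    0x1000
#define HEAP_UPDATED          0x2000

typedef struct
{
    uint16_t    bit;
    const char *label;   /* label as printed in the query output above */
} InfomaskFlag;

/* Highest bit first, matching the order seen in the query output. */
static const InfomaskFlag infomask_flags[] = {
    {HEAP_UPDATED, "UPDATED"},
    {HEAP_XMAX_IS_MULTI, "XMAX_IS_MULTI"},
    {HEAP_XMAX_INVALID, "XMAX_INVALID"},
    {HEAP_XMAX_COMMITTED, "XMAX_COMMITTED"},
    {HEAP_XMIN_INVALID, "XMIN_INVALID"},
    {HEAP_XMIN_COMMITTED, "XMIN_COMMITTED"},
    {HEAP_XMAX_LOCK_ONLY, "XMAX_LOCK_ONLY"},
    {HEAP_XMAX_EXCL_LOCK, "EXCL_LOCK"},
    {HEAP_XMAX_KEYSHR_LOCK, "XMAX_KEYSHR_LOCK"},
};

/*
 * Write the '|'-separated labels of the set bits into out.
 * EXCL_LOCK and XMAX_KEYSHR_LOCK together mean a FOR SHARE lock
 * (HEAP_XMAX_SHR_LOCK); this sketch simply lists both bits.
 */
static void decode_infomask(uint16_t mask, char *out, size_t outsz)
{
    assert(outsz > 0);
    out[0] = '\0';
    for (size_t i = 0; i < sizeof infomask_flags / sizeof infomask_flags[0]; i++)
    {
        if ((mask & infomask_flags[i].bit) == 0)
            continue;
        if (out[0] != '\0')
            strncat(out, "|", outsz - strlen(out) - 1);
        strncat(out, infomask_flags[i].label, outsz - strlen(out) - 1);
    }
}
```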
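Of course, FOR SHARE is not the only case: according to the conflict matrix, FOR NO KEY UPDATE and FOR KEY SHARE do not conflict with each other either, so they produce the same result.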
Requested \ Held | FOR KEY SHARE | FOR SHARE | FOR NO KEY UPDATE | FOR UPDATE |
---|---|---|---|---|
FOR KEY SHARE | | | | X |
FOR SHARE | | | X | X |
FOR NO KEY UPDATE | | X | X | X |
FOR UPDATE | X | X | X | X |
(X = the two lock modes conflict)
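One way to read the matrix: locks taken by different transactions on the same row can coexist, and can therefore be packed into one MultiXact, only where the matrix shows no conflict. The sketch below encodes the table as a boolean array; the enum and function names are hypothetical, not PostgreSQL's internal LockTupleMode / MultiXactStatus machinery. It deliberately ignores locks held by the same transaction, which never conflict with each other (see the README.tuplock quote above).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the four row-level lock modes. */
typedef enum { KEY_SHARE, SHARE, NO_KEY_UPDATE, UPDATE } RowLockMode;

/* conflicts[requested][held], transcribed from the table above. */
static const bool conflicts[4][4] = {
    /*                   KEY_SHARE SHARE  NO_KEY_UPD UPDATE */
    /* KEY_SHARE     */ { false,   false, false,     true },
    /* SHARE         */ { false,   false, true,      true },
    /* NO_KEY_UPDATE */ { false,   true,  true,      true },
    /* UPDATE        */ { true,    true,  true,      true },
};

/* Can a lock from another transaction be granted while `held` is in place? */
static bool can_share_row(RowLockMode held, RowLockMode requested)
{
    assert(held >= KEY_SHARE && held <= UPDATE);
    assert(requested >= KEY_SHARE && requested <= UPDATE);
    return !conflicts[requested][held];
}
```

For example, can_share_row(KEY_SHARE, NO_KEY_UPDATE) is true, which is why a foreign-key style FOR KEY SHARE lock and an ordinary non-key UPDATE can end up in the same MultiXact, while two NO KEY UPDATE lockers never can.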
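But there is an exception! A MultiXactID can also be created once subtransactions are involved, which is exactly the scenario from the beginning of this article. Let's verify: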
postgres=# create table test_lock(id int);
CREATE TABLE
postgres=# insert into test_lock values(1);
INSERT 0 1
postgres=# begin;
BEGIN
postgres=*# select id from test_lock where id = 1 for update;
id
----
1
(1 row)
postgres=*# savepoint a;
SAVEPOINT
At this point there is no MultiXactID yet:
postgres=# select * from test_lock as t,pgrowlocks('test_lock') as lc where t.ctid = lc.locked_row;
id | locked_row | locker | multi | xids | modes | pids
----+------------+--------+-------+----------+----------------+--------
1 | (0,1) | 123904 | f | {123904} | {"For Update"} | {2694}
(1 row)
postgres=# select lp, t_xmin, t_xmax, t_ctid,
infomask(t_infomask, 1) as infomask,
infomask(t_infomask2, 2) as infomask2
from heap_page_items(get_raw_page('test_lock', 0));
lp | t_xmin | t_xmax | t_ctid | infomask | infomask2
----+--------+--------+--------+-----------------------------------------+--------------------
1 | 123903 | 123904 | (0,1) | XMIN_COMMITTED|XMAX_LOCK_ONLY|EXCL_LOCK | UPDATE_KEY_REVOKED
(1 row)
Now let's update the row we just locked:
postgres=*# savepoint a;
SAVEPOINT
postgres=*# update test_lock set id = 99 where id = 1;
UPDATE 1
The DBA is stunned: a MultiXactID appears!
postgres=# select * from test_lock as t,pgrowlocks('test_lock') as lc where t.ctid = lc.locked_row;
id | locked_row | locker | multi | xids | modes | pids
----+------------+--------+-------+-----------------+--------------------------------+----------
1 | (0,1) | 2 | t | {123904,123905} | {"For Update","No Key Update"} | {2694,0}
(1 row)
postgres=# select lp, t_xmin, t_xmax, t_ctid,
infomask(t_infomask, 1) as infomask,
infomask(t_infomask2, 2) as infomask2
from heap_page_items(get_raw_page('test_lock', 0));
lp | t_xmin | t_xmax | t_ctid | infomask | infomask2
----+--------+--------+--------+-----------------------------------------+--------------------------------
1 | 123903 | 2 | (0,2) | XMAX_IS_MULTI|XMIN_COMMITTED|EXCL_LOCK | UPDATE_KEY_REVOKED|HOT_UPDATED
2 | 123905 | 123904 | (0,2) | UPDATED|XMAX_LOCK_ONLY|XMAX_KEYSHR_LOCK | HEAP_ONLY_TUPLE
(2 rows)
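Why does this happen? Below is a heavily simplified sketch of the decision, loosely modeled on compute_new_xmax_infomask() in heapam.c. The real function also tracks lock strength, lockers versus updaters and pre-existing multis, all of which this sketch ignores, and the names new_xmax_kind and XmaxOutcome are hypothetical. What it captures: a subtransaction gets its own xid, so from xmax's point of view the UPDATE comes from a second, still-running xid, even though it cannot conflict with its parent's lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical outcome of re-locking or updating a tuple. */
typedef enum { XMAX_SINGLE_XID, XMAX_NEW_MULTI } XmaxOutcome;

/*
 * old_xmax:    xid currently stored in the tuple's xmax (a locker)
 * old_running: whether that xid is still in progress
 * my_xid:      xid of the (sub)transaction now locking or updating
 */
static XmaxOutcome new_xmax_kind(TransactionId old_xmax, bool old_running,
                                 TransactionId my_xid)
{
    assert(my_xid != 0);                /* 0 is InvalidTransactionId */
    if (!old_running || old_xmax == my_xid)
        return XMAX_SINGLE_XID;         /* overwrite or upgrade in place */
    return XMAX_NEW_MULTI;              /* two live xids, one xmax slot */
}
```

With the values from the session above, new_xmax_kind(123904, true, 123905) yields XMAX_NEW_MULTI: the parent's FOR UPDATE lock and the subtransaction's update must both be remembered, so they are wrapped into a MultiXact.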
This case gets only a passing mention in README.tuplock:
“In earlier PostgreSQL releases, a MultiXact always meant that the tuple was locked in shared mode by multiple transactions. This is no longer the case; a MultiXact may contain an update or delete Xid. (Keep in mind that tuple locks in a transaction do not conflict with other tuple locks in the same transaction, so it's possible to have otherwise conflicting locks in a MultiXact if they belong to the same transaction).
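In our example, the pgrowlocks output shows that the FOR UPDATE lock belongs to the top-level transaction (xid 123904, pid 2694), while the UPDATE inside savepoint a was assigned its own subtransaction xid (123905, shown with pid 0). Locks from the same transaction do not conflict with each other, but these are still two different xids and xmax has room for only one, so PostgreSQL has to wrap both into MultiXactID 2.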
Well, holy shit. Remember this pitfall.
Locks
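With the MultiXactID mechanism covered, let's look at its in-memory structures. Because MultiXactIDs are shared by all backends and must be allocated and managed centrally, PostgreSQL uses the MultiXactStateData structure, which is protected by the MultiXactGenLock lock; the two SLRUs are protected by the MultiXactOffsetSLRULock and MultiXactMemberSLRULock lightweight locks (LWLocks).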
/*
* MultiXact state shared across all backends. All this state is protected
* by MultiXactGenLock. (We also use MultiXactOffsetSLRULock and
* MultiXactMemberSLRULock to guard accesses to the two sets of SLRU
* buffers. For concurrency's sake, we avoid holding more than one of these
* locks at a time.)
*/
typedef struct MultiXactStateData
{
/* next-to-be-assigned MultiXactId */
MultiXactId nextMXact;
/* next-to-be-assigned offset */
MultiXactOffset nextOffset;
...
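MultiXactIDs are allocated serially, and an existing MultiXact is never modified: whenever yet another transaction needs a shared-mode lock on the same row, a brand-new MultiXactID containing all existing members plus the newcomer must be created. Note how locker goes from 3 to 4 to 5 below: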
postgres=# select * from test_lock as t,pgrowlocks('test_lock') as lc where t.ctid = lc.locked_row;
id | locked_row | locker | multi | xids | modes | pids
----+------------+--------+-------+-----------------+---------------------------+-------------
1 | (0,1) | 3 | t | {123909,123910} | {"Key Share","Key Share"} | {2694,3325}
(1 row)
postgres=# select * from test_lock as t,pgrowlocks('test_lock') as lc where t.ctid = lc.locked_row;
id | locked_row | locker | multi | xids | modes | pids
----+------------+--------+-------+------------------------+---------------------------------------+------------------
1 | (0,1) | 4 | t | {123909,123910,123911} | {"Key Share","Key Share","Key Share"} | {2694,3325,3727}
(1 row)
postgres=# select * from test_lock as t,pgrowlocks('test_lock') as lc where t.ctid = lc.locked_row;
id | locked_row | locker | multi | xids | modes | pids
----+------------+--------+-------+-------------------------------+---------------------------------------------------+-----------------------
1 | (0,1) | 5 | t | {123909,123910,123911,123912} | {"Key Share","Key Share","Key Share","Key Share"} | {2694,3325,3727,3773}
(1 row)
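The growing member lists above can be modeled as follows. This is a hypothetical sketch (ToyMulti, ToyAllocator and toy_expand are made-up names) in the spirit of MultiXactIdExpand() in multixact.c; the real function additionally drops members that are no longer running, which this sketch ignores. Every new locker copies the whole existing member list into a fresh MultiXact taken from a single serial counter.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint32_t MultiXactId;

#define TOY_MAX_MEMBERS 8

/* A MultiXact: its ID plus a copy of its member xids. */
typedef struct
{
    MultiXactId   id;
    int           nmembers;
    TransactionId xids[TOY_MAX_MEMBERS];
} ToyMulti;

/* Shared allocator state (in PostgreSQL: MultiXactStateData, under MultiXactGenLock). */
typedef struct
{
    MultiXactId next_mxact;      /* next ID to hand out */
    uint32_t    members_written; /* member slots consumed so far */
} ToyAllocator;

/*
 * "Adding" a locker never edits the old multi: allocate a new ID and
 * write all old members plus the newcomer again.
 */
static ToyMulti toy_expand(ToyAllocator *alloc, const ToyMulti *old, TransactionId xid)
{
    ToyMulti m = *old;                  /* copy every existing member */

    assert(m.nmembers < TOY_MAX_MEMBERS);
    m.xids[m.nmembers++] = xid;
    m.id = alloc->next_mxact++;         /* serial allocation */
    alloc->members_written += (uint32_t) m.nmembers;
    return m;
}
```

With k sessions share-locking one row, this pattern writes 2 + 3 + ... + k member entries in total, roughly k²/2, and every step goes through the same serial counter; that is a plausible reading of why hot-row locking hammers the MultiXact SLRUs.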
postgres=# select * from pg_control_checkpoint();
-[ RECORD 1 ]--------+-------------------------
checkpoint_lsn | 5/20223898
redo_lsn | 5/202160F0
redo_wal_file | 000000010000000500000020
timeline_id | 1
prev_timeline_id | 1
full_page_writes | t
next_xid | 0:123913
next_oid | 33597
next_multixact_id | 6 --- next MultiXactID to be assigned
next_multi_offset | 14 --- next member offset
oldest_xid | 716
oldest_xid_dbid | 1
oldest_active_xid | 123909
oldest_multi_xid | 1
oldest_multi_dbid | 1
oldest_commit_ts_xid | 0
newest_commit_ts_xid | 0
checkpoint_time | 2023-08-17 17:12:36+08
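These two counters can be cross-checked by hand. Assuming a fresh cluster (oldest_multi_xid is 1) and that the only MultiXacts ever created are the five seen in this article (multis 1, 2 and 3 with two members each, multi 4 with three, multi 5 with four), 13 member slots are used. GetNewMultiXactId() in multixact.c never returns 0 as a starting offset, so the first multi starts at 1 and slot 0 is consumed as well. The sketch below replays that bookkeeping; ToyCounters and toy_reserve are hypothetical names, and offset wraparound is ignored.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t MultiXactId;
typedef uint32_t MultiXactOffset;

/* Hypothetical stand-in for the two counters in MultiXactStateData. */
typedef struct
{
    MultiXactId     next_mxact;
    MultiXactOffset next_offset;
} ToyCounters;

/*
 * Reserve space for one new multi and return its starting offset.
 * Offset 0 is never handed out: the very first multi starts at 1 and
 * also consumes slot 0, mirroring GetNewMultiXactId().
 */
static MultiXactOffset toy_reserve(ToyCounters *c, int nmembers)
{
    assert(nmembers > 0);
    MultiXactOffset start = c->next_offset;

    if (start == 0)
    {
        start = 1;
        nmembers++;                     /* slot 0 is burned as well */
    }
    c->next_offset += (MultiXactOffset) nmembers;
    c->next_mxact++;
    return start;
}
```

Replaying member counts 2, 2, 2, 3 and 4 from next_multixact_id = 1 and next_multi_offset = 0 ends at next_multixact_id = 6 and next_multi_offset = 14, which agrees with the pg_control_checkpoint() output above under those assumptions.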
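So it is easy to imagine that when many backends try to lock the same row, every one of them has to allocate a new MultiXactID and rewrite the member list, which leads to severe contention on these locks and SLRU buffers.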
In addition, MultiXactIDs are 32 bits wide as well, so, just like ordinary transaction IDs, they have to be frozen to avoid wraparound.
/* MultiXactId must be equivalent to TransactionId, to fit in t_xmax */
typedef TransactionId MultiXactId;
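Being 32-bit means MultiXactIDs are compared modulo 2^32, the same way MultiXactIdPrecedes() in multixact.c does it with a signed 32-bit difference. The sketch below reproduces that comparison under hypothetical names and ignores the special handling of the invalid ID 0 after wraparound. On the SQL side, mxid_age() reports how old a table's relminmxid is, and autovacuum forces an anti-wraparound VACUUM once that age exceeds autovacuum_multixact_freeze_max_age (400 million by default).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t MultiXactId;

/* Modulo-2^32 "a is older than b", same idea as MultiXactIdPrecedes(). */
static bool toy_multi_precedes(MultiXactId a, MultiXactId b)
{
    int32_t diff = (int32_t) (a - b);
    return diff < 0;
}

/* How many multis were allocated since `oldest`, modulo 2^32. */
static uint32_t toy_multi_age(MultiXactId next, MultiXactId oldest)
{
    return next - oldest;
}
```

For example, 4294967290 still "precedes" 5, because after wraparound the counter restarted from small values; this is exactly why old MultiXactIDs must be frozen before they get 2^31 behind.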
Summary
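In short, be extra careful with subtransactions in PostgreSQL. In the next post, while the topic is still hot, let's talk about the pitfalls caused by foreign keys.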
References
Rare but extremely challenging PostgreSQL Performance Problems
https://www.postgresql.org/docs/current/routine-vacuuming.html
https://github.com/postgres/postgres/blob/REL_13_2/src/backend/access/heap/README.tuplock