应用示例荟萃 | performance_schema全方位介绍(中)
作者 罗小波·沃趣科技高级数据库技术专家
出品 沃趣科技
2.3 找出谁持有表级锁
表级锁对应的instruments(wait/lock/table/sql/handler)默认启用,对应的consumers表为performance_schema.table_handles在setup_consumers只受全局配置项global_instrumentation控制,默认启用。所以,默认情况下只需要设置系统配置参数performance_schema=ON即可,下面我们通过一个示例演示如何找出谁持有表级锁
首先,开启两个会话,第一个会话对一个表(innodb引擎)执行显式加表级锁,第二个会话对该表执行DML语句操作
# 会话1加表级锁
root@localhost : sbtest 02:15:17> use sbtest
Database changed
root@localhost : sbtest 02:40:27> select connection_id();
+-----------------+
| connection_id() |
+-----------------+
| 18 |
+-----------------+
1 row in set (0.00 sec)
root@localhost : sbtest 02:40:29> lock table sbtest1 read;
Query OK, 0 rows affected (0.00 sec)
# 会话2对该表执行update更新
root@localhost : sbtest 10:26:37> use sbtest
Database changed
root@localhost : sbtest 02:15:33> select connection_id();
+-----------------+
| connection_id() |
+-----------------+
| 19 |
+-----------------+
1 row in set (0.00 sec)
root@localhost : sbtest 02:40:34> update sbtest1 set pad='xxx' where id=1; # 被阻塞
然后,开启第三个会话,使用show processlist语句查询线程信息,可以发现update语句正在等待MDL锁(Waiting for table metadata lock)
root@localhost : (none) 02:40:14> show processlist;
+----+-------+---------------------+--------+-------------+-------+---------------------------------------------------------------+-----------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------+---------------------+--------+-------------+-------+---------------------------------------------------------------+-----------------------------------------+
| 3 | qfsys | 192.168.2.168:41042 | NULL | Binlog Dump | 18565 | Master has sent all binlog to slave; waiting for more updates | NULL |
| 18 | root | localhost | sbtest | Sleep | 67 | | NULL |
| 19 | root | localhost | sbtest | Query | 51 | Waiting for table metadata lock | update sbtest1 set pad='xxx' where id=1 |
| 20 | root | localhost | NULL | Query | 0 | starting | show processlist |
+----+-------+---------------------+--------+-------------+-------+---------------------------------------------------------------+-----------------------------------------+
4 rows in set (0.00 sec)
如上所示,既然是等待MDL锁,那么我们在会话3查询performance_schema.metadata_locks表,记录的顺序代表持有锁的时间顺序,如下:
root@localhost : (none) 02:41:41> select * from performance_schema.metadata_locks where OWNER_THREAD_ID!=sys.ps_thread_id(connection_id())\G;
*************************** 1. row ***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: sbtest
OBJECT_NAME: sbtest1
OBJECT_INSTANCE_BEGIN: 140622530920576
LOCK_TYPE: SHARED_READ_ONLY
LOCK_DURATION: TRANSACTION
LOCK_STATUS: GRANTED
SOURCE: sql_parse.cc:5996
OWNER_THREAD_ID: 113 #内部ID为 113 的线程被授予了LOCK_TYPE: SHARED_READ_ONLY,持有该所的线程不允许其他线程修改sbtest1表的数据
OWNER_EVENT_ID: 11
*************************** 2. row ***************************
OBJECT_TYPE: GLOBAL
OBJECT_SCHEMA: NULL
OBJECT_NAME: NULL
OBJECT_INSTANCE_BEGIN: 140620517607728
LOCK_TYPE: INTENTION_EXCLUSIVE
LOCK_DURATION: STATEMENT
LOCK_STATUS: GRANTED
SOURCE: sql_base.cc:3190
OWNER_THREAD_ID: 114 # 内部ID为114的线程被授予了LOCK_TYPE: INTENTION_EXCLUSIVE,但这只是个意向锁
OWNER_EVENT_ID: 12
*************************** 3. row ***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: sbtest
OBJECT_NAME: sbtest1
OBJECT_INSTANCE_BEGIN: 140620517607824
LOCK_TYPE: SHARED_WRITE
LOCK_DURATION: TRANSACTION
LOCK_STATUS: PENDING
SOURCE: sql_parse.cc:5996
OWNER_THREAD_ID: 114 # 内部ID为114的线程正在等待LOCK_TYPE: SHARED_WRITE被授予
OWNER_EVENT_ID: 12
3 rows in set (0.00 sec)
排查陷入僵局,我们知道MDL锁非常常见,对表的绝大部分的操作都会先对表加MDL锁(根据performance_schema.metadata_locks表中记录的锁信息也不顶用了),通常看到这些信息时,我们可能会立马联想到,需要去查询一下information_schema下的三张关于Innodb引擎的锁和事务信息表(INNODB_LOCK_WAITS、INNODB_LOCKS、INNODB_TRX),我们尝试着查看一下这三张表(会话3执行),可以发现都没有记录
root@localhost : (none) 02:41:53> select * from information_schema.INNODB_TRX;
Empty set (0.00 sec)
root@localhost : (none) 02:42:58> select * from information_schema.INNODB_LOCKS;
Empty set, 1 warning (0.00 sec)
root@localhost : (none) 02:43:02> select * from information_schema.INNODB_LOCK_WAITS;
Empty set, 1 warning (0.00 sec)
当然,可能有的人会说,就4个线程,第二个会话的"Command "为sleep,应该可能是它,把它kill试试看。是的,在该案例中确实可以做这个尝试,但如果是生产环境中有数十上百个正常的长连接处于sleep状态呢?这个时候我们就不能挨个去尝试了,这个时候我们可以尝试着去查询一些表级别的锁信息(通过会话3查询performance_schema.table_handles表),如下
root@localhost : (none) 02:43:06> select * from performance_schema.table_handles where OWNER_THREAD_ID!=0\G;
*************************** 1. row ***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: sbtest
OBJECT_NAME: sbtest1
OBJECT_INSTANCE_BEGIN: 140622530923216
OWNER_THREAD_ID: 113
OWNER_EVENT_ID: 11
INTERNAL_LOCK: NULL
EXTERNAL_LOCK: READ EXTERNAL # 发现内部ID为113的线程持有了sbtest1表的READ EXTERNAL表级锁,这个也是为什么内部ID为114的线程无法获取到MDL写锁的原因
1 row in set (0.00 sec)
通过上述查询到的相关数据,113线程对sbtest1表显式加了表级读锁,而且长时间处于sleep状态,但我们并不知道该线程正在执行什么SQL语句,我们可以通过performance_schema.events_statements_current表查询,如下
root@localhost : (none) 02:43:22> select * from performance_schema.events_statements_current where thread_id=113\G;
*************************** 1. row ***************************
THREAD_ID: 113
EVENT_ID: 10
END_EVENT_ID: 10
EVENT_NAME: statement/sql/lock_tables
SOURCE: socket_connection.cc:101
TIMER_START: 18503556405463000
TIMER_END: 18503556716572000
TIMER_WAIT: 311109000
LOCK_TIME: 293000000
SQL_TEXT: lock table sbtest1 read # 这里可以看到,内部ID为113的线程对表sbtest1执行了加读锁语句
DIGEST: 9f987e807ca36e706e33275283b5572b
DIGEST_TEXT: LOCK TABLE `sbtest1` READ
CURRENT_SCHEMA: sbtest
......
1 row in set (0.00 sec)
从performance_schema.events_statements_current 表的查询信息中,通过SQL_TEXT字段我们可以清晰地看到该线程正在执行的SQL语句是什么。如果是生产环境,现在,你可以去找相关的开发人员确认,如果没有什么特殊操作,就可以尝试着kill掉这个线程(会话3执行,processlist_id为18),同时针对这个问题进行优化,避免下次再发生类似的问题
# 如何知道内部ID 113对应的process id是多少呢?我们可以通过performance_schema.threads表查询
root@localhost : (none) 02:48:19> select processlist_id from performance_schema.threads where thread_id=113;
+----------------+
| processlist_id |
+----------------+
| 18 |
+----------------+
1 row in set (0.00 sec)
# 执行kill
root@localhost : (none) 02:48:24> kill 18;
Query OK, 0 rows affected (0.00 sec)
root@localhost : (none) 02:48:40> show processlist;
+----+-------+---------------------+--------+-------------+-------+---------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------+---------------------+--------+-------------+-------+---------------------------------------------------------------+------------------+
| 3 | qfsys | 192.168.2.168:41042 | NULL | Binlog Dump | 18994 | Master has sent all binlog to slave; waiting for more updates | NULL |
| 19 | root | localhost | sbtest | Sleep | 480 | | NULL |
| 20 | root | localhost | NULL | Query | 0 | starting | show processlist |
+----+-------+---------------------+--------+-------------+-------+---------------------------------------------------------------+------------------+
3 rows in set (0.00 sec)
# 返回执行update语句的会话2,语句已经执行成功
root@localhost : sbtest 02:40:34> update sbtest1 set pad='xxx' where id=1;
Query OK, 0 rows affected (7 min 50.23 sec)
Rows matched: 0 Changed: 0 Warnings: 0
2.4 找出谁持有行级锁
该案例中涉及的performance_schema.data_lock表在MySQL 8.0中新增,在8.0之前的版本中不支持,在这里仅作为针对MySQL 5.7的performance_schema的一个延伸学习
如果一个事务长时间未提交,我们虽然可以从information_schema.innodb_trx、performance_schema.events_transactions_current等表中查询到相应的事务信息,但却无从知道这个事务持有了哪些锁。虽然information_schema.innodb_locks表是用于记录事务锁信息的,但需要在两个不同的事务发生锁等待时该表才会记录两个事务的锁信息。从8.0开始,在performance_schema中提供了一个data_locks表用于记录任意事务的锁信息(同时废弃了information_schema.innodb_locks表),不需要有锁等待关系存在(注意,该表中只记录innodb存储引擎层的锁),如下
首先,我们在8.0中打开一个会话(会话1),显式开启一个事务
root@localhost : xiaoboluo 01:26:09> use xiaoboluo
Database changed
root@localhost : xiaoboluo 01:26:19> select * from t_luoxiaobo limit 1;
+----+------+---------------------+
| id | test | datet_time |
+----+------+---------------------+
| 2 | 1 | 2017-09-06 01:11:59 |
+----+------+---------------------+
1 row in set (0.00 sec)
root@localhost : xiaoboluo 01:26:21> begin;
Query OK, 0 rows affected (0.00 sec)
root@localhost : xiaoboluo 01:26:23> update t_luoxiaobo set datet_time=now() where id=2;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
打开另外一个会话(会话2)查询data_locks表
root@localhost : performance_schema 01:27:21> select * from data_locks\G;
*************************** 1. row ***************************
ENGINE: INNODB
ENGINE_LOCK_ID: 55562:62
ENGINE_TRANSACTION_ID: 55562
THREAD_ID: 54 # 持有线程内部ID
EVENT_ID: 85
OBJECT_SCHEMA: xiaoboluo # 库名
OBJECT_NAME: t_luoxiaobo # 表名
PARTITION_NAME: NULL
SUBPARTITION_NAME: NULL
INDEX_NAME: NULL # 索引名称
OBJECT_INSTANCE_BEGIN: 140439793477144
LOCK_TYPE: TABLE # 表级锁
LOCK_MODE: IX # IX锁
LOCK_STATUS: GRANTED # 被授予状态
LOCK_DATA: NULL
*************************** 2. row ***************************
ENGINE: INNODB
ENGINE_LOCK_ID: 55562:2:4:2
ENGINE_TRANSACTION_ID: 55562
THREAD_ID: 54 # 持有锁线程内部ID
EVENT_ID: 85
OBJECT_SCHEMA: xiaoboluo # 库名
OBJECT_NAME: t_luoxiaobo #表名
PARTITION_NAME: NULL
SUBPARTITION_NAME: NULL
INDEX_NAME: PRIMARY # 索引为主键
OBJECT_INSTANCE_BEGIN: 140439793474104
LOCK_TYPE: RECORD # 记录锁
LOCK_MODE: X # 排它锁
LOCK_STATUS: GRANTED # 被授予状态
LOCK_DATA: 2 # 被锁定的数据记录,这里的记录对应的是 INDEX_NAME: PRIMARY 的value
2 rows in set (0.00 sec)
从查询结果中我们可以看到,有两行锁记录,第一行是对表t_luoxiaobo的IX锁,状态为GRANTED,第二行为使用主键索引的X锁记录锁,状态为GRANTED
现在,我们模拟两个DML发生锁等待的场景。我们新开一个会话(会话3),在会话1中的事务未提交的情况下,会话3对表t_luoxiaobo执行同样的操作
root@localhost : (none) 09:34:49> use xiaoboluo
Database changed
root@localhost : xiaoboluo 09:34:54> begin;
Query OK, 0 rows affected (0.00 sec)
root@localhost : xiaoboluo 09:34:55> update t_luoxiaobo set datet_time=now() where id=2; # 被阻塞
回到会话2中查询data_locks表,可以发现有4行锁记录了
root@localhost : (none) 09:35:18> select * from performance_schema.data_locks\G;
*************************** 1. row ***************************
......
THREAD_ID: 55
......
LOCK_TYPE: TABLE
LOCK_MODE: IX
LOCK_STATUS: GRANTED
LOCK_DATA: NULL
*************************** 2. row ***************************
ENGINE: INNODB
ENGINE_LOCK_ID: 55563:2:4:2
ENGINE_TRANSACTION_ID: 55563
THREAD_ID: 55 # 内部线程ID
EVENT_ID: 8
OBJECT_SCHEMA: xiaoboluo
OBJECT_NAME: t_luoxiaobo
PARTITION_NAME: NULL
SUBPARTITION_NAME: NULL
INDEX_NAME: PRIMARY # 发生锁的索引名称
OBJECT_INSTANCE_BEGIN: 140439793480168
LOCK_TYPE: RECORD # 记录锁
LOCK_MODE: X # 排它锁
LOCK_STATUS: WAITING # 正在等待锁被授予
LOCK_DATA: 2 # 锁定的索引value,这里与内部ID为54线程持有的主键值为2的X锁完全一样,说明这里就是被内部ID为54线程阻塞了
*************************** 3. row ***************************
......
THREAD_ID: 54
.......
LOCK_TYPE: TABLE
LOCK_MODE: IX
LOCK_STATUS: GRANTED
LOCK_DATA: NULL
*************************** 4. row ***************************
......
THREAD_ID: 54
EVENT_ID: 85
OBJECT_SCHEMA: xiaoboluo
OBJECT_NAME: t_luoxiaobo
PARTITION_NAME: NULL
SUBPARTITION_NAME: NULL
INDEX_NAME: PRIMARY
OBJECT_INSTANCE_BEGIN: 140439793474104
LOCK_TYPE: RECORD
LOCK_MODE: X
LOCK_STATUS: GRANTED
LOCK_DATA: 2
4 rows in set (0.00 sec)
从上面的查询数据可以看到,performance_schema.data_locks表中新增了线程ID为55的两个锁记录,IX锁状态为GRANTED,X锁状态为WAITING,说明正在等待被授予。但这里并不能很直观地查看到锁等待关系,我们可以使用sys.innodb_lock_waits视图查看
root@localhost : (none) 09:44:52> select * from sys.innodb_lock_waits\G;
*************************** 1. row ***************************
wait_started: 2018-01-14 21:51:59
wait_age: 00:00:11
wait_age_secs: 11
locked_table: `xiaoboluo`.`t_luoxiaobo`
locked_table_schema: xiaoboluo
locked_table_name: t_luoxiaobo
locked_table_partition: NULL
locked_table_subpartition: NULL
locked_index: PRIMARY
locked_type: RECORD
waiting_trx_id: 55566
waiting_trx_started: 2018-01-14 21:51:59
waiting_trx_age: 00:00:11
waiting_trx_rows_locked: 1
waiting_trx_rows_modified: 0
waiting_pid: 8
waiting_query: update t_luoxiaobo set datet_time=now() where id=2
waiting_lock_id: 55566:2:4:2
waiting_lock_mode: X
blocking_trx_id: 55562
blocking_pid: 7
blocking_query: NULL
blocking_lock_id: 55562:2:4:2
blocking_lock_mode: X
blocking_trx_started: 2018-01-14 21:34:44
blocking_trx_age: 00:17:26
blocking_trx_rows_locked: 1
blocking_trx_rows_modified: 1
sql_kill_blocking_query: KILL QUERY 7
sql_kill_blocking_connection: KILL 7
1 row in set (0.02 sec)
PS:在MySQL 5.7版本中,也可以使用sys.innodb_lock_waits视图查询,但是在8.0中,该视图联结查询的表不同(把之前版本中使用的information_schema.innodb_locks和information_schema.innodb_lock_waits表替换为了performance_schema.data_locks和performance_schema.data_lock_waits表),另外,在MySQL 5.6及其之前的版本中默认情况下并没有sys库,我们可以使用如下语句代替:
SELECT r.trx_wait_started AS wait_started,
TIMEDIFF(NOW(), r.trx_wait_started) AS wait_age,
TIMESTAMPDIFF(SECOND, r.trx_wait_started, NOW()) AS wait_age_secs,
rl.lock_table AS locked_table,
rl.lock_index AS locked_index,
rl.lock_type AS locked_type,
r.trx_id AS waiting_trx_id,
r.trx_started as waiting_trx_started,
TIMEDIFF(NOW(), r.trx_started) AS waiting_trx_age,
r.trx_rows_locked AS waiting_trx_rows_locked,
r.trx_rows_modified AS waiting_trx_rows_modified,
r.trx_mysql_thread_id AS waiting_pid,
sys.format_statement(r.trx_query) AS waiting_query,
rl.lock_id AS waiting_lock_id,
rl.lock_mode AS waiting_lock_mode,
b.trx_id AS blocking_trx_id,
b.trx_mysql_thread_id AS blocking_pid,
sys.format_statement(b.trx_query) AS blocking_query,
bl.lock_id AS blocking_lock_id,
bl.lock_mode AS blocking_lock_mode,
b.trx_started AS blocking_trx_started,
TIMEDIFF(NOW(), b.trx_started) AS blocking_trx_age,
b.trx_rows_locked AS blocking_trx_rows_locked,
b.trx_rows_modified AS blocking_trx_rows_modified,
CONCAT('KILL QUERY ', b.trx_mysql_thread_id) AS sql_kill_blocking_query,
CONCAT('KILL ', b.trx_mysql_thread_id) AS sql_kill_blocking_connection
FROM information_schema.innodb_lock_waits w
INNER JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
INNER JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id
INNER JOIN information_schema.innodb_locks bl ON bl.lock_id = w.blocking_lock_id
INNER JOIN information_schema.innodb_locks rl ON rl.lock_id = w.requested_lock_id
ORDER BY r.trx_wait_started;
限于篇幅,本文将分期进行推送,下一篇"应用示例荟萃 | performance_schema全方位介绍(下)",精彩内容不容错过!
| 作者简介
罗小波·沃趣科技高级数据库技术专家
IT从业多年,历任运维工程师,高级运维工程师,运维经理,数据库工程师,曾参与版本发布系统,轻量级监控系统,运维管理平台,数据库管理平台的设计与编写,熟悉MySQL的体系结构时,InnoDB存储引擎,喜好专研开源技术,追求完美。
相关链接
MySQL InnoDB Update和Crash Recovery流程
Oracle 12c系列(八)|RMAN (FROM SERVICE)
更多干货,欢迎来撩~