MySQL报障之coredump收集处理流程
1.配置coredump
1.1core-file
1.2coredumper
2.现场初步分析
2.1分析error log
2.2调试core文件
3.信息搜集
4.参考文档
1.配置coredump
core文件是在程序出现异常退出时,保留的异常堆栈文件,能够帮助开发人员快速定位问题,故对于线上的程序,需要开启coredump功能。可以通过配置core-file或者coredumper开启coredump功能。
1.1core-file
我们需要在my.cnf中的[mysqld]里面加上如下配置。
[mysqld]
core-file
innodb_buffer_pool_in_core_file=OFF
默认情况下,生成的core文件会出现在对应的datadir目录。
启动mysqld_safe之前,操作系统也需要做一些配置:
echo "core.%p.%e.%s" > /proc/sys/kernel/core_pattern
ulimit -c unlimited
sudo sysctl -w fs.suid_dumpable=2
1.2coredumper
如果没有root权限,无法进行操作系统全局更改时,可以使用如下配置进行配置。
[mysqld]
coredumper
innodb_buffer_pool_in_core_file=OFF
启动mysqld_safe之前,操作系统也需要做一些配置:
ulimit -c unlimited
这样只会影响mysql用户。
core文件会产生在datadir目录。
2.现场初步分析
2.1分析error log
首先需要在error log中,搜索backtrace
关键字查找异常堆栈。
67912 Build ID: 0eaf4b944b1dbc99a26f9343f301e033bdedeb1d
67913 Server Version: 8.0.25-15-mysqlcluster5.0.7-GA MySQL Cluster, Release GA, Revision b43d2b2e462
67914
67915 Thread pointer: 0x7f914ec27000
67916 Attempting backtrace. You can use the following information to find out
67917 where mysqld died. If you see no messages after this, something went
67918 terribly wrong...
67919 stack_bottom = 7f91608d8c30 thread_stack 0x46000
67920 /mysql/svr/mysql/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x200c33d]
67921 /mysql/svr/mysql/bin/mysqld(handle_fatal_signal+0x37b) [0x1169e4b]
67922 /lib64/libpthread.so.0(+0xf5d0) [0x7f91d95c55d0]
67923 /mysql/svr/mysql/lib/plugin/ha_mysql.so(mysql::ha_mysql::build_order_list_string(String&)+0x6c) [0x7f915951aa4c]
67924 /mysql/svr/mysql/lib/plugin/ha_mysql.so(mysql::ha_greatpart::rnd_init_low(bool)+0x1ac) [0x7f915959765c]
67925 /mysql/svr/mysql/bin/mysqld(handler::ha_rnd_init(bool)+0x22) [0xcc6102]
67926 /mysql/svr/mysql/bin/mysqld(TableScanIterator::Init()+0x4e) [0xeed66e]
67927 /mysql/svr/mysql/bin/mysqld(LimitOffsetIterator::Init()+0x16) [0x122afe6]
67928 /mysql/svr/mysql/bin/mysqld(Query_expression::ExecuteIteratorQuery(THD*)+0x281) [0x10a95a1]
67929 /mysql/svr/mysql/bin/mysqld(Query_expression::execute(THD*)+0x2f) [0x10a994f]
67930 /mysql/svr/mysql/bin/mysqld(Sql_cmd_dml::execute(THD*)+0x516) [0x1029376]
67931 /mysql/svr/mysql/bin/mysqld(mysql_execute_command(THD*, bool)+0x9e0) [0xfcce70]
67932 /mysql/svr/mysql/bin/mysqld(dispatch_sql_command(THD*, Parser_state*, bool)+0x4f1) [0xfd0961]
67933 /mysql/svr/mysql/bin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x1843) [0xfd26b3]
67934 /mysql/svr/mysql/bin/mysqld(do_command(THD*)+0x210) [0xfd3c70]
67935 /mysql/svr/mysql/bin/mysqld() [0x115ac90]
67936 /mysql/svr/mysql/bin/mysqld() [0x24cf38e]
67937 /lib64/libpthread.so.0(+0x7dd5) [0x7f91d95bddd5]
67938 /lib64/libc.so.6(clone+0x6d) [0x7f91d796dead]
67939
67940 Trying to get some variables.
67941 Some pointers may be invalid and cause the dump to abort.
67942 Query (7f913ef65028): select * from `grid-report`.`dp_bi_main_order_acccum` limit 0, 1000
67943 Connection ID (thread ID): 53
67944 Status: NOT_KILLED
可以从这里看到如下几个信息:
Query (7f913ef65028): select * from grid-report
.dp_bi_main_order_acccum
limit 0, 1000语句导致的堆栈异常最后core在了build_order_list_string函数,且函数内的偏移是0x6c Server Version: 8.0.25-15-mysqlcluster5.0.7-GA MySQL Cluster, Release GA, Revision b43d2b2e462, 这里是commit版本号,我们需要使用发布对应这个版本号的二进制。
2.2调试core文件
# gdb /mysqld所在目录/mysqld /core文件所在目录/corefile
// 出现gdb命令行,敲入bt命令查看堆栈
(gdb) bt
#0 0x00007f6cd73619d1 in pthread_kill () from /lib64/libpthread.so.0
#1 0x0000000001169e7d in handle_fatal_signal (sig=11) at /builds/mysql-cluster/myrocks/sql/signal_handler.cc:194
#2 <signal handler called>
#3 operator() (__ptr=0x7f6c3636c71a, this=0x7f6c0e1b1f2c) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/unique_ptr.h:79
#4 ~unique_ptr (this=0x7f6c0e1b1f2c, __in_chrg=<optimized out>) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/unique_ptr.h:361
#5 ~Gdb_execute_plan (this=0x7f6c0e1b1f2c, __in_chrg=<optimized out>) at /builds/mysql-cluster/myrocks/storage/mysql/gdb_execute_plan.h:69
#6 operator() (this=0x7f6c3635c548, __ptr=0x7f6c0e1b1f2c) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/unique_ptr.h:85
#7 operator() (__ptr=0x7f6c0e1b1f2c, this=0x7f6c3635c548) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/unique_ptr.h:79
#8 ~unique_ptr (this=0x7f6c3635c548, __in_chrg=<optimized out>) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/unique_ptr.h:361
#9 mysql::execute_direct_query(std::unique_ptr<mysql::Gdb_execute_node, std::default_delete<mysql::Gdb_execute_node> >&, mysql::Gdb_direct_exec_parameters*, unsigned long, bool) () at /builds/mysql-cluster/myrocks/storage/mysql/gdb_query.cc:1669
#10 0x00007f6c0e19b65c in mem_free (this=<optimized out>) at /builds/mysql-cluster/myrocks/include/sql_string.h:382
#11 ~String (this=<optimized out>, __in_chrg=<optimized out>) at /builds/mysql-cluster/myrocks/include/sql_string.h:235
#12 mysql::ha_greatpart::index_read_map_low(unsigned char const*, unsigned long, ha_rkey_function, unsigned int) ()
at /builds/mysql-cluster/myrocks/storage/mysql/ha_greatpart.cc:881
#13 0x0000000000cc6102 in handler::ha_rnd_init (this=0x7f6c720fb028, scan=<optimized out>) at /builds/mysql-cluster/myrocks/sql/handler.cc:3141
#14 0x0000000000eed66e in TableScanIterator::Init (this=0x7f6b7f486c50) at /builds/mysql-cluster/myrocks/sql/row_iterator.h:213
#15 0x000000000122afe6 in LimitOffsetIterator::Init (this=0x7f6b7f486c88)
at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/unique_ptr.h:421
#16 0x00000000010a95a1 in Query_expression::ExecuteIteratorQuery(THD*) ()
at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/unique_ptr.h:421
#17 0x00000000010a994f in Query_expression::execute(THD*) () at /builds/mysql-cluster/myrocks/sql/sql_union.cc:1310
#18 0x0000000001029376 in Sql_cmd_dml::execute(THD*) () at /builds/mysql-cluster/myrocks/sql/sql_select.cc:575
#19 0x0000000000fcce70 in mysql_execute_command(THD*, bool) () at /builds/mysql-cluster/myrocks/sql/sql_parse.cc:4725
#20 0x0000000000fd0961 in dispatch_sql_command (thd=thd@entry=0x7f6beb676000, parser_state=parser_state@entry=0x7f6c3636daa0,
update_userstat=update_userstat@entry=false) at /builds/mysql-cluster/myrocks/sql/sql_parse.cc:5321
#21 0x0000000000fd26b3 in dispatch_command(THD*, COM_DATA const*, enum_server_command) () at /builds/mysql-cluster/myrocks/sql/sql_parse.cc:1969
#22 0x0000000000fd3c70 in do_command (thd=0x7f6beb676000) at /builds/mysql-cluster/myrocks/sql/sql_parse.cc:1417
#23 0x000000000115ac90 in handle_connection (arg=arg@entry=0x7f6bed741340)
at /builds/mysql-cluster/myrocks/sql/conn_handler/connection_handler_per_thread.cc:307
#24 0x00000000024cf38e in pfs_spawn_thread (arg=0x7f6bed4b63e0) at /builds/mysql-cluster/myrocks/storage/perfschema/pfs.cc:2899
#25 0x00007f6cd735cdd5 in start_thread () from /lib64/libpthread.so.0
#26 0x00007f6cd570cead in clone () from /lib64/libc.so.6
此信息需要截图。可以联系研发协助打印一些变量信息。
3.信息搜集
经过上面的初步分析,需要搜集如下文件提交给研发人员进一步分析。
error log general log core文件,如果研发之前已经搜集到了,可以不用传出来。 导致core的query涉及到的表的建表语句以及数据(数据需要看研发是否需要)。
如果general log很大,需要DBA截取core之前1小时或者10万行的日志即可,可以通过grep、head、tail等命令截取相关日志。
通过error日志确定core的时间点 通过grep在general log中定位到具体时间点的行号 x 通过head -n x general.log | tail -n 100000 > general.log.coredump
4.参考文档
MySQL is crashing: a support engineer’s point of view(https://www.percona.com/blog/2015/08/17/mysql-is-crashing-a-support-engineers-point-of-view/)
Say Hello to Libcoredumper(https://www.percona.com/blog/2020/10/28/say-hello-to-libcoredumper-a-new-way-to-generate-core-dumps-and-other-improvements/)
Enjoy GreatSQL :)
《深入浅出MGR》视频课程
戳此小程序即可直达B站
https://www.bilibili.com/medialist/play/1363850082?business=space_collection&business_id=343928&desc=0
文章推荐:
想看更多技术好文,点个“在看”吧!