sysbench的lua小改动导致的性能差异
作者 罗小波·沃趣科技高级数据库技术专家
出品 沃趣科技
| 导语
最近在配合某同事做一项性能压测,发现相同数据量、相同数据库参数、相同sysbench压力、相同数据库版本和sysbench版本、相同服务器硬件环境下,我和同事的压测结果天差地别:一个小时压测结束后,我的压测结果中出现了高频率周期性阻塞(tps,qps为0),而同事的压测结果中未出现阻塞(tps,qps从头到尾都比较稳定)。正常情况下,在环境完全相同时,不可能会出现如此巨大的性能差异。但这次,不可能发生的事情它的确发生了。经过复测与排查,终于发现了其中的奥妙。
服务器硬件信息
数据库主机
* CPU:72 process
* memory:128G
* 磁盘:某存储 100G
* 网卡:intel 万兆网卡数据库版本:MySQL 5.7.21
sysbench版本:1.0.9
sysbench主机
* CPU:20 process
* memory:128G
* 磁盘:本地SAS 50G
* 网卡:intel 万兆网卡
| 复测结果数据
先来看看压测结果数据,如下图所示(注:此处只截取了前几十秒的数据进行对比):
我的压测结果(下图可见,明显的周期性tps,qps为0的情况)
同事的压测结果(下图可见,tps,qps较为稳定且并未出现阻塞情况)
| 抓取等待事件
看到上文第1节中的结果,如果根据以往的经验、常识来快速判断...估计会懵圈!!当得知同事的测试结果很稳定时,我本能地想:要把等待事件信息拉出来瞧瞧!!下面是我与同事各自在复测时截取的等待事件信息(复测时间为3分钟)(不想仔细看等待事件内容的亲可直接跳至第3节)
operation操作时间统计(每秒查询一次,查询数十次截取时间最长的5次)
mysql> select sys.format_time(sum(timer_wait)),operation from performance_schema.events_waits_history_long where timer_wait > 1000000000 group by operation order by sum(timer_wait) desc;
+----------------------------------+-----------+
| sys.format_time(sum(timer_wait)) | operation |
+----------------------------------+-----------+
| 1.20 s | idle |
| 219.20 ms | write |
+----------------------------------+-----------+
2 rows in set (0.01 sec)
mysql> select sys.format_time(sum(timer_wait)),operation from performance_schema.events_waits_history_long where timer_wait > 1000000000 group by operation order by sum(timer_wait) desc;
+----------------------------------+-----------+
| sys.format_time(sum(timer_wait)) | operation |
+----------------------------------+-----------+
| 1.16 s | idle |
| 182.44 ms | write |
+----------------------------------+-----------+
2 rows in set (0.01 sec)
mysql> select sys.format_time(sum(timer_wait)),operation from performance_schema.events_waits_history_long where timer_wait > 1000000000 group by operation order by sum(timer_wait) desc;
+----------------------------------+-----------+
| sys.format_time(sum(timer_wait)) | operation |
+----------------------------------+-----------+
| 1.17 s | idle |
| 164.27 ms | write |
+----------------------------------+-----------+
2 rows in set (0.01 sec)
mysql> select sys.format_time(sum(timer_wait)),operation from performance_schema.events_waits_history_long where timer_wait > 1000000000 group by operation order by sum(timer_wait) desc;
+----------------------------------+-----------+
| sys.format_time(sum(timer_wait)) | operation |
+----------------------------------+-----------+
| 1.21 s | idle |
| 268.65 ms | write |
+----------------------------------+-----------+
2 rows in set (0.01 sec)
mysql> select sys.format_time(sum(timer_wait)),operation from performance_schema.events_waits_history_long where timer_wait > 1000000000 group by operation order by sum(timer_wait) desc;
+----------------------------------+-----------+
| sys.format_time(sum(timer_wait)) | operation |
+----------------------------------+-----------+
| 11.78 s | fetch |
| 1.61 s | idle |
+----------------------------------+-----------+
2 rows in set (0.02 sec)
# 同事的
mysql> select sys.format_time(sum(timer_wait)),operation from performance_schema.events_waits_history_long where timer_wait > 1000000000 group by operation order by sum(timer_wait) desc;
+----------------------------------+-----------+
| sys.format_time(sum(timer_wait)) | operation |
+----------------------------------+-----------+
| 21.17 ms | idle |
| 1.58 ms | fetch |
+----------------------------------+-----------+
2 rows in set (0.02 sec)
mysql> select sys.format_time(sum(timer_wait)),operation from performance_schema.events_waits_history_long where timer_wait > 1000000000 group by operation order by sum(timer_wait) desc;
+----------------------------------+-----------+
| sys.format_time(sum(timer_wait)) | operation |
+----------------------------------+-----------+
| 185.44 ms | idle |
+----------------------------------+-----------+
1 row in set (0.02 sec)
mysql> select sys.format_time(sum(timer_wait)),operation from performance_schema.events_waits_history_long where timer_wait > 1000000000 group by operation order by sum(timer_wait) desc;
+----------------------------------+-----------+
| sys.format_time(sum(timer_wait)) | operation |
+----------------------------------+-----------+
| 141.82 ms | lock |
| 110.90 ms | fetch |
| 3.30 ms | idle |
+----------------------------------+-----------+
3 rows in set (0.01 sec)
mysql> select sys.format_time(sum(timer_wait)),operation from performance_schema.events_waits_history_long where timer_wait > 1000000000 group by operation order by sum(timer_wait) desc;
+----------------------------------+-----------+
| sys.format_time(sum(timer_wait)) | operation |
+----------------------------------+-----------+
| 2.91 s | idle |
+----------------------------------+-----------+
1 row in set (0.00 sec)
mysql> select sys.format_time(sum(timer_wait)),operation from performance_schema.events_waits_history_long where timer_wait > 1000000000 group by operation order by sum(timer_wait) desc;
+----------------------------------+-----------+
| sys.format_time(sum(timer_wait)) | operation |
+----------------------------------+-----------+
| 1.20 s | idle |
+----------------------------------+-----------+
1 row in set (0.01 sec)
mysql> select sys.format_time(sum(timer_wait)),operation from performance_schema.events_waits_history_long where timer_wait > 1000000000 group by operation order by sum(timer_wait) desc;
+----------------------------------+-----------+
| sys.format_time(sum(timer_wait)) | operation |
+----------------------------------+-----------+
| 215.34 ms | idle |
+----------------------------------+-----------+
1 row in set (0.01 sec)
等待事件按照时间开销倒序排序,取出超过ms的部分(每秒查询一次,查询数十次截取时间最长的5次)
# 我的
mysql> select event_name,source,sys.format_time(timer_wait),operation from performance_schema.events_waits_history_long where event_name!='idle' order by TIMER_WAIT desc limit 50;
+----------------------------------------+-----------------+-----------------------------+-----------+
| event_name | source | sys.format_time(timer_wait) | operation |
+----------------------------------------+-----------------+-----------------------------+-----------+
| wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 37.31 ms | write |
| wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 37.08 ms | write |
| wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 23.95 ms | write |
| wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 23.06 ms | write |
| wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 20.99 ms | write |
......
+----------------------------------------+-----------------+-----------------------------+-----------+
50 rows in set (0.02 sec)
mysql> select event_name,source,sys.format_time(timer_wait),operation from performance_schema.events_waits_history_long where event_name!='idle' order by TIMER_WAIT desc limit 50;
+--------------------------------------+-----------------+-----------------------------+-----------+
| event_name | source | sys.format_time(timer_wait) | operation |
+--------------------------------------+-----------------+-----------------------------+-----------+
| wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 15.71 ms | write |
| wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 14.94 ms | write |
| wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 14.22 ms | write |
| wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 13.93 ms | write |
| wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 13.53 ms | write |
......
mysql> select event_name,source,sys.format_time(timer_wait),operation from performance_schema.events_waits_history_long where event_name!='idle' order by TIMER_WAIT desc limit 50;
+--------------------------------------+-----------------+-----------------------------+-----------+
| event_name | source | sys.format_time(timer_wait) | operation |
+--------------------------------------+-----------------+-----------------------------+-----------+
| wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 41.05 ms | write |
| wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 34.67 ms | write |
| wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 31.63 ms | write |
| wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 31.30 ms | write |
| wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 28.09 ms | write |
......
mysql> select event_name,source,sys.format_time(timer_wait),operation from performance_schema.events_waits_history_long where event_name!='idle' order by TIMER_WAIT desc limit 50;
+-------------------------------------+-----------------+-----------------------------+-----------+
| event_name | source | sys.format_time(timer_wait) | operation |
+-------------------------------------+-----------------+-----------------------------+-----------+
| wait/io/file/innodb/innodb_log_file | fil0fil.cc:5992 | 33.20 ms | sync |
| wait/io/table/sql/handler | handler.cc:3084 | 26.20 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3032 | 23.26 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3032 | 21.26 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3032 | 19.07 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3084 | 18.55 ms | fetch |
......
| wait/io/table/sql/handler | handler.cc:7992 | 12.07 ms | insert |
| wait/io/table/sql/handler | handler.cc:3084 | 11.96 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3032 | 11.31 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3084 | 11.04 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3084 | 10.69 ms | fetch |
| wait/io/table/sql/handler | handler.cc:7992 | 9.58 ms | insert |
| wait/io/table/sql/handler | handler.cc:3032 | 9.55 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3032 | 9.27 ms | fetch |
| wait/io/table/sql/handler | handler.cc:7992 | 9.10 ms | insert |
| wait/io/table/sql/handler | handler.cc:3084 | 9.01 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3032 | 7.69 ms | fetch |
......
| wait/io/table/sql/handler | handler.cc:3084 | 5.70 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3032 | 5.69 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3032 | 5.63 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3084 | 5.63 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3084 | 5.57 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3084 | 5.53 ms | fetch |
| wait/io/table/sql/handler | handler.cc:7992 | 5.52 ms | insert |
| wait/io/table/sql/handler | handler.cc:3084 | 5.51 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3032 | 5.48 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3032 | 5.43 ms | fetch |
+-------------------------------------+-----------------+-----------------------------+-----------+
50 rows in set (0.02 sec)
mysql> select event_name,source,sys.format_time(timer_wait),operation from performance_schema.events_waits_history_long where event_name!='idle' order by TIMER_WAIT desc limit 50;
+--------------------------------------+-----------------+-----------------------------+-----------+
| event_name | source | sys.format_time(timer_wait) | operation |
+--------------------------------------+-----------------+-----------------------------+-----------+
| wait/io/file/innodb/innodb_log_file | fil0fil.cc:5992 | 33.20 ms | sync |
| wait/io/table/sql/handler | handler.cc:3032 | 19.07 ms | fetch |
| wait/io/file/innodb/innodb_data_file | fil0fil.cc:5778 | 18.71 ms | write |
| wait/io/table/sql/handler | handler.cc:3032 | 14.12 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3032 | 13.16 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3032 | 13.08 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3032 | 12.56 ms | fetch |
| wait/io/file/innodb/innodb_log_file | fil0fil.cc:5992 | 12.23 ms | sync |
| wait/io/table/sql/handler | handler.cc:3032 | 12.22 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3032 | 11.31 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3084 | 11.04 ms | fetch |
......
| wait/io/table/sql/handler | handler.cc:3032 | 5.86 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3032 | 5.79 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3032 | 5.78 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3084 | 5.74 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3032 | 5.71 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3032 | 5.71 ms | fetch |
......
| wait/io/table/sql/handler | handler.cc:3032 | 3.88 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3084 | 3.88 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3084 | 3.86 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3032 | 3.85 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3084 | 3.81 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3084 | 3.81 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3032 | 3.80 ms | fetch |
+--------------------------------------+-----------------+-----------------------------+-----------+
50 rows in set (0.02 sec)
# 同事的
mysql> select event_name,source,sys.format_time(timer_wait),operation from performance_schema.events_waits_history_long where event_name!='idle' order by TIMER_WAIT desc limit 50;
+---------------------------------------------+-------------------+-----------------------------+-----------+
| event_name | source | sys.format_time(timer_wait) | operation |
+---------------------------------------------+-------------------+-----------------------------+-----------+
| wait/io/file/innodb/innodb_log_file | fil0fil.cc:5992 | 16.18 ms | sync |
| wait/io/file/innodb/innodb_log_file | fil0fil.cc:5992 | 5.34 ms | sync |
......
+---------------------------------------------+-------------------+-----------------------------+-----------+
50 rows in set (0.01 sec)
mysql> select event_name,source,sys.format_time(timer_wait),operation from performance_schema.events_waits_history_long where event_name!='idle' order by TIMER_WAIT desc limit 50;
+---------------------------------------+--------------------+-----------------------------+-----------+
| event_name | source | sys.format_time(timer_wait) | operation |
+---------------------------------------+--------------------+-----------------------------+-----------+
| wait/synch/mutex/innodb/trx_sys_mutex | ha_innodb.cc:15727 | 1.87 ms | lock |
| wait/synch/mutex/innodb/trx_sys_mutex | ha_innodb.cc:15727 | 1.64 ms | lock |
| wait/io/table/sql/handler | handler.cc:3084 | 1.63 ms | fetch |
| wait/io/table/sql/handler | handler.cc:3084 | 1.63 ms | fetch |
| wait/synch/mutex/innodb/trx_sys_mutex | read0read.cc:584 | 1.61 ms | lock |
| wait/synch/mutex/innodb/trx_sys_mutex | read0read.cc:584 | 1.57 ms | lock |
......
+---------------------------------------+--------------------+-----------------------------+-----------+
50 rows in set (0.02 sec)
mysql> select event_name,source,sys.format_time(timer_wait),operation from performance_schema.events_waits_history_long where event_name!='idle' order by TIMER_WAIT desc limit 50;
+------------------------------------------+--------------------+-----------------------------+-----------+
| event_name | source | sys.format_time(timer_wait) | operation |
+------------------------------------------+--------------------+-----------------------------+-----------+
| wait/synch/mutex/innodb/lock_mutex | lock0lock.cc:6844 | 1.27 ms | lock |
| wait/synch/mutex/innodb/lock_mutex | lock0lock.cc:6047 | 1.19 ms | lock |
| wait/io/table/sql/handler | handler.cc:3032 | 1.19 ms | fetch |
| wait/io/table/sql/handler | handler.cc:8033 | 1.13 ms | update |
| wait/io/table/sql/handler | handler.cc:3084 | 1.07 ms | fetch |
| wait/synch/mutex/innodb/trx_sys_mutex | read0read.cc:584 | 1.05 ms | lock |
| wait/io/table/sql/handler | handler.cc:8066 | 1.03 ms | delete |
......
mysql> select event_name,source,sys.format_time(timer_wait),operation from performance_schema.events_waits_history_long where event_name!='idle' order by TIMER_WAIT desc limit 50;
+---------------------------------------+-------------------+-----------------------------+-----------+
| event_name | source | sys.format_time(timer_wait) | operation |
+---------------------------------------+-------------------+-----------------------------+-----------+
| wait/io/file/innodb/innodb_log_file | fil0fil.cc:5992 | 14.18 ms | sync |
| wait/io/table/sql/handler | handler.cc:3084 | 4.72 ms | fetch |
| wait/io/file/innodb/innodb_log_file | fil0fil.cc:5992 | 4.20 ms | sync |
| wait/synch/mutex/sql/LOCK_table_cache | table_cache.h:115 | 1.68 ms | lock |
| wait/synch/mutex/sql/LOCK_table_cache | table_cache.h:115 | 1.66 ms | lock |
| wait/synch/mutex/sql/LOCK_table_cache | table_cache.h:115 | 1.56 ms | lock |
| wait/synch/mutex/sql/LOCK_table_cache | table_cache.h:115 | 1.49 ms | lock |
......
mysql> select event_name,source,sys.format_time(timer_wait),operation from performance_schema.events_waits_history_long where event_name!='idle' order by TIMER_WAIT desc limit 50;
+-------------------------------------------+------------------+-----------------------------+----------------+
| event_name | source | sys.format_time(timer_wait) | operation |
+-------------------------------------------+------------------+-----------------------------+----------------+
| wait/synch/sxlock/innodb/fil_space_latch | fsp0fsp.cc:3365 | 7.84 ms | exclusive_lock |
| wait/io/table/sql/handler | handler.cc:3032 | 1.00 ms | fetch |
......
+-------------------------------------------+------------------+-----------------------------+----------------+
50 rows in set (0.01 sec)
按照等待事件类型统计总数、平均、最大、最小等待时长(只取出前50)
# 我的
mysql> select * from sys.waits_global_by_latency limit 50;
+-----------------------------------------------------------+-----------+---------------+-------------+-------------+
| events | total | total_latency | avg_latency | max_latency |
+-----------------------------------------------------------+-----------+---------------+-------------+-------------+
| wait/io/table/sql/handler | 50405591 | 9.64 h | 688.46 us | 11.26 s |
| wait/synch/mutex/innodb/trx_sys_mutex | 6815615 | 6.61 m | 58.20 us | 25.66 ms |
| wait/io/socket/sql/client_connection | 8527800 | 1.64 m | 11.53 us | 11.44 ms |
| wait/synch/sxlock/innodb/fil_space_latch | 22637 | 1.44 m | 3.83 ms | 10.36 s |
| wait/synch/sxlock/innodb/dict_operation_lock | 90 | 1.40 m | 934.44 ms | 9.70 s |
| wait/io/file/innodb/innodb_data_file | 204837 | 1.22 m | 358.33 us | 112.80 ms |
| wait/synch/mutex/innodb/sync_array_mutex | 1863469 | 1.03 m | 33.24 us | 10.80 ms |
| wait/synch/mutex/innodb/log_sys_write_mutex | 602223 | 59.64 s | 99.03 us | 11.26 ms |
| wait/io/file/innodb/innodb_log_file | 8158 | 46.84 s | 5.74 ms | 82.44 ms |
| wait/synch/mutex/sql/LOCK_table_cache | 7534927 | 46.66 s | 6.19 us | 11.16 ms |
| wait/synch/sxlock/innodb/index_tree_rw_lock | 5966049 | 44.10 s | 7.39 us | 10.43 s |
| wait/synch/mutex/sql/THD::LOCK_thd_data | 142798237 | 22.75 s | 159.20 ns | 2.52 ms |
......
50 rows in set (0.04 sec)
# 同事的
mysql> select * from sys.waits_global_by_latency limit 50;
+-----------------------------------------------------+-----------+---------------+-------------+-------------+
| events | total | total_latency | avg_latency | max_latency |
+-----------------------------------------------------+-----------+---------------+-------------+-------------+
| wait/io/table/sql/handler | 280178382 | 32.65 m | 6.99 us | 324.56 ms |
| wait/synch/mutex/innodb/trx_sys_mutex | 24005580 | 15.87 m | 39.66 us | 36.95 ms |
| wait/io/socket/sql/client_connection | 28521546 | 5.47 m | 11.52 us | 21.09 ms |
| wait/synch/mutex/innodb/log_sys_write_mutex | 1383193 | 3.18 m | 137.81 us | 15.70 ms |
| wait/io/file/innodb/innodb_log_file | 22238 | 2.82 m | 7.60 ms | 456.05 ms |
| wait/synch/mutex/innodb/sync_array_mutex | 5195017 | 2.64 m | 30.52 us | 10.90 ms |
| wait/synch/mutex/sql/LOCK_table_cache | 24012569 | 2.26 m | 5.65 us | 23.75 ms |
| wait/synch/mutex/sql/THD::LOCK_thd_data | 621837619 | 1.69 m | 162.80 ns | 3.15 ms |
| wait/synch/mutex/sql/THD::LOCK_query_plan | 844318371 | 1.61 m | 114.00 ns | 6.14 ms |
| wait/synch/mutex/innodb/log_sys_mutex | 9348544 | 1.33 m | 8.55 us | 13.32 ms |
| wait/synch/mutex/sql/THD::LOCK_thd_query | 444158105 | 46.16 s | 103.60 ns | 3.11 ms |
| wait/lock/table/sql/handler | 12006244 | 31.98 s | 2.66 us | 5.55 ms |
| wait/synch/mutex/innodb/lock_mutex | 8701428 | 29.39 s | 3.38 us | 8.93 ms |
| wait/synch/mutex/innodb/redo_rseg_mutex | 2670852 | 27.70 s | 10.37 us | 242.76 ms |
| wait/io/file/innodb/innodb_data_file | 73925 | 25.97 s | 351.35 us | 97.23 ms |
......
50 rows in set (0.05 sec)
| 利用等待事件分析差异
通过对上文第2节抓取的等待事件进行仔细地对比,可以看到在我的压测结果的等待事件中,对表的write(对表的写入)、fetch(可以理解为对表的读取)、sync(表数据落盘)等操作普遍在10ms以上(甚至高达37ms),而同事的压测结果的等待事件中,对表的write、fetch、sync等操作ms级别以上很少,有ms级别以上的也在2ms以内,从等待事件类型统计数据中也可以看到,wait/io/table/sql/handler 等待事件在我的测试结果中平均等待时间为688.46 us ,而同事的是6.99 us,而我的最大等待时间为11.26 s ,同事的为324.56 ms。
那么,问题来了:为什么会有这个差异?
百思不得其解,只能继续压测,然而无意间却发现,同事的环境在压测的时候数据量居然不增长...
# 同事的(从下面5次查询sbtest1表的id最大值我们可以看到,都为5159769 )
mysql> select max(id),now() from sbtest1;
+---------+---------------------+
| max(id) | now() |
+---------+---------------------+
| 5159769 | 2001-01-25 01:47:28 |
+---------+---------------------+
1 row in set (0.00 sec)
mysql> select max(id),now() from sbtest1;
+---------+---------------------+
| max(id) | now() |
+---------+---------------------+
| 5159769 | 2001-01-25 01:47:31 |
+---------+---------------------+
1 row in set (0.00 sec)
mysql> select max(id),now() from sbtest1;
+---------+---------------------+
| max(id) | now() |
+---------+---------------------+
| 5159769 | 2001-01-25 01:47:33 |
+---------+---------------------+
1 row in set (0.00 sec)
mysql> select max(id),now() from sbtest1;
+---------+---------------------+
| max(id) | now() |
+---------+---------------------+
| 5159769 | 2001-01-25 01:47:35 |
+---------+---------------------+
1 row in set (0.00 sec)
mysql> select max(id),now() from sbtest1;
+---------+---------------------+
| max(id) | now() |
+---------+---------------------+
| 5159769 | 2001-01-25 01:47:39 |
+---------+---------------------+
1 row in set (0.00 sec)
# 而我的(从下面5次查询sbtest1表的id最大值可以看到,5次分别为5167559 、5169955、5172245 、5174269 、5175955 )
mysql> select max(id),now() from sbtest1;
+---------+---------------------+
| max(id) | now() |
+---------+---------------------+
| 5167559 | 2001-01-25 01:48:29 |
+---------+---------------------+
1 row in set (0.00 sec)
mysql> select max(id),now() from sbtest1;
+---------+---------------------+
| max(id) | now() |
+---------+---------------------+
| 5169955 | 2001-01-25 01:48:31 |
+---------+---------------------+
1 row in set (0.00 sec)
mysql> select max(id),now() from sbtest1;
+---------+---------------------+
| max(id) | now() |
+---------+---------------------+
| 5172245 | 2001-01-25 01:48:33 |
+---------+---------------------+
1 row in set (0.00 sec)
mysql> select max(id),now() from sbtest1;
+---------+---------------------+
| max(id) | now() |
+---------+---------------------+
| 5174269 | 2001-01-25 01:48:36 |
+---------+---------------------+
1 row in set (0.00 sec)
mysql> select max(id),now() from sbtest1;
+---------+---------------------+
| max(id) | now() |
+---------+---------------------+
| 5175955 | 2001-01-25 01:48:38 |
+---------+---------------------+
1 row in set (0.00 sec)
why?
经过与同事重新仔细确认,唯一的差别就是sysbench的测试命令不同(sysbench是同事编译安装的,我没注意到这个细节,阴差阳错造成了使用了不同的lua脚本)
# 我的是使用的安装目录/usr/local/share/sysbench目录下的oltp.lua脚本
sysbench --test=/usr/local/share/sysbench/tests/include/oltp_legacy/oltp.lua --mysql-table-engine=innodb --mysql-db=sysbench_test --oltp-table-size=5000000 --oltp-tables-count=8 --mysql-host=172.16.130.2 --mysql-port=3307 --mysql-user=root --mysql-password='password' --report-interval=1 --threads=256 --time=3600 run
# 同事的直接使用了编译目录/root/sysbench/sysbench-1.0.9下的oltp.lua脚本
sysbench --test=/root/sysbench/sysbench-1.0.9/tests/include/oltp_legacy/oltp.lua --mysql-table-engine=innodb --mysql-db=sysbench_test --oltp-table-size=5000000 --oltp-tables-count=8 --mysql-host=172.16.130.2 --mysql-port=3307 --mysql-user=root --mysql-password='password' --report-interval=1 --threads=256 --time=3600 run
经过对比两个lua脚本文件的差异,发现有一行不相同
# 我的
for i=1, oltp_delete_inserts do
i = sb_rand(1, oltp_table_size)
rs = db_query("DELETE FROM " .. table_name .. " WHERE id=" .. i)
c_val = sb_rand_str([[
###########-###########-###########-###########-###########-###########-###########-###########-###########-###########]])
pad_val = sb_rand_str([[
###########-###########-###########-###########-###########]])
## 不同的地方在这里,由于为了避免主键冲突,把lua生成的随机主键值i变量删了,让数据库利用表自身的自增属性来填充主键值
rs = db_query("INSERT INTO " .. table_name .. " ( k, c, pad) VALUES " .. string.format("(%d, '%s', '%s')",sb_rand(1, oltp_table_size) , c_val, pad_val))
end
# 同事的
for i=1, oltp_delete_inserts do
i = sb_rand(1, oltp_table_size)
rs = db_query("DELETE FROM " .. table_name .. " WHERE id=" .. i)
c_val = sb_rand_str([[
###########-###########-###########-###########-###########-###########-###########-###########-###########-###########]])
pad_val = sb_rand_str([[
###########-###########-###########-###########-###########]])
## 同事的并未改动这一行,默认情况下,自增主键值是使用lua脚本生成的i变量(即1, oltp_table_size之间的随机数 )
rs = db_query("INSERT INTO " .. table_name .. " (id, k, c, pad) VALUES " .. string.format("(%d, %d, '%s', '%s')",i, sb_rand(1, oltp_table_size) , c_val, pad_val))
end
| 结论推导
仔细看第3节中的lua脚本代码,delete和insert两个操作被放进了同一个for循环中(其他的update_index和update_non_index是各自单独的一个for循环,互不影响),也就是说:
使用lua脚本随机生成主键值时:假设当delete操作删除id=1的数据行时,紧接着insert也会使用相同的id=1的主键值。所以高概率会出现innodb重复使用delete数据行所在的页来存放insert数据,在sysbench高并发压力下,大部分的insert数据存储可能只需要在内存中已存在的页内操作即可,无需太多IO操作
不使用lua脚本随机生成的主键值,而是使用表的自增属性生成主键值时:假设当delete操作删除id=1的数据行时,紧接着insert由于是表自增属性自己生成,也就是说几乎不太可能id=1,所以高概率会出现innodb重新申请一个数据页来写入insert数据,在sysbench高并发压力下,大部分的insert数据存储可能需要从磁盘文件中重新申请空间,IO操作较为频繁
至此,我和同事的压测结果有巨大差异的原因大致确定,后续经过反复的验证,也确认了是由于oltp.lua脚本的微小差异导致的。由于我们大多数时候都是使用的本地盘,而本地盘IO延迟低,通常情况下sysbench压测时这点微小的差异容易被忽略。而在此案例中,由于我们测试的环境中使用了某存储设备,相对于本地盘,IO延迟大大增加,进而造成了因为oltp.lua脚本的微小差异而导致最后压测结果的巨大差异。
提示:如果不想改动lua脚本,又想避免主键冲突有办法解决吗?有的,从sysbench 0.5版本开始,新增了一个隐藏选项--mysql-ignore-errors,用于忽略指定的错误,如果要避免主键冲突,指定选项--mysql-ignore-errors=1062 即可。
| 作者简介
罗小波·沃趣科技高级数据库技术专家
IT从业多年,历任运维工程师、高级运维工程师、运维经理、数据库工程师,曾参与版本发布系统、轻量级监控系统、运维管理平台、数据库管理平台的设计与编写,熟悉MySQL体系结构,Innodb存储引擎,喜好专研开源技术,追求完美。
相关链接
更多干货,欢迎来撩~