Java程序陷入时间裂缝:探索代码深处的神秘停顿|得物技术
目录
一、初探
二、安全点
三、停顿细节分析
四、停顿类型一览
五、停顿分类
1. JVMTI&调试
2. GC
3. 偏向锁的问题
4. 探测工具
5. 虚拟机内部运行时
6. 逆优化
7. 其他
7.1 内部调试
7.2 废弃类型-ThreadStop
六、No vm operation
1. 从输出入手
2. 匿名的停顿类型
3. 虚拟机的方案
七、非counted loop的循环
八、总结
我们来想一个问题:当你的Java程序偶然发生短暂的停顿,你会觉得是什么造成了这种现象?
绝大多数的同学立刻会回答是GC导致的STW(stop-the-world)。没错,GC确实是可以立刻先怀疑的方向。但是实际上,Java程序发生短暂停顿有非常多的可能性,我们今天来聊聊这个话题。
一
初探
-XX:+UnlockDiagnosticVMOptions
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintSafepointStatistics
-XX:PrintSafepointStatisticsCount=1
-XX:-DisplayVMOutput
-XX:+LogVMOutput
-XX:LogFile=/logs/vm.log
-Xloggc:/logs/gc.log
2024-03-07T20:43:34.027+0800: 549.045: Total time for which application threads were stopped: 0.0021832 seconds, Stopping threads took: 0.0003043 seconds
2024-03-07T20:43:34.078+0800: 549.096: Total time for which application threads were stopped: 0.0022472 seconds, Stopping threads took: 0.0002551 seconds
2024-03-07T20:43:34.081+0800: 549.099: Total time for which application threads were stopped: 0.0020218 seconds, Stopping threads took: 0.0002485 seconds
2024-03-07T20:43:34.082+0800: 549.101: Total time for which application threads were stopped: 0.0015735 seconds, Stopping threads took: 0.0001436 second
二
安全点
解释模式下,当线程感知到safepoint已开启时,会在当前字节码执行完成之后停下来(此时字节码已经被替换为safepoint标记); Compiled模式下,当线程感知到safepoint已开启时,会就近在方法返回或者非counted loop的循环的返回之前停下来(此时会尝试访问指定内存,若安全点已开启则此内存不可读进一步触发SIGSEGV信号捕获停止)。
三
停顿细节分析
626.369:应用已经运行的时间,以秒为单位,精确到毫秒 vmop(RevokeBias):触发本次虚拟机停顿的VM操作,这里RevokeBias代表的是一个偏向锁撤销的操作 threads:此时线程明细情况 total(1538):本次停顿时,虚拟机中的线程数量 initially_running(0):停顿发生时,仍在运行的线程数 wait_to_block(4):虚拟机开始操作执行时阻塞的线程数 times:本次停顿消耗的时间明细 spin(0):到达安全点期间因线程自旋耗时的时间 block(0):到达安全点期间因线程阻塞耗时的时间 sync(0):一般为spin和block之和,可以理解为线程到达安全点的时间 cleanup(1):本次暂停虚拟机内部层面耗时 vmop(0):本次暂停花费在实际操作上的时间,比如前面的偏向锁撤销(RevokeBias) page_trap_count:已访问JIT的pollling page的线程数量
vmop:看是具体哪个操作导致虚拟机要进行安全点停顿 times(sync):是否有部分线程导致所有线程迟迟无法到达安全点 times(vmop):到达安全点后,当前这个操作具体的耗时过程
四
停顿类型一览
ThreadStop,
ThreadDump,
PrintThreads,
FindDeadlocks,
ForceSafepoint,
ForceAsyncSafepoint,
Deoptimize,
DeoptimizeFrame,
DeoptimizeAll,
ZombieAll,
UnlinkSymbols,
Verify,
PrintJNI,
HeapDumper,
DeoptimizeTheWorld,
CollectForMetadataAllocation,
GC_HeapInspection,
GenCollectFull,
GenCollectFullConcurrent,
GenCollectForAllocation,
ParallelGCFailedAllocation,
ParallelGCSystemGC,
CGC_Operation,
CMS_Initial_Mark,
CMS_Final_Remark,
G1CollectFull,
G1CollectForAllocation,
G1IncCollectionPause,
EnableBiasedLocking,
RevokeBias,
BulkRevokeBias,
PopulateDumpSharedSpace,
JNIFunctionTableCopier,
RedefineClasses,
GetOwnedMonitorInfo,
GetObjectMonitorUsage,
GetCurrentContendedMonitor,
GetStackTrace,
GetMultipleStackTraces,
GetAllStackTraces,
GetThreadListStackTraces,
GetFrameCount,
GetFrameLocation,
ChangeBreakpoints,
GetOrSetLocal,
GetCurrentLocation,
EnterInterpOnlyMode,
ChangeSingleStep,
HeapWalkOperation,
HeapIterateOperation,
ReportJavaOutOfMemory,
JFRCheckpoint,
Exit,
LinuxDllLoad,
Terminating
五
停顿分类
JVMTI&调试
GetOrSetLocal ChangeBreakpoints HeapWalkOperation GetOwnedMonitorInfo GetCurrentContendedMonitor GetCurrentLocation EnterInterpOnlyMode ChangeSingleStep GetObjectMonitorUsage GetStackTrace GetMultipleStackTraces GetAllStackTraces GetThreadListStackTraces GetFrameCount GetFrameLocation HeapIterateOperation JNIFunctionTableCopier
一次简单的debug引发的大量GetOrSetLocal
GC
G1CollectFull G1CollectForAllocation G1IncCollectionPause GenCollectFull GenCollectFullConcurrent GenCollectForAllocation ParallelGCFailedAllocation ParallelGCSystemGC CollectForMetadataAllocation CGC_Operation CMS_Initial_Mark CMS_Final_Remark
System.gc()的显式调用
Metasapce区域的危机
字节码工具如Cglib、ByteBuddy、ASM等动态生成类; Nashorn引擎执行Javascript脚本。
偏向锁的问题
RevokeBias BulkRevokeBias EnableBiasedLocking
频繁发生的偏向锁撤销有一定的性能风险
The performance gains seen in the past are far less evident today.(很明显大家也都这么认为) Biased locking introduced a lot of complex code into the synchronization subsystem and is invasive to other HotSpot components as well.(有种改不动的感觉)
探测工具
FindDeadlocks PrintThreads PrintJNI GC_HeapInspection HeapDumper ThreadDump
一次jstack操作之后带来的3次停顿
虚拟机内部运行时
RedefineClasses
ForceSafepoint(safepoint, vm)
ForceAsyncSafepoint(safepoint, vm)
PopulateDumpSharedSpace(jvm)
Exit
LinuxDllLoad
ReportJavaOutOfMemory
RedefineClasses
[arthas@84852]$ trace java.util.Set clear
逆优化
Deoptimize DeoptimizeFrame DeoptimizeAll DeoptimizeTheWorld
其他
内部调试
ZombieAll UnlinkSymbols Verify
sh configure
--with-debug-level=slowdebug
--disable-warnings-as-errors
--with-freetype-include=/usr/local/Cellar/freetype/2.9.1/include/freetype2
--with-freetype-lib=/usr/local/Cellar/freetype/2.9.1/lib
--with-boot-jdk=/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home
--with-target-bits=64
#ifndef PRODUCT
void JavaThread::record_jump(address target, address instr, const char* file,
int line) {
...
}
#endif // PRODUCT
废弃类型-ThreadStop
六
No vm operation
从输出入手
void SafepointSynchronize::print_statistics() {
SafepointStats* sstats = _safepoint_stats;
for (int index = 0; index <= _cur_stat_index; index++) {
if (index % 30 == 0) {
print_header();
}
sstats = &_safepoint_stats[index];
tty->print("%.3f: ", sstats->_time_stamp);
tty->print("%-26s ["
INT32_FORMAT_W(8) INT32_FORMAT_W(11) INT32_FORMAT_W(15)
" ] ",
sstats->_vmop_type == -1 ? "no vm operation" :
VM_Operation::name(sstats->_vmop_type),
sstats->_nof_total_threads,
sstats->_nof_initial_running_threads,
sstats->_nof_threads_wait_to_block);
...
}
匿名的停顿类型
开启安全点,等待所有线程到达安全点; 执行特定的触发安全点停顿(比如GC、偏向锁、逆优化等); 停顿完成,所有线程恢复执行。
SafepointSynchronize::begin();
op->evaluate();
SafepointSynchronize::end();
重量级锁降级(deflating idle monitors) 内联调用缓存清理(updating inline caches) JIT方法活性标记(mark nmethods) 内部字符串再hash提高性能(rehashing string table)
虚拟机的方案
如果当前没有待执行的其他停顿,那么每过1秒钟来轮询 轮询的内容是检查当前是否有内联调用缓存是否有待清理的数据 如果满足,强制触发一次安全点
while (!should_terminate() && _cur_vm_operation == NULL) {
bool timedout =
VMOperationQueue_lock->wait(Mutex::_no_safepoint_check_flag,
GuaranteedSafepointInterval);
if (timedout && (SafepointALot ||
SafepointSynchronize::is_cleanup_needed())) {
MutexUnlockerEx mul(VMOperationQueue_lock,
Mutex::_no_safepoint_check_flag);
SafepointSynchronize::begin();
SafepointSynchronize::end();
}
...
}
七
非counted loop的循环
for (int i = 0; i < reps; i++) {
...
}
*SAFEPOINT*
(小米)HBase实战:记一次Safepoint导致长时间STW的踩坑之旅(https://juejin.cn/post/6844903878765314061) (Netty)long循环和int循环的讨论(https://github.com/netty/netty/pull/3969)
八
总结
禁止生产环境debug 生产环境禁止使用jmap等相关工具,如果使用建议流量摘除 禁止代码中使用java.lang.Thread#getAllStackTraces 慎重在生产环境使用Arthas等工具,若必须使用需注意单次命令影响类的范围 偏向锁禁用-XX:-UseBiasedLocking 可尝试加上启动参数 -XX:+DisableExplicitGC 激进场景可使用 -XX:GuaranteedSafepointInterval=0
往期回顾
文 / 财神
关注得物技术,每周一、三、五更新技术干货
要是觉得文章对你有帮助的话,欢迎评论转发点赞~
未经得物技术许可严禁转载,否则依法追究法律责任。
“
扫码添加小助手微信
如有任何疑问,或想要了解更多技术资讯,请添加小助手微信:
线下活动推荐