使用eBPF提升可观测性
1前言
上周在墨天轮看到了一篇帖子 PostgreSQL + eBPF实现数据库服务可观测,十分有趣,加之eBPF最近很火,在可观测领域有着举足轻重的地位。
赶着蹭一波热点,学一下这个eBPF。
2介绍
eBPF 是一种非常灵活和强大的内核技术,可以用于多种应用场景。下面是 eBPF 的一些常见用途:
网络监控:eBPF 可以用于捕获网络数据包,并执行特定的逻辑来分析网络流量。例如,可以使用 eBPF 程序来监控网络流量,并在发现异常流量时进行警报。 安全过滤:eBPF 可以用于对网络数据包进行安全过滤。例如,可以使用 eBPF 程序来阻止恶意流量的传播,或者在发现恶意流量时对其进行拦截。 性能分析:eBPF 可以用于对内核的性能进行分析。例如,可以使用 eBPF 程序来收集内核的性能指标,并通过特定的接口将其可视化。这样,可以更好地了解内核的性能瓶颈,并进行优化。 虚拟化:eBPF 可以用于虚拟化技术。例如,可以使用 eBPF 程序来收集虚拟机的性能指标,并进行负载均衡。这样,可以更好地利用虚拟化环境的资源,提高系统的性能和稳定性。
安全,网络,负载均衡,故障分析,追踪等领域都是eBPF的主战场。
3使用
eBPF虽然是出来很多年的东西了,但在数据库这个领域相关的应用还不算太多。
在 github 上有一篇帖子,用 eBPF trace PostgreSQL 的几种姿势 ,里面介绍了几个项目
PGTracer:https://github.com/aiven/pgtracer,2022 年诞生的第一个 eBPF 来观测 PostgreSQL 的企业项目,来自 aiven 这个公司
PGTracer is a collection of tools to trace queries, execution plans and more in PostgreSQL®, using eBPF.
PGTracer offers a way to instrument PostgreSQL, using the Linux eBPF facility. As it does advanced memory access, it needs the PostgreSQL debug symbols to resolve symbols and offsets in structs.
PGTracer 是一个工具集合,可以使用 eBPF 在 PostgreSQL 中跟踪查询、执行计划等等。PGTracer 提供了一种使用 Linux eBPF 工具检测 PostgreSQL 的方法。在进行高级内存访问时,它需要 PostgreSQL 调试符号来解析结构中的符号和偏移量。
参照作者自己的演示,https://www.reddit.com/r/PostgreSQL/comments/zlc422/pgtracer_building_a_perflike_tool_for_postgresql/,可以看到这个工具可以把缓冲区/执行计划/WAL使用量等都实时抓取出来。
另外一个项目是 pg-lock-tracer,也是一个新兴项目,该项目专注于研究 PostgreSQL 中各式各样的锁,锁这东西本身能够深入理解的老油条就屈指可数,尤其是LWLock/SpinLock之类用于封锁共享内存中的数据结构的底层锁,日常能够接触到的只能在一些 pg_stat_activity 中看到,诸如WalInsertLock/WALWriteLock等,这个项目的诞生大大提升了锁的可观测性。
This project provides tools that allow you to gain deep insights into PostgreSQL's locking activities and troubleshoot locking-related issues (e.g., performance problems or deadlocks).
pg_lock_tracer
- is a lock tracer for PostgreSQL.pg_lw_lock_tracer
- is a tracer for PostgreSQL lightweight locks (LWLocks).animate_lock_graph
- creates animated locks graphs based on thepg_lock_tracer
output.Note: Most of these tools employ the BPF / eBPF (Extended Berkeley Packet Filter) technology. At the moment, PostgreSQL 12, 13, 14, and 15 are supported (see additional information below).
4小试牛刀
安装过程就不说了
usage examples:
# Trace use binary '/home/jan/postgresql-sandbox/bin/REL_15_1_DEBUG/bin/postgres' for tracing and trace pid 1234
pg_lock_tracer -x /home/jan/postgresql-sandbox/bin/REL_15_1_DEBUG/bin/postgres -p 1234
# Trace two PIDs
pg_lock_tracer -x /home/jan/postgresql-sandbox/bin/REL_15_1_DEBUG/bin/postgres -p 1234 -p 5678
# Be verbose
pg_lock_tracer -x /home/jan/postgresql-sandbox/bin/REL_15_1_DEBUG/bin/postgres -p 1234 -v
# Use the given db connection to access the catalog of PID 1234 to resolve OIDs
pg_lock_tracer -x /home/jan/postgresql-sandbox/bin/REL_15_1_DEBUG/bin/postgres -p 1234 -r 1234:psql://jan@localhost/test2
# Output in JSON format
pg_lock_tracer -x /home/jan/postgresql-sandbox/bin/REL_15_1_DEBUG/bin/postgres -p 1234 -j
# Print stacktrace on deadlock
pg_lock_tracer -x /home/jan/postgresql-sandbox/bin/REL_15_1_DEBUG/bin/postgres -p 1234 -s DEADLOCK
# Print stacktrace for locks and deadlocks
pg_lock_tracer -x /home/jan/postgresql-sandbox/bin/REL_15_1_DEBUG/bin/postgres -p 1234 -s LOCK DEADLOCK
# Trace only Transaction and Query related events
pg_lock_tracer -x /home/jan/postgresql-sandbox/bin/REL_15_1_DEBUG/bin/postgres -p 1234 -t TRANSACTION QUERY
# Write the output into file 'trace'
pg_lock_tracer -x /home/jan/postgresql-sandbox/bin/REL_15_1_DEBUG/bin/postgres -p 1234 -o trace
# Show statistics about locks
pg_lock_tracer -x /home/jan/postgresql-sandbox/bin/REL_15_1_DEBUG/bin/postgres -p 1234 --statistics
让我们小试牛刀一下
postgres=# create table t1(id int);
CREATE TABLE
postgres=# insert into t1 values(generate_series(1,1000000));
INSERT 0 1000000
postgres=# select pg_backend_pid();
pg_backend_pid
----------------
1455
(1 row)
postgres=# select * from t1 where id = 99;
id
----
99
(1 row)
同时另外一个窗口进行跟踪,可以看到打开了哪些锁,锁的持有时间等等
root@iZ2ze8fs957s9uk6mrjdi8Z:~# pg_lock_tracer -x /usr/pgsql-15/bin/postgres -p 1455
===> Compiling BPF program
===> Attaching BPF probes
===> Ready to trace queries
256574124323 [Pid 1455] Query begin 'select * from t1 where id = 99;'
256574149788 [Pid 1455] Transaction begin
256574158097 [Pid 1455] Accept invalidation messages
256574249581 [Pid 1455] Table open (by range value) .t1 AccessShareLock
256574256778 [Pid 1455] Accept invalidation messages
256574266066 [Pid 1455] Table open 1259 AccessShareLock
256574272434 [Pid 1455] Lock object 1259 AccessShareLock
256574281101 [Pid 1455] Lock granted (fastpath) 1259 AccessShareLock
256574287386 [Pid 1455] Lock granted (local) 1259 AccessShareLock (Already hold local 0)
256574292847 [Pid 1455] Accept invalidation messages
256574298780 [Pid 1455] Lock was acquired in 26346 ns
256574302740 [Pid 1455] Lock object 2663 AccessShareLock
256574308188 [Pid 1455] Lock granted (fastpath) 2663 AccessShareLock
256574313346 [Pid 1455] Lock granted (local) 2663 AccessShareLock (Already hold local 0)
256574319175 [Pid 1455] Accept invalidation messages
256574325472 [Pid 1455] Lock was acquired in 22732 ns
256574392573 [Pid 1455] Lock ungranted (fastpath) 2663 AccessShareLock
256574402855 [Pid 1455] Lock ungranted (local) 2663 AccessShareLock (Hold local 0)
256574410605 [Pid 1455] Table close 1259 AccessShareLock
256574420131 [Pid 1455] Lock ungranted (fastpath) 1259 AccessShareLock
256574427096 [Pid 1455] Lock ungranted (local) 1259 AccessShareLock (Hold local 0)
256574431193 [Pid 1455] Lock object 16384 AccessShareLock
256574440514 [Pid 1455] Lock granted (fastpath) 16384 AccessShareLock
256574447612 [Pid 1455] Lock granted (local) 16384 AccessShareLock (Already hold local 0)
256574454227 [Pid 1455] Accept invalidation messages
256574461670 [Pid 1455] Lock was acquired in 30477 ns
256574466583 [Pid 1455] Table open 1259 AccessShareLock
256574473439 [Pid 1455] Lock object 1259 AccessShareLock
...
root@iZ2ze8fs957s9uk6mrjdi8Z:~# pg_lock_tracer -x /usr/pgsql-15/bin/postgres -p 1455 --statistic
===> Compiling BPF program
===> Attaching BPF probes
===> Ready to trace queries
305594728170 [Pid 1455] Query begin 'select * from t1 where id = 99;'
305594743006 [Pid 1455] Transaction begin
305594751718 [Pid 1455] Accept invalidation messages
305594803943 [Pid 1455] Table open (by range value) .t1 AccessShareLock
305594810926 [Pid 1455] Accept invalidation messages
305594823343 [Pid 1455] Lock object 16384 AccessShareLock
305594832930 [Pid 1455] Lock granted (fastpath) 16384 AccessShareLock
305594839275 [Pid 1455] Lock granted (local) 16384 AccessShareLock (Already hold local 0)
305594844630 [Pid 1455] Accept invalidation messages
305594850512 [Pid 1455] Lock was acquired in 27169 ns
305594856780 [Pid 1455] Table close 16384 NoLock
305594885491 [Pid 1455] Table open 16384 NoLock
305594890861 [Pid 1455] Table close 16384 NoLock
305594896054 [Pid 1455] Table open 16384 NoLock
305594901948 [Pid 1455] Table close 16384 NoLock
305594923263 [Pid 1455] Table open 16384 NoLock
305594935576 [Pid 1455] Table close 16384 NoLock
305595011436 [Pid 1455] Table open 16384 NoLock
305639384394 [Pid 1455] Table close 16384 NoLock
305639417928 [Pid 1455] Transaction commit
305639433088 [Pid 1455] Lock ungranted (fastpath) 16384 AccessShareLock
305639439356 [Pid 1455] Lock ungranted (local) 16384 AccessShareLock (Hold local 1)
305639456018 [Pid 1455] Query done
Lock statistics:
================
Locks per OID
+-----------+----------+------------------------------+
| Lock Name | Requests | Total Lock Request Time (ns) |
+-----------+----------+------------------------------+
| 16384 | 1 | 27169 |
+-----------+----------+------------------------------+
Lock types
+-----------------+---------------------------+
| Lock Type | Number of requested locks |
+-----------------+---------------------------+
| AccessShareLock | 1 |
+-----------------+---------------------------+
让我们再试试死锁的追踪,可以看到还会打印出相应的堆栈 👇🏻,让我们追踪代码更加方便
551993531529 [Pid 1530] DEADLOCK DETECTED
DeadLockReport+0x0 [postgres]
LockAcquireExtended+0x3bd [postgres]
LockAcquire+0x1f [postgres]
XactLockTableWait+0xb0 [postgres]
heap_update+0x432 [postgres]
heapam_tuple_update+0x8b [postgres]
ExecUpdateAct+0x1dd [postgres]
ExecUpdate+0x1e7 [postgres]
ExecModifyTable+0x5de [postgres]
ExecProcNodeFirst+0x2d [postgres]
ExecutePlan+0x9c [postgres]
standard_ExecutorRun+0xe3 [postgres]
ExecutorRun+0x23 [postgres]
ProcessQuery+0x6e [postgres]
PortalRunMulti+0xb8 [postgres]
PortalRun+0x1af [postgres]
exec_simple_query+0x2fd [postgres]
[unknown] [[uprobes]]
report_fork_failure_to_client+0x0 [postgres]
BackendStartup+0x1c9 [postgres]
ServerLoop+0x172 [postgres]
PostmasterMain+0xf68 [postgres]
main+0x226 [postgres]
[unknown] [libc.so.6]
551993992344 [Pid 1455] Lock ungranted 16384 ExclusiveLock (Requested locks 1)
551993998577 [Pid 1455] Lock ungranted (local) 16384 ExclusiveLock (Hold local 0)
552055549877 [Pid 1455] Table close 16384 NoLock
552055605754 [Pid 1455] Query done
Event | Description |
---|---|
TRANSACTION | Transactions related events (e.g., StartTransaction , CommitTransaction , DeadLockReport ) |
QUERY | The executed queries (e.g., exec_simple_query ) |
TABLE | Table open and close events (e.g., table_open , table_openrv , table_close ) |
LOCK | Lock events (e.g., LockRelationOid , UnlockRelationOid , GrantLock , FastPathGrantRelationLock ) |
INVALIDATION | Processing of cache invalidation messages (e.g., AcceptInvalidationMessages ) |
ERROR | Error related events (e.g., bpf_errstart ) |
除此之外,pg_lock_tracer还支持五种"事件"的追踪👆🏻
另外一个提供的工具是pg_lw_lock_trace,顾名思义,用于追踪LWLock的
examples:
# Trace the LW locks of the PID 1234
pg_lw_lock_tracer -p 1234
# Trace the LW locks of the PIDs 1234 and 5678
pg_lw_lock_tracer -p 1234 -p 5678
# Trace the LW locks of the PID 1234 and be verbose
pg_lw_lock_tracer -p 1234 -v
# Trace the LW locks of the PID 1234 and collect statistics
pg_lw_lock_tracer -p 1234 -v --statistics
还记得这此大会上吕海波大师的分享
一次最简单的Select,最少调用它10到20次
一次最简单的DML,最少调用它40次左右。
让我们来试验一下,很简单的查询 select * from t1 where id = 99;
加了个索引之后,可以看到总共执行了16次LWLockAcquire,niubility
root@iZ2ze8fs957s9uk6mrjdi8Z:~# pg_lw_lock_tracer -p 1455
==> Attaching to PIDs [1455]
===> Compiling BPF program
===> Ready to trace
1345849814039 [Pid 1455] Acquired lock LockFastPath (mode LW_EXCLUSIVE) / LWLockAcquire()
1345849826571 [Pid 1455] Unlock LockFastPath
1345849859033 [Pid 1455] Acquired lock ProcArray (mode LW_SHARED) / LWLockAcquire()
1345849861806 [Pid 1455] Unlock ProcArray
1345849876540 [Pid 1455] Acquired lock LockFastPath (mode LW_EXCLUSIVE) / LWLockAcquire()
1345849879125 [Pid 1455] Unlock LockFastPath
1345849926838 [Pid 1455] Acquired lock LockFastPath (mode LW_EXCLUSIVE) / LWLockAcquire()
1345849929069 [Pid 1455] Unlock LockFastPath
1345849991093 [Pid 1455] Acquired lock ProcArray (mode LW_SHARED) / LWLockAcquire()
1345849993905 [Pid 1455] Unlock ProcArray
1345850028030 [Pid 1455] Acquired lock BufferMapping (mode LW_SHARED) / LWLockAcquire()
1345850031309 [Pid 1455] Unlock BufferMapping
1345850033841 [Pid 1455] Acquired lock BufferContent (mode LW_SHARED) / LWLockAcquire()
1345850038227 [Pid 1455] Unlock BufferContent
1345850040821 [Pid 1455] Acquired lock BufferMapping (mode LW_SHARED) / LWLockAcquire()
1345850043537 [Pid 1455] Unlock BufferMapping
1345850045684 [Pid 1455] Acquired lock BufferContent (mode LW_SHARED) / LWLockAcquire()
1345850048645 [Pid 1455] Unlock BufferContent
1345850050954 [Pid 1455] Acquired lock BufferMapping (mode LW_SHARED) / LWLockAcquire()
1345850053395 [Pid 1455] Unlock BufferMapping
1345850055435 [Pid 1455] Acquired lock BufferContent (mode LW_SHARED) / LWLockAcquire()
1345850061344 [Pid 1455] Unlock BufferContent
1345850064382 [Pid 1455] Acquired lock BufferMapping (mode LW_SHARED) / LWLockAcquire()
1345850067375 [Pid 1455] Unlock BufferMapping
1345850069660 [Pid 1455] Acquired lock BufferMapping (mode LW_SHARED) / LWLockAcquire()
1345850072196 [Pid 1455] Unlock BufferMapping
1345850075026 [Pid 1455] Acquired lock BufferContent (mode LW_SHARED) / LWLockAcquire()
1345850078618 [Pid 1455] Unlock BufferContent
1345850100866 [Pid 1455] Acquired lock LockFastPath (mode LW_EXCLUSIVE) / LWLockAcquire()
1345850103052 [Pid 1455] Unlock LockFastPath
1345850105482 [Pid 1455] Acquired lock LockFastPath (mode LW_EXCLUSIVE) / LWLockAcquire()
1345850108860 [Pid 1455] Unlock LockFastPath
1345850134104 [Pid 1455] Acquired lock PgStatsData (mode LW_EXCLUSIVE) / LWLockConditionalAcquire()
1345850137927 [Pid 1455] Unlock PgStatsData
1345850141902 [Pid 1455] Acquired lock PgStatsData (mode LW_EXCLUSIVE) / LWLockConditionalAcquire()
1345850144760 [Pid 1455] Unlock PgStatsData
1345850148453 [Pid 1455] Acquired lock PgStatsData (mode LW_EXCLUSIVE) / LWLockConditionalAcquire()
1345850152371 [Pid 1455] Unlock PgStatsData
5小结
eBPF 有一些类似的工具。例如,SystemTap 是一种开源工具,可以帮助用户收集 Linux 内核的运行时数据。它通过动态加载内核模块来实现这一功能,类似于 eBPF。另外,DTrace 是一种动态跟踪和分析工具,可以用于收集系统的运行时数据,类似于 eBPF 和 SystemTap。
之前也曾写过一篇使用Dtrace分析问题,典型场景是我们想知道跑了哪些函数,哪些流程,按照以往GDB打断点的形式就不可行,因为你得知道在哪里打断点,为此我们便可以使用Dtrace,根据PostgreSQL自身提供的一些可观测点,https://www.postgresql.org/docs/current/dynamic-trace.html 来进行分析。
6参考
https://mp.weixin.qq.com/s/AtELcYXLv9ZS6lNGrBkF0A
https://github.com/yihong0618/gitblog/issues/257
https://twitter.com/yihong0618/status/1619230595766448130
https://www.reddit.com/r/PostgreSQL/comments/zlc422/pgtracer_building_a_perflike_tool_for_postgresql/
modb.pro/db/617157