A Summary of Common Flink Exceptions and Error Messages
Deployment and Resource Issues
(0) JDK version too low
(1) Could not build the program from JAR file
(2) ClassNotFoundException/NoSuchMethodError/IncompatibleClassChangeError/...
(3) Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
(4) java.util.concurrent.TimeoutException: Slot allocation request timed out
(5) org.apache.flink.util.FlinkException: The assigned slot <container_id> was removed
(6) java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id <tm_id> timed out
Job Issues
(1) org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: Could not forward element to next operator
(2) java.lang.IllegalStateException: Buffer pool is destroyed || Memory manager has been shut down
(3) akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://...]] after [10000 ms]
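The advice for this error to call external services asynchronously can be illustrated with a minimal, JDK-only sketch. It is not Flink's Async I/O API; it only shows the underlying pattern of moving a blocking call off the calling thread and bounding it with a timeout. The names AsyncLookupSketch, blockingLookup, and lookupAsync are hypothetical, and the sketch assumes Java 9 or later for completeOnTimeout.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: none of these names come from Flink or from the article.
final class AsyncLookupSketch {

    // Stand-in for a slow external service call (assumed, for illustration only).
    static String blockingLookup(String key) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "value-for-" + key;
    }

    // Runs the blocking call on a separate pool and falls back to a default
    // value if no result arrives within timeoutMs.
    static CompletableFuture<String> lookupAsync(String key, ExecutorService pool, long timeoutMs) {
        return CompletableFuture
                .supplyAsync(() -> blockingLookup(key), pool)
                .completeOnTimeout("fallback-for-" + key, timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Generous timeout: the real result is expected.
            System.out.println(lookupAsync("a", pool, 5_000).join());
            // Timeout far shorter than the call: the fallback is expected.
            System.out.println(lookupAsync("b", pool, 1).join());
        } finally {
            pool.shutdown();
        }
    }
}
```

In an actual job, the same idea would normally be expressed through Flink's own Async I/O operator rather than a hand-rolled thread pool, so that the timeout and the number of in-flight requests are managed by the framework.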
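Check the value of the akka.ask.timeout parameter (the default is only 10 seconds) and raise it if needed. In addition, when calling external services, prefer asynchronous operations (Async I/O).
(4) java.io.IOException: Too many open files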
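One of the checks suggested for this error is whether the program releases resources such as pooled connections promptly. A minimal sketch of that discipline follows, using try-with-resources so the handle is closed on both the normal and the exceptional path. ResourceReleaseSketch, PooledConnection, and the OPEN_HANDLES counter are hypothetical stand-ins for whatever client or pool the job actually uses.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: PooledConnection stands in for any handle that holds a file descriptor.
final class ResourceReleaseSketch {

    // Counts handles that are currently open, so a leak would be visible.
    static final AtomicInteger OPEN_HANDLES = new AtomicInteger();

    static final class PooledConnection implements AutoCloseable {
        PooledConnection() {
            OPEN_HANDLES.incrementAndGet();
        }

        String query(String sql) {
            if (sql.isEmpty()) {
                throw new IllegalArgumentException("empty query");
            }
            return "result of " + sql;
        }

        @Override
        public void close() {
            OPEN_HANDLES.decrementAndGet();
        }
    }

    // try-with-resources closes the connection whether query() returns or throws.
    static String safeQuery(String sql) {
        try (PooledConnection conn = new PooledConnection()) {
            return conn.query(sql);
        }
    }

    public static void main(String[] args) {
        System.out.println(safeQuery("SELECT 1"));
        try {
            safeQuery("");
        } catch (IllegalArgumentException expected) {
            // The failed call must not leave a handle behind.
        }
        System.out.println("open handles: " + OPEN_HANDLES.get()); // prints "open handles: 0"
    }
}
```

For long-lived resources held by an operator, the equivalent cleanup would typically go in the rich function's close() method rather than around each call.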
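Check the file descriptor limit reported by ulimit -n, then check whether the program holds resources (such as connections from connection pools) that are not released promptly. Note that Flink can also throw this exception when using the RocksDB state backend; in that case, change the state.backend.rocksdb.files.open parameter in flink-conf.yaml. If you do not want to impose a limit, set it to -1.
(5) org.apache.flink.api.common.function.InvalidTypesException: The generic type parameters of '<class>' are missing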
Checkpoint and State Issues
(1) Received checkpoint barrier for checkpoint <cp_id> before completing current checkpoint <cp_id>. Skipping current checkpoint
(2) Checkpoint <cp_id> expired before completing
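Check the checkpoint timeout set through the CheckpointConfig.setCheckpointTimeout() method; if it is too short, lengthen it moderately. Beyond that, consider whether backpressure or data skew has occurred, or whether barrier alignment is too slow.
(3) org.apache.flink.util.StateMigrationException: The new state serializer cannot be incompatible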
(4) org.apache.flink.util.StateMigrationException: The new serializer for a MapState requires state migration in order for the job to proceed. However, migration for MapState currently isn't supported