Author | Yuan Jinhui
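If O1 is scheduled first, the remaining memory is not enough for O1. Yet at the moment the scheduler issues O1's instruction, it neither knows nor checks whether the currently available memory can satisfy O1's requirement.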
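The problem can be illustrated with a minimal sketch, assuming a single fixed-capacity memory pool and hypothetical sizes (the names `MemoryPool` and `naive_issue` are illustrative, not from any real framework). The scheduler issues operators in order without consulting free memory, so the shortfall only surfaces when the operator runs. Here that shortfall is modeled as an allocation failure.

```python
class MemoryPool:
    """Hypothetical fixed-capacity device memory pool (sizes in arbitrary units)."""

    def __init__(self, capacity):
        self.free = capacity

    def allocate(self, size):
        # The memory check happens only here, when the operator actually runs.
        if size > self.free:
            raise MemoryError(f"need {size}, only {self.free} free")
        self.free -= size


def naive_issue(ops, pool):
    """Issue operators in order. The scheduler never looks at pool.free
    before issuing; memory is requested only when each operator executes."""
    log = []
    for name, size in ops:
        try:
            pool.allocate(size)
            log.append(f"{name}: ok")
        except MemoryError as err:
            log.append(f"{name}: failed ({err})")
    return log


pool = MemoryPool(capacity=10)
pool.allocate(6)  # memory already held by earlier, still-running work
print(naive_issue([("O1", 8)], pool))
```

Nothing in `naive_issue` could have avoided the failure, because the information needed (how much memory is free right now) is never part of the scheduling decision.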
3
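When an operator requests memory from the allocator, the allocator checks a counter to see whether that operator has a free buffer available. If it does, the allocator assigns a buffer to the operator, lets the operator proceed, and releases the buffer once the operator finishes its computation. If both of the operator's buffers are already occupied, the allocator puts step 2 (do compute) and step 3 (release) into a waiting list.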
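The allocation logic above can be sketched as follows. This is a minimal single-threaded illustration, assuming each operator owns exactly two buffers (as in the text) and that `compute` only *launches* the work; the caller invokes `release` when that work completes. The class and method names are hypothetical.

```python
from collections import deque


class PerOpBufferAllocator:
    """Hypothetical per-operator buffer allocator: a counter of free buffers
    per operator, plus a waiting list of deferred work (never blocks)."""

    def __init__(self, buffers_per_op=2):
        self.capacity = buffers_per_op
        self.free = {}     # operator name -> number of free buffers
        self.waiting = {}  # operator name -> queue of deferred compute steps

    def _init(self, op):
        self.free.setdefault(op, self.capacity)
        self.waiting.setdefault(op, deque())

    def request(self, op, compute):
        """Step 1: query the counter. If a buffer is free, take it and run
        step 2 (compute) now; otherwise defer steps 2 and 3."""
        self._init(op)
        if self.free[op] > 0:
            self.free[op] -= 1
            compute()
            return True
        self.waiting[op].append(compute)
        return False

    def release(self, op):
        """Step 3: called when a computation finishes. The buffer goes
        straight to the oldest waiter, or back to the free count."""
        if self.waiting[op]:
            next_compute = self.waiting[op].popleft()
            next_compute()  # buffer handed over; free count is unchanged
        else:
            self.free[op] += 1


alloc = PerOpBufferAllocator()
started = []
for i in range(3):
    alloc.request("conv", lambda i=i: started.append(i))
print(started)         # the third request had to wait
alloc.release("conv")  # first computation finishes, frees a buffer
print(started)
```

Deferring work instead of waiting on it means the thread calling `request` is never blocked: a full operator simply has its remaining steps resumed later by whichever `release` frees a buffer.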
Existing TF kernels encapsulate shape computation and memory allocation within the kernel implementation, making some graph compiler optimizations, such as reusing buffers across kernels, challenging or infeasible. In TFRT kernels, shape computation and memory allocation will be hoisted out of the opaque C++ kernel implementations. A core design principle of TFRT is that kernel executions are never allowed to block, as this gives fine-grained control over the number of compute threads and the thread-switching behavior, which is important for achieving high CPU utilization.
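The idea of hoisting shape computation and allocation out of the kernel can be sketched in Python. This is an illustration only, not TFRT's actual C++/MLIR API; `add_shape`, `add_kernel` and `run_graph` are hypothetical names. Because the runtime, not the kernel, owns allocation, it is free to reuse one buffer for two consecutive kernels, and the kernel body itself does no allocation and no waiting.

```python
def add_shape(a_shape, b_shape):
    """Shape function: evaluated by the runtime/compiler before the kernel runs."""
    if a_shape != b_shape:
        raise ValueError(f"shape mismatch: {a_shape} vs {b_shape}")
    return a_shape


def add_kernel(a, b, out):
    """Kernel body: pure element-wise compute into a caller-provided buffer.
    It never allocates and never waits on anything."""
    for i in range(len(out)):
        out[i] = a[i] + b[i]


def run_graph(x, y, z):
    """Hypothetical runtime: computes shapes, allocates once, and reuses
    the same buffer across two kernel invocations (x + y, then + z)."""
    shape = add_shape((len(x),), (len(y),))
    add_shape(shape, (len(z),))
    buf = [0] * shape[0]      # single allocation, decided outside the kernels
    add_kernel(x, y, buf)     # buf = x + y
    add_kernel(buf, z, buf)   # buffer reused in place: buf = x + y + z
    return buf


print(run_graph([1, 2], [3, 4], [5, 6]))
```

If allocation stayed inside `add_kernel`, each call would produce its own output buffer and the runtime could not decide to share one between them.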