OneFlow Source Code Analysis: The Tensor Type System and Local Tensor
Written by: 郑建华
Updated by: 赵露阳
tensor and op are the most basic components of a neural network model: ops are the nodes of the model, and tensors are the edges connecting those nodes. However, constructing a tensor is not as simple as merely constructing an object; at least the following requirements must be considered:
It must support node-local local tensors as well as distributed global tensors;
It must support both eager and lazy execution modes;
It must support different data types, such as float, double and int;
It must support different devices.
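As a rough illustration (toy classes, not OneFlow's actual types), these design requirements can be thought of as orthogonal axes of a tensor's metadata:

```python
from enum import Enum

# Toy sketch of the design axes listed above: every tensor combines
# a local/global kind, an execution mode, a dtype and a device.
class Kind(Enum):
    LOCAL = "local"
    GLOBAL = "global"

class Mode(Enum):
    EAGER = "eager"
    LAZY = "lazy"

class TensorMeta:
    def __init__(self, kind, mode, dtype, device):
        self.kind, self.mode, self.dtype, self.device = kind, mode, dtype, device

m = TensorMeta(Kind.LOCAL, Mode.EAGER, "float32", "cuda:0")
assert (m.kind, m.mode) == (Kind.LOCAL, Mode.EAGER)
```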
Similar to PyTorch, there are two main ways to create a tensor in OneFlow: flow.Tensor and flow.tensor.
The Python-level Tensor is introduced in tensor.py (https://github.com/Oneflow-Inc/oneflow/blob/2e6a72c8734b9929191306df35b4284e9caa8126/python/oneflow/framework/tensor.py#L23). It is a Tensor type object registered through the Python C API, and this object is defined and returned in MakeTensorType (https://github.com/Oneflow-Inc/oneflow/blob/2e6a72c8734b9929191306df35b4284e9caa8126/oneflow/api/python/framework/tensor.cpp#L623).
Within MakeTensorType, the Tensor object is created mainly through PyTensorObject_init:
static int PyTensorObject_init(PyObject* self, PyObject* args, PyObject* kwargs) {
HANDLE_ERRORS
auto* temp = functional::_legacy_tensor_ctor(NULL, args, kwargs);
if (PyErr_Occurred()) { throw py::error_already_set(); }
auto* _self = (PyTensorObject*)self;
_self->data = PyTensor_Unpack(temp);
_self->data->set_pyobject(self);
// reset temp data to prevent clearing the pyobject
// when the temp is deallocated
((PyTensorObject*)temp)->data.reset();
Py_XDECREF(temp);
return 0;
END_HANDLE_ERRORS_RET(-1)
}
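The ownership hand-off above can be mirrored with a small Python analogy (toy classes, not OneFlow's API): the temporary produced by the legacy constructor gives up its payload, the new object records a back-reference in that payload, and the emptied temporary can then be dropped without clobbering anything:

```python
# Toy analogy of PyTensorObject_init's hand-off: steal the payload from a
# temporary wrapper, set the back-reference, then empty the temporary so
# its destruction cannot clear the back-reference.

class Payload:
    def __init__(self):
        self.pyobject = None  # back-reference to the owning wrapper

class Wrapper:
    def __init__(self, payload=None):
        self.data = payload

def init_from_temp(self_wrapper, temp_wrapper):
    self_wrapper.data = temp_wrapper.data      # steal the payload
    self_wrapper.data.pyobject = self_wrapper  # set the back-reference
    temp_wrapper.data = None                   # reset temp, like data.reset()

temp = Wrapper(Payload())
w = Wrapper()
init_from_temp(w, temp)
assert w.data.pyobject is w and temp.data is None
```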
The methods of this Tensor type are then registered through the PyTensorObject_methods table:
static PyMethodDef PyTensorObject_methods[] = {
{"storage_offset", PyTensorObject_storage_offset, METH_NOARGS, NULL},
{"stride", PyTensorObject_stride, METH_NOARGS, NULL},
{"is_contiguous", PyTensorObject_is_contiguous, METH_NOARGS, NULL},
{"contiguous", PyTensorObject_contiguous, METH_NOARGS, NULL},
{"contiguous_", PyTensorObject_contiguous_, METH_NOARGS, NULL},
{"pin_memory", PyTensorObject_pin_memory, METH_NOARGS, NULL},
{"is_pinned", PyTensorObject_is_pinned, METH_NOARGS, NULL},
{"requires_grad_", (PyCFunction)PyTensorObject_requires_grad_, METH_VARARGS | METH_KEYWORDS,
NULL},
{"retain_grad", PyTensorObject_retain_grad, METH_NOARGS, NULL},
{"detach", PyTensorObject_detach, METH_NOARGS, NULL},
{"clone", PyTensorObject_clone, METH_NOARGS, NULL},
{"zero_", PyTensorObject_zero_, METH_NOARGS, NULL},
{"register_hook", PyTensorObject_register_hook, METH_O, NULL},
{"_register_post_grad_accumulation_hook", PyTensorObject__register_post_grad_accumulation_hook,
METH_O, NULL},
{"global_id", PyTensorObject_global_id, METH_NOARGS, NULL},
{"check_meta_consistency", PyTensorObject_check_meta_consistency, METH_NOARGS, NULL},
{"to_numpy", PyTensorObject_to_numpy, METH_NOARGS, NULL},
{"type", (PyCFunction)PyTensorObject_type, METH_VARARGS | METH_KEYWORDS, NULL},
// ... (remaining entries omitted)
};
In addition, at the Python level, RegisterMethods (https://github.com/Oneflow-Inc/oneflow/blob/2e6a72c8734b9929191306df35b4284e9caa8126/python/oneflow/framework/tensor.py#L502) registers some Tensor methods and properties implemented in Python (such as tensor.numpy); when the OneFlow package is initialized, RegisterMethod4Class (https://github.com/Oneflow-Inc/oneflow/blob/2e6a72c8734b9929191306df35b4284e9caa8126/python/oneflow/framework/register_class_method_util.py#L23) completes the registration of these Python methods and properties. The oneflow.tensor interface itself is declared in the functional API YAML, which maps two signatures to two functors:
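The registration mechanism can be sketched in plain Python (the names below are illustrative, not the actual helpers in tensor.py): functions are collected into a registry and later attached to the class in one pass:

```python
# Toy sketch of RegisterMethods-style registration: decorated functions
# are collected, then bound onto the class when the package initializes.

_registry = []

def register(fn):
    _registry.append(fn)
    return fn

@register
def numpy_(self):
    # stand-in for a Python-implemented Tensor method like tensor.numpy
    return list(self.data)

class Tensor:
    def __init__(self, data):
        self.data = data

def register_methods_for_class(cls):
    for fn in _registry:
        # "numpy_" is attached as "numpy"
        setattr(cls, fn.__name__.rstrip("_"), fn)

register_methods_for_class(Tensor)
assert Tensor([1, 2, 3]).numpy() == [1, 2, 3]
```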
- name: "tensor"
signature: [
"Tensor (PyObject* data, *, DataType dtype=None, Device device=None,
Bool requires_grad=False, Bool pin_memory=False) => TensorWithData",
"Tensor (PyObject* data, *, DataType dtype=None, Placement placement,
SbpList sbp, Bool requires_grad=False) => GlobalTensorWithData",
]
bind_python: True
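The signature-based routing can be sketched as follows (a hypothetical Python rendering, not the real dispatch code): when placement or sbp is given, the global functor is selected, otherwise the local one:

```python
# Toy dispatcher mirroring the two YAML signatures above: the presence of
# placement/sbp selects GlobalTensorWithData, otherwise TensorWithData.
def tensor(data, *, dtype=None, device=None, placement=None, sbp=None,
           requires_grad=False, pin_memory=False):
    if placement is not None or sbp is not None:
        return ("GlobalTensorWithData", data, placement, sbp)
    return ("TensorWithData", data, device)

assert tensor([1, 2])[0] == "TensorWithData"
assert tensor([1, 2], placement="p", sbp="split(0)")[0] == "GlobalTensorWithData"
```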
The first signature (without placement and sbp) is dispatched to TensorWithData and produces a local tensor; the second is dispatched to GlobalTensorWithData and produces a global tensor. Both flow.Tensor and flow.tensor can construct a tensor from existing data:
import oneflow
import numpy as np
oneflow.tensor([[1., -1.], [1., -1.]])
# tensor([[ 1., -1.],
# [ 1., -1.]], dtype=oneflow.float32)
oneflow.tensor(np.array([[1, 2, 3], [4, 5, 6]]))
# tensor([[ 1, 2, 3],
# [ 4, 5, 6]], dtype=oneflow.int64)
oneflow.Tensor([[1, 2, 3], [4, 5, 6]])
When placement and sbp are supplied, oneflow.tensor creates a global tensor (https://docs.oneflow.org/master/parallelism/03_consistent_tensor.html) rather than a local one. The local path is handled by TensorWithDataFunctor:
class TensorWithDataFunctor {
public:
Maybe<Tensor> operator()(PyObject* data, const Optional<Symbol<DType>>& dtype,
const Optional<Symbol<Device>>& device, const bool requires_grad,
const bool pin_memory) const {
...
if (PyTensor_Check(data)) {
// Throw warnings like pytorch.
auto ret = PyErr_WarnEx(
PyExc_UserWarning,
"To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() "
"or sourceTensor.clone().detach().requires_grad_(True), rather than "
"oneflow.tensor(sourceTensor).",
1);
if (ret != 0) { return Error::RuntimeError(); }
const auto& other = PyTensor_Unpack(data);
return MakeTensorFromOtherTensor(other, dtype, device, requires_grad, pin_memory);
} else {
// Make tensor from python sequence or numpy array.
return MakeLocalTensorFromData(data, dtype, device, requires_grad, pin_memory);
}
}
};
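The branch structure of the functor can be mimicked in Python (toy code, not the real API; the warning text follows the snippet above): copy-constructing from an existing tensor warns and copies, while plain data goes down the local-tensor path:

```python
import warnings

# Toy rendering of TensorWithDataFunctor's two branches.
class Tensor:
    def __init__(self, data):
        self.data = list(data)

def tensor(data):
    if isinstance(data, Tensor):
        # mirrors the PyErr_WarnEx branch
        warnings.warn("To copy construct from a tensor, it is recommended "
                      "to use sourceTensor.clone().detach().", UserWarning)
        return Tensor(data.data)   # MakeTensorFromOtherTensor path
    return Tensor(data)            # MakeLocalTensorFromData path

t = Tensor([1, 2])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    t2 = tensor(t)
assert t2.data == [1, 2] and len(caught) == 1
```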
In the non-tensor branch, MakeLocalTensorFromData eventually dispatches on the element type at runtime through SwitchCopyLocalTensorFromUntypedArray:
template<typename... Args>
static Maybe<void> SwitchCopyLocalTensorFromUntypedArray(
    const std::tuple<DataType>& switch_tuple, Args&&... args) {
static const std::map<std::tuple<DataType>, std::function<Maybe<void>(Args && ...)>>
case_handlers {
{SwitchCase(DataType::kFloat),
[](Args&&... args) {
return CopyLocalTensorFromUntypedArray<float>(std::forward<Args>(args)...);
}},
// ...
};
return case_handlers.at(switch_tuple)(std::forward<Args>(args)...);
};
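A minimal Python analogue of this SwitchCase table (hypothetical names): a dict keyed by the dtype selects a typed copy routine, just as the map of case handlers above selects a CopyLocalTensorFromUntypedArray instantiation:

```python
import numpy as np

# Toy dtype-dispatch table mirroring the SwitchCase map above.
def copy_as(np_dtype):
    def copy(dst, array):
        dst[:] = np.asarray(array, dtype=np_dtype)
        return dst
    return copy

CASE_HANDLERS = {
    "float32": copy_as(np.float32),
    "int64": copy_as(np.int64),
    # ... one entry per supported DataType
}

dst = np.empty(3, dtype=np.float32)
CASE_HANDLERS["float32"](dst, [1, 2, 3])
assert dst.dtype == np.float32 and dst.tolist() == [1.0, 2.0, 3.0]
```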
For each concrete element type, CopyLocalTensorFromUntypedArray forwards to the generic tensor/numpy copy helper:
template<typename T>
Maybe<void> CopyLocalTensorFromUntypedArray(const std::shared_ptr<Tensor>& tensor,
                                            PyObject* array) {
return CopyBetweenLocalTensorAndNumpy<T>(tensor, array, CopyFromNumpyArray, "mut",
/*block_host_until_done=*/false);
}
For host input, CopyFromNumpyArray performs a synchronous byte-level copy from the numpy buffer into the tensor's eager blob object:
void CopyFromNumpyArray(ep::Stream* stream,
                        const std::shared_ptr<vm::EagerBlobObject>& eager_blob_object,
                        const NumPyArrayPtr& array_ptr) {
SyncAutoMemcpy(stream, eager_blob_object->mut_dptr(), array_ptr.data(),
eager_blob_object->ByteSizeOfBlobBody(), eager_blob_object->mem_case(),
memory::MakeHostMemCase());
}
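In spirit, this is a raw memcpy of ByteSizeOfBlobBody() bytes from the numpy buffer into the blob; a numpy-based analogy (not OneFlow code):

```python
import numpy as np

# Treat both source and destination as raw bytes, mimicking
# SyncAutoMemcpy copying the blob body from the numpy buffer.
src = np.array([1.0, 2.0, 3.0], dtype=np.float32)
dst = np.empty_like(src)
dst.view(np.uint8)[:] = src.view(np.uint8)  # byte-level copy
assert dst.tolist() == [1.0, 2.0, 3.0]
```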
The copy itself is submitted to the virtual machine through PhysicalRun, which wraps the copy callback into an AccessBlobByCallback instruction:
JUST(PhysicalRun([&](InstructionsBuilder* builder) -> Maybe<void> {
return builder->AccessBlobByCallback(
tensor,
[array_ptr, Copy](ep::Stream* stream,
const std::shared_ptr<vm::EagerBlobObject>& eager_blob_object) {
Copy(stream, eager_blob_object, array_ptr);
},
modifier);
}));
Note that InstructionsBuilder::AccessBlobByCallback only builds and queues the instruction; the callback is executed later, when the virtual machine processes the instruction:
template<typename T>
Maybe<void> InstructionsBuilder::AccessBlobByCallback(
    const T tensor,
    const std::function<void(ep::Stream*, const std::shared_ptr<vm::EagerBlobObject>&)>& callback,
    const std::string& modifier) {
const std::shared_ptr<vm::EagerBlobObject>& eager_blob_object = JUST(tensor->eager_blob_object());
Symbol<Device> device = JUST(GetDevice(tensor));
...
Symbol<Stream> stream = JUST(GetDefaultStreamByDevice(device));
JUST(SoftSyncStream({eager_blob_object}, stream));
auto instruction = intrusive::make_shared<vm::Instruction>(
// Never replace `stream` with producer_stream or last_used_stream.
JUST(Singleton<VirtualMachine>::Get()->GetVmStream(stream)),
std::make_shared<vm::AccessBlobArgCbInstructionPolicy>(eager_blob_object, callback,
modifier));
instruction_list_->EmplaceBack(std::move(instruction));
return Maybe<void>::Ok();
}
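The deferred-execution pattern can be sketched as a toy builder and VM (illustrative names, not OneFlow's classes): the builder only queues the callback, and nothing touches the blob until the VM runs the instruction:

```python
# Toy analogy of the AccessBlobByCallback flow: the builder queues an
# instruction holding the callback; a "virtual machine" later executes it.
class InstructionsBuilder:
    def __init__(self):
        self.instruction_list = []

    def access_blob_by_callback(self, blob, callback, modifier):
        # queue only; execution is deferred to the VM
        self.instruction_list.append((callback, blob, modifier))

def run_vm(builder):
    for callback, blob, modifier in builder.instruction_list:
        callback(blob)

blob = {"bytes": None}
builder = InstructionsBuilder()
builder.access_blob_by_callback(blob, lambda eb: eb.update(bytes=b"\x01\x02"), "mut")
assert blob["bytes"] is None          # nothing ran at build time
run_vm(builder)
assert blob["bytes"] == b"\x01\x02"   # executed by the VM
```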
OneFlow source code: https://github.com/Oneflow-Inc/oneflow
Related: OneFlow Source Code Analysis: Op, Kernel and the Interpreter
Related: OneFlow Source Code Analysis: Operator Instruction Execution in the Virtual Machine