# Thoughts on Quantization-Aware Training in Deep Learning Frameworks, and OneFlow's Solution

Author | BBuf

Originally published on the WeChat official account GiantPandaCV.
## 0x0. Overview

This article touches on the following topics:

- The PyTorch FX module
- Eager Passes
- Quantization-aware training
- Conv+BN fusion
- OneFlow's dynamic-to-static conversion (nn.Graph)
- ONNX
- TensorRT
The workflow looks like this: you build a dynamic-graph model with OneFlow (an nn.Module; the operator API is essentially the same as PyTorch's), then the few lines of code below automatically insert quantization modules at the appropriate places in this dynamic-graph model (an nn.Module), producing a quantized model (still an nn.Module) on which quantization-aware training is carried out.

```python
qconfig = {
    'quantization_bit': 8,
    'quantization_scheme': "symmetric",
    'quantization_formula': "cambricon",
    'per_layer_quantization': True,
    'momentum': 0.95,
}

# gm is the GraphModule obtained from flow.fx.symbolic_trace on the user model
net = quantization_aware_training(gm, flow.randn(1, 3, 32, 32), qconfig)
net = net.to(device)
```
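As a quick reminder of what these options control, these are the standard min-max quantization formulas (OneFlow's exact kernel definitions are in the earlier article 基于OneFlow实现量化感知训练, referenced at the end); with $b$ = `quantization_bit`:

$$
\text{symmetric:}\quad scale = \frac{\max|x|}{2^{b-1}-1},\qquad zero\_point = 0
$$

$$
\text{affine:}\quad scale = \frac{x_{\max}-x_{\min}}{2^{b}-1},\qquad zero\_point = \operatorname{round}\!\left(-\frac{x_{\min}}{scale}\right)
$$

`per_layer_quantization=True` computes a single (scale, zero_point) pair for the whole tensor rather than one per output channel.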
After training, we load the checkpoint and rewrite the quantization-aware model back into an ordinary model for deployment:

```python
quantization_resnet18 = quantization_resnet18.to("cuda")
quantization_resnet18.eval()
checkpoint = flow.load('/home/zhangxiaoyu/oneflow-cifar/checkpoint/epoch_11_val_acc_83.280000')
quantization_resnet18.load_state_dict(checkpoint)

origin_gm: flow.fx.GraphModule = flow.fx.symbolic_trace(resnet18)
dequantization_resnet18 = dequantization_aware_training(origin_gm, gm, flow.randn(1, 3, 32, 32).to("cuda"), qconfig)
dequantization_resnet18 = dequantization_resnet18.to("cuda")
dequantization_resnet18.eval()
```
Finally, the rewritten model is wrapped in an nn.Graph, converted to ONNX, and run and checked with TensorRT:

```python
class ResNet18Graph(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        self.m = dequantization_resnet18

    def build(self, x):
        out = self.m(x)
        return out

def test_resnet():
    resnet_graph = ResNet18Graph()
    resnet_graph._compile(flow.randn(1, 3, 32, 32).to("cuda"))

    with tempfile.TemporaryDirectory() as tmpdirname:
        flow.save(dequantization_resnet18.state_dict(), tmpdirname)
        convert_to_onnx_and_check(resnet_graph, flow_weight_dir=tmpdirname, onnx_model_path="/tmp", print_outlier=True)
    ipt_dict, onnx_res = run_onnx("/tmp/model.onnx", get_onnx_provider("cpu"))
    trt_res = run_tensorrt("/tmp/model.onnx", ipt_dict[list(ipt_dict.keys())[0]])
    compare_result(onnx_res, trt_res, atol=1e-4, print_outlier=True)

test_resnet()
```
If you run into problems, feel free to add me on WeChat: bbuf23333 (please mention 量化感知训练, i.e. quantization-aware training, when you do). Relevant code:

- OneFlow FX (the infrastructure used to implement quantization-aware training): https://github.com/Oneflow-Inc/oneflow/pull/5939
- OneFlow Cifar (quantization-aware training on Cifar10 based on OneFlow FX): https://github.com/BBuf/oneflow-cifar
- OneFlow->ONNX conversion and TensorRT execution: https://github.com/Oneflow-Inc/oneflow_convert/pull/45
## 0x1. The Ups and Downs of PyTorch's Quantization Schemes

PyTorch's first-generation scheme is Eager Mode Quantization. Take the following network as an example:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, num_channels=1):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(num_channels, 40, 3, 1)
        self.conv2 = nn.Conv2d(40, 40, 3, 1)
        self.fc = nn.Linear(5*5*40, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.reshape(-1, 5*5*40)
        x = self.fc(x)
        return x
```
PyTorch lets you build the network freely inside an nn.Module's forward: you can call other nn.Modules, call nn.functional.xxx, and even write control flow such as if statements. The downside is that it is hard to recover the model's graph structure at the Eager level. So in Eager Mode Quantization, quantizing this network requires rewriting it by hand:

```python
class NetQuant(nn.Module):
    def __init__(self, num_channels=1):
        super(NetQuant, self).__init__()
        self.conv1 = nn.Conv2d(num_channels, 40, 3, 1)
        self.relu1 = nn.ReLU()
        self.pool1 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(40, 40, 3, 1)
        self.relu2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(2, 2)
        self.fc = nn.Linear(5*5*40, 10)
        self.quant = torch.quantization.QuantStub()
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.relu1(self.conv1(x))
        x = self.pool1(x)
        x = self.relu2(self.conv2(x))
        x = self.pool2(x)
        x = x.reshape(-1, 5*5*40)
        x = self.fc(x)
        x = self.dequant(x)
        return x
```
Besides parameterized modules like Conv and Linear, stateless ops such as ReLU and MaxPool2d also have to be declared in `__init__` for Eager Mode Quantization to handle them correctly. And if you want patterns like Conv+ReLU fused, you must specify the layers to fold by hand; this mode currently supports folding Conv + BN, Conv + BN + ReLU, Conv + ReLU, Linear + ReLU, and BN + ReLU.

```python
modules_to_fuse = [['conv1', 'relu1'], ['conv2', 'relu2']]  # names of the layers to fuse
model_fused = torch.quantization.fuse_modules(model, modules_to_fuse)
model_prepared = torch.quantization.prepare(model_fused)
post_training_quantize(model_prepared, train_loader)  # user-defined post-training quantization (calibration) step
model_int8 = torch.quantization.convert(model_prepared)
```
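As a quick sanity check on the fusion (a hedged sketch; the exact fused class names vary across PyTorch versions), the fused pair typically shows up as a single intrinsic module while the absorbed op becomes a passthrough:

```python
# conv1/relu1 were fused into one module; relu1 is usually left as Identity.
print(type(model_fused.conv1))  # e.g. torch.nn.intrinsic.modules.fused.ConvReLU2d
print(type(model_fused.relu1))  # e.g. torch.nn.Identity
```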
The second-generation scheme is FX Graph Mode Quantization, which automates all of this on a symbolically traceable model:

```python
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx

model = Net()
qconfig = get_default_qconfig("fbgemm")
qconfig_dict = {"": qconfig}
model_prepared = prepare_fx(model, qconfig_dict)
post_training_quantize(model_prepared, train_loader)  # user-defined post-training quantization (calibration) step
model_int8 = convert_fx(model_prepared)
```
Looking at NVIDIA's pytorch-quantization tool (https://github.com/NVIDIA/TensorRT/tree/master/tools/pytorch-quantization), it appears that exporting a quantized PyTorch model to ONNX for deployment still has to go through the first-generation scheme, while on the PyTorch FX side the plan seems to be to go directly from nn.Module to TensorRT without an ONNX intermediate representation. So the technical route taken here is somewhat different.

## 0x2. OneFlow FX (Writing Passes in Eager Mode)
OneFlow FX is implemented in this PR (https://github.com/Oneflow-Inc/oneflow/pull/5939), which reuses the core logic and code of PyTorch FX's infrastructure. The main work in that PR:

- [x] Strip out PyTorch FX's special-purpose designs, such as tracing into `_C` and the interaction with JIT, keeping the four core components: Symbolic Tracing, Intermediate Representation, Transformation, and Python Codegen.
- [x] Implement those four features step by step, fully adapted to OneFlow's design; a single `import oneflow.fx` is enough to try it. It can trace the structure of essentially any Eager model built from OneFlow APIs and transform it into an equivalent nn.Module, on top of which you can write your own Transformation Passes; here I implemented a Shape Infer Pass, a Quantization Pass, and a Dequantization Pass.
- [x] Add tests for models such as AlexNet, ResNet50, and MobileNetV2.
A minimal demo:

```python
import oneflow
from oneflow.fx import symbolic_trace

# Simple module for demonstration
class MyModule(oneflow.nn.Module):
    def __init__(self):
        super().__init__()
        self.param = oneflow.nn.Parameter(oneflow.rand(3, 4))
        self.linear = oneflow.nn.Linear(4, 5)

    def forward(self, x):
        return self.linear(x + self.param).clamp(min=0.0, max=1.0)

module = MyModule()

# Symbolic tracing frontend - captures the semantics of the module
symbolic_traced : oneflow.fx.GraphModule = symbolic_trace(module)

# High-level intermediate representation (IR) - Graph representation
print(symbolic_traced.graph)
"""
graph():
    %x : [#users=1] = placeholder[target=x]
    %param : [#users=1] = get_attr[target=param]
    %add : [#users=1] = call_function[target=operator.add](args = (%x, %param), kwargs = {})
    %linear : [#users=1] = call_module[target=linear](args = (%add,), kwargs = {})
    %clamp : [#users=1] = call_method[target=clamp](args = (%linear,), kwargs = {min: 0.0, max: 1.0})
    return clamp
"""

# Code generation - valid Python code
print(symbolic_traced.code)
"""
def forward(self, x):
    param = self.param
    add = x + param;  x = param = None
    linear = self.linear(add);  add = None
    clamp = linear.clamp(min = 0.0, max = 1.0);  linear = None
    return clamp
"""
```
How does it work? At `import oneflow.fx` time, call_method, call_function, the functions in the `math` library, and the common magic methods are all wrapped once so that every OneFlow operator gets recorded. Then, when an nn.Module is passed to symbolic_trace, the other nn.Modules in its `__init__` are processed first and wrapped in Proxy objects too, and the input data is wrapped as well. Executing the forward on these Proxies records every operation and yields a Graph.
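To give a feel for how this works, here is a heavily simplified sketch of the Proxy idea (illustrative only, not oneflow.fx's actual classes): every operation on a Proxy records a node instead of computing a value.

```python
# Minimal sketch of proxy-based tracing (not the real oneflow.fx implementation).
class Node:
    def __init__(self, op, args):
        self.op, self.args = op, args

class Proxy:
    def __init__(self, node, graph):
        self.node, self.graph = node, graph

    def __add__(self, other):
        # Record an "add" node instead of doing arithmetic.
        node = Node("add", (self.node, getattr(other, "node", other)))
        self.graph.append(node)
        return Proxy(node, self.graph)

graph = []
x = Proxy(Node("placeholder", "x"), graph)
y = (x + 1) + 2               # runs the user code, but only records nodes
print([n.op for n in graph])  # ['add', 'add']
```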
So how is a Graph turned back into an nn.Module? FX introduces the GraphModule data structure to hold the Graph; a GraphModule additionally holds `code` and `forward` members, both generated automatically from the Graph. Note that a GraphModule is still an nn.Module. Printing a GraphModule's `code` gives you the complete execution of the forward function. There is also an Interpreter class that lets users customize how an nn.Module executes; for example, the PR provides an Interpreter-based Pass that infers the shapes of all intermediate Tensors, and another Pass that visualizes the GraphModule structure with pydot.

A Pass, then, simply takes an nn.Module, modifies it, and returns the transformed nn.Module. Didn't we say earlier that quantization-aware training amounts to replacing components such as Conv+BN, or Conv and Linear, with components that have fake-quantization nodes inserted? That is exactly something an FX Pass can do.
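Before diving into the quantization Pass, a toy Transformation Pass shows the shape of the whole approach. This is a sketch assuming oneflow.fx keeps torch.fx's node API (`node.op`, `node.target`, `recompile`), which the port described above aims for; it rewrites every `flow.relu` call in a traced model into `flow.sigmoid`:

```python
import oneflow as flow
from oneflow.fx import symbolic_trace

class Tiny(flow.nn.Module):
    def forward(self, x):
        return flow.relu(x + 1)

def relu_to_sigmoid_pass(gm):
    # Walk the traced graph and rewrite call_function targets in place.
    for node in gm.graph.nodes:
        if node.op == "call_function" and node.target is flow.relu:
            node.target = flow.sigmoid
    gm.graph.lint()   # check the rewritten graph is well-formed
    gm.recompile()    # regenerate gm.code / gm.forward from the new graph
    return gm

gm = relu_to_sigmoid_pass(symbolic_trace(Tiny()))
print(gm.code)  # the generated forward now calls flow.sigmoid
```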
## 0x3. Implementing the Quantization-Aware Training Pass

First we need a QConvBN component: it fake-quantizes its input, folds the BN statistics into the convolution weights, fake-quantizes the folded weights, and then runs the convolution:

```python
class QConvBN(flow.nn.Module):
    def __init__(
        self,
        conv_module,
        bn_module,
        quantization_bit=8,
        quantization_scheme="symmetric",
        quantization_formula="google",
        per_layer_quantization=True,
        momentum=0.95,
    ):
        super().__init__()
        self.quantization_bit = quantization_bit
        self.quantization_scheme = quantization_scheme
        self.quantization_formula = quantization_formula
        self.per_layer_quantization = per_layer_quantization
        self.conv_module = conv_module
        self.bn_module = bn_module
        self.moving_min_max_observer = flow.nn.MovingAverageMinMaxObserver(
            training=self.training,
            quantization_formula=quantization_formula,
            stop_update_after_iters=1,
            quantization_bit=quantization_bit,
            quantization_scheme=quantization_scheme,
            momentum=momentum,
        )
        self.min_max_observer = flow.nn.MinMaxObserver(
            quantization_formula=quantization_formula,
            quantization_bit=quantization_bit,
            quantization_scheme=quantization_scheme,
            per_layer_quantization=per_layer_quantization,
        )
        self.fake_quantization = flow.nn.FakeQuantization(
            quantization_formula=quantization_formula,
            quantization_bit=quantization_bit,
            quantization_scheme=quantization_scheme,
        )

    def fold_bn(self, mean, std):
        if self.bn_module.affine:
            gamma_ = self.bn_module.weight / std
            weight = self.conv_module.weight * gamma_.view(
                self.conv_module.out_channels, 1, 1, 1
            )
            if self.conv_module.bias is not None:
                bias = (
                    gamma_ * self.conv_module.bias - gamma_ * mean + self.bn_module.bias
                )
            else:
                bias = self.bn_module.bias - gamma_ * mean
        else:
            gamma_ = 1 / std
            weight = self.conv_module.weight * gamma_
            if self.conv_module.bias is not None:
                bias = gamma_ * self.conv_module.bias - gamma_ * mean
            else:
                bias = -gamma_ * mean
        return weight, bias

    def forward(self, x):
        # Fake-quantize the input activations with moving-average statistics.
        scale, zero_point = self.moving_min_max_observer(
            x, flow.tensor([0], dtype=flow.int64).to(x.device.type)
        )
        x = self.fake_quantization(x, scale, zero_point)
        if self.training:
            # Run a real convolution to get the current batch statistics,
            # then update BN's running statistics by hand.
            y = flow.nn.functional.conv2d(
                x,
                self.conv_module.weight,
                self.conv_module.bias,
                stride=self.conv_module.stride,
                padding=self.conv_module.padding,
                dilation=self.conv_module.dilation,
                groups=self.conv_module.groups,
            )
            y = y.permute(1, 0, 2, 3)  # NCHW -> CNHW
            y = y.view(self.conv_module.out_channels, -1)  # CNHW -> C,NHW
            mean = y.mean(1)
            var = y.var(1)
            with flow.no_grad():
                self.bn_module.running_mean = (
                    self.bn_module.momentum * self.bn_module.running_mean
                    + (1 - self.bn_module.momentum) * mean
                )
                self.bn_module.running_var = (
                    self.bn_module.momentum * self.bn_module.running_var
                    + (1 - self.bn_module.momentum) * var
                )
        else:
            mean = flow.Tensor(self.bn_module.running_mean)
            var = flow.Tensor(self.bn_module.running_var)
        std = flow.sqrt(var + self.bn_module.eps)
        weight, bias = self.fold_bn(mean, std)
        # Fake-quantize the folded weights and run the folded convolution.
        weight_scale, weight_zero_point = self.min_max_observer(weight)
        res = flow.nn.functional.conv2d(
            x,
            self.fake_quantization(weight, weight_scale, weight_zero_point),
            bias,
            stride=self.conv_module.stride,
            padding=self.conv_module.padding,
            dilation=self.conv_module.dilation,
            groups=self.conv_module.groups,
        )
        return res
```
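The `fold_bn` method is the standard BN-folding identity. Writing BN as $y = \gamma\,\frac{x-\mu}{\sigma} + \beta$ with $\sigma = \sqrt{\mathrm{var} + \epsilon}$, folding it into the preceding convolution gives

$$
W_{\mathrm{fold}} = W\cdot\frac{\gamma}{\sigma},\qquad
b_{\mathrm{fold}} = \frac{\gamma}{\sigma}\,(b-\mu) + \beta,
$$

which is exactly what the code computes (`gamma_ = weight / std`, then `weight * gamma_` and `gamma_ * bias - gamma_ * mean + beta`). During training, $\mu$ and $\mathrm{var}$ come from the current batch while the running statistics are updated on the side; at inference, the running statistics are used. See the referenced article 神经网络量化入门--Folding BN ReLU代码实现 for a fuller derivation.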
Then the quantization Pass needs to replace every Conv+BN pair in the nn.Module-level computation graph with this QConvBN component. The core of the replacement:

```python
if x.target in insert_place:
    with gm.graph.inserting_after(x):
        y = x.next
        if (
            isinstance(insert_op_state[x.target], flow.nn.Conv2d)
            and y.target in insert_place
            and isinstance(insert_op_state[y.target], flow.nn.BatchNorm2d)
        ):
            now_target = get_current_module_space(x.target)
            if now_target == "":
                now_target = f"fake_conv_bn.{cnt}"
            else:
                now_target = (
                    f"{get_current_module_space(x.target)}.fake_conv_bn.{cnt}"
                )
            gm.add_submodule(
                now_target,
                QConvBN(
                    insert_op_state[x.target],
                    insert_op_state[y.target],
                    quantization_bit,
                    quantization_scheme,
                    quantization_formula,
                    per_layer_quantization,
                    momentum,
                ),
            )
            y.replace_all_uses_with(x)
            gm.graph.erase_node(y)
            gm.delete_submodule(y.target)
            qconvbn = gm.graph.call_module(module_name=now_target, args=x.args,)
            cnt = cnt + 1
            x.replace_all_uses_with(qconvbn)
            gm.graph.erase_node(x)
            gm.delete_submodule(x.target)
```
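After the Pass runs you can make the rewrite visible by walking the transformed graph. A hypothetical inspection snippet, assuming the torch.fx-style node API (the `fake_conv_bn` names follow the naming scheme in the code above):

```python
net = quantization_aware_training(gm, flow.randn(1, 3, 32, 32), qconfig)
for node in net.graph.nodes:
    if node.op == "call_module" and "fake_conv_bn" in str(node.target):
        print(node.target)  # e.g. fake_conv_bn.0, layer1.0.fake_conv_bn.1, ...
```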
That is, we locate the Conv+BN components in `gm` (the GraphModule traced from ResNet18, which is still an nn.Module), erase them, and splice in QConvBN components.

## 0x4. Quantization-Aware Training of ResNet18 on Cifar10

Applying the Pass is the same few lines shown in the overview:
```python
qconfig = {
    'quantization_bit': 8,
    'quantization_scheme': "symmetric",
    'quantization_formula': "cambricon",
    'per_layer_quantization': True,
    'momentum': 0.95,
}

net = quantization_aware_training(gm, flow.randn(1, 3, 32, 32), qconfig)
net = net.to(device)
```
The qconfig lets users conveniently configure the various quantization modes OneFlow supports; for details see the earlier article 基于OneFlow实现量化感知训练 (Implementing quantization-aware training with OneFlow). The user-defined dynamic-graph model goes through this Pass, and the new `net` it returns already has the quantization-aware training components inserted. Everything else about training and testing is identical to ordinary FP32 training, so I won't repeat it. I trained ResNet18 on Cifar10 under several quantization configurations that OneFlow supports, each for 200 epochs with identical hyperparameters. The `momentum` parameter of the `MovingAverageMinMaxObserver` defaults to 0.95 and is left unchanged in all experiments. Results:
## Accuracy
| Model | quantization_bit | quantization_scheme | quantization_formula | per_layer_quantization | Acc |
| ----------------- | ----------- | ----------- | ----------- | ----------- | ----------- |
| ResNet18 | 8 | symmetric | google | True | 95.19% |
| ResNet18 | 8 | symmetric | google | False | 95.24% |
| ResNet18 | 8 | affine | google | True | 95.32% |
| ResNet18 | 8 | affine | google | False | 95.30% |
| ResNet18 | 8 | symmetric | cambricon | True | 95.19% |
For comparison, the original FP32 ResNet18 reaches 95.62%, so quantization-aware training is on par with the original accuracy under every quantization setting. Above, cambricon denotes Cambricon's quantization scheme and google denotes Google's.

## 0x5. Rewriting the Original Model from the Quantization-Aware Trained Model
To deploy the trained model, we replace each QConvBN component with a DConv2d component, implemented as follows:

```python
class DConv2d(flow.nn.Conv2d):
    def __init__(
        self,
        in_channels,
        out_channels,
        kernel_size,
        stride,
        padding,
        dilation,
        groups,
        quantization_bit=8,
        quantization_scheme="symmetric",
        quantization_formula="google",
        per_layer_quantization=True,
        momentum=0.95,
    ) -> None:
        super(DConv2d, self).__init__(
            in_channels, out_channels, kernel_size, stride, padding, dilation, groups
        )
        self.moving_min_max_observer = flow.nn.MovingAverageMinMaxObserver(
            training=self.training,
            quantization_formula=quantization_formula,
            stop_update_after_iters=1,
            quantization_bit=quantization_bit,
            quantization_scheme=quantization_scheme,
            momentum=momentum,
        )
        self.min_max_observer = flow.nn.MinMaxObserver(
            quantization_formula=quantization_formula,
            quantization_bit=quantization_bit,
            quantization_scheme=quantization_scheme,
            per_layer_quantization=per_layer_quantization,
        )
        self.fake_quantization = flow.nn.FakeQuantization(
            quantization_formula=quantization_formula,
            quantization_bit=quantization_bit,
            quantization_scheme=quantization_scheme,
        )
        self.register_buffer("new_zero", flow.Tensor(1))
        self.new_zero.fill_(0)

    def forward(self, x):
        # Fake-quantize the input with the statistics learned during QAT,
        # then run an ordinary convolution with the folded weights.
        scale, zero_point = self.moving_min_max_observer(
            x, self.new_zero.to(flow.int64).to(x.device.type)
        )
        x = self.fake_quantization(x, scale, zero_point)
        return flow.nn.functional.conv2d(
            x,
            self.weight,
            self.bias,
            stride=self.stride,
            padding=self.padding,
            dilation=self.dilation,
            groups=self.groups,
        )
```
Then we just swap each Conv+BN for this component. Note: the component's weight and bias, together with the moving_min/moving_max parameters of its moving_min_max_observer, must be assigned from the corresponding weight, bias, and moving_min/moving_max of the QConvBN component in the trained quantization-aware model. The core of the dequantization Pass:

```python
if x.target in insert_place:
    with origin_gm.graph.inserting_after(x):
        y = x.next
        if (
            isinstance(insert_op_state[x.target], flow.nn.Conv2d)
            and y.target in insert_place
            and isinstance(insert_op_state[y.target], flow.nn.BatchNorm2d)
        ):
            now_target = get_current_module_space(x.target)
            if now_target == "":
                now_target = f"fake_conv_bn.{cnt}"
            else:
                now_target = (
                    f"{get_current_module_space(x.target)}.fake_conv_bn.{cnt}"
                )
            dequanzation_conv = DConv2d(
                quantization_op_state[now_target].conv_module.in_channels,
                quantization_op_state[now_target].conv_module.out_channels,
                quantization_op_state[now_target].conv_module.kernel_size,
                quantization_op_state[now_target].conv_module.stride,
                quantization_op_state[now_target].conv_module.padding,
                quantization_op_state[now_target].conv_module.dilation,
                quantization_op_state[now_target].conv_module.groups,
                quantization_bit,
                quantization_scheme,
                quantization_formula,
                per_layer_quantization,
                momentum,
            )
            # Fold the trained BN statistics into the conv weight/bias,
            # exactly as QConvBN.fold_bn does at inference time.
            mean = flow.Tensor(quantization_op_state[now_target].bn_module.running_mean)
            var = flow.Tensor(quantization_op_state[now_target].bn_module.running_var)
            std = flow.sqrt(var + quantization_op_state[now_target].bn_module.eps)
            if quantization_op_state[now_target].bn_module.affine:
                gamma_ = quantization_op_state[now_target].bn_module.weight / std
                weight = quantization_op_state[now_target].conv_module.weight * gamma_.view(
                    quantization_op_state[now_target].conv_module.out_channels, 1, 1, 1
                )
                if quantization_op_state[now_target].conv_module.bias is not None:
                    bias = (
                        gamma_ * quantization_op_state[now_target].conv_module.bias
                        - gamma_ * mean
                        + quantization_op_state[now_target].bn_module.bias
                    )
                else:
                    bias = quantization_op_state[now_target].bn_module.bias - gamma_ * mean
            else:
                gamma_ = 1 / std
                weight = quantization_op_state[now_target].conv_module.weight * gamma_
                if quantization_op_state[now_target].conv_module.bias is not None:
                    bias = gamma_ * quantization_op_state[now_target].conv_module.bias - gamma_ * mean
                else:
                    bias = -gamma_ * mean
            dequanzation_conv.weight = flow.nn.Parameter(weight)
            dequanzation_conv.bias = flow.nn.Parameter(bias)
            # Carry over the trained activation statistics.
            dequanzation_conv.moving_min_max_observer.moving_max = quantization_op_state[now_target].moving_min_max_observer.moving_max
            dequanzation_conv.moving_min_max_observer.moving_min = quantization_op_state[now_target].moving_min_max_observer.moving_min
            origin_gm.add_submodule(
                now_target,
                dequanzation_conv,
            )
            y.replace_all_uses_with(x)
            origin_gm.graph.erase_node(y)
            origin_gm.delete_submodule(y.target)
            qconvbn = origin_gm.graph.call_module(module_name=now_target, args=x.args,)
            cnt = cnt + 1
            x.replace_all_uses_with(qconvbn)
            origin_gm.graph.erase_node(x)
            origin_gm.delete_submodule(x.target)
```
This mirrors the quantization Pass, except that Conv+BN is now replaced by a DConv2d component pre-loaded with the trained parameters.

## 0x6. Converting to ONNX and TensorRT Inference
After the dequantization Pass runs, the original model has been rewritten into a new nn.Module. All that remains is to convert this nn.Module to ONNX and run inference in TensorRT. The example code for this part is at: https://github.com/Oneflow-Inc/oneflow_convert/blob/add_fx_train_quantization/examples/oneflow2onnx/quantization/test_resnet18.py

Let's walk through the core parts:

```python
quantization_resnet18 = quantization_aware_training(gm, flow.randn(1, 3, 32, 32).to("cuda"), qconfig)
quantization_resnet18 = quantization_resnet18.to("cuda")
quantization_resnet18.eval()
checkpoint = flow.load('/home/zhangxiaoyu/oneflow-cifar/checkpoint/epoch_11_val_acc_83.280000')
quantization_resnet18.load_state_dict(checkpoint)

# rewrite the original model based on the quantization-aware trained model
origin_gm: flow.fx.GraphModule = flow.fx.symbolic_trace(resnet18)
dequantization_resnet18 = dequantization_aware_training(origin_gm, gm, flow.randn(1, 3, 32, 32).to("cuda"), qconfig)
dequantization_resnet18 = dequantization_resnet18.to("cuda")
dequantization_resnet18.eval()

# nn.Graph is the bridge to ONNX: it turns the OneFlow dynamic graph into a static graph
class ResNet18Graph(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        self.m = dequantization_resnet18

    def build(self, x):
        out = self.m(x)
        return out

# test function
def test_resnet():
    resnet_graph = ResNet18Graph()
    resnet_graph._compile(flow.randn(1, 3, 32, 32).to("cuda"))

    with tempfile.TemporaryDirectory() as tmpdirname:
        flow.save(dequantization_resnet18.state_dict(), tmpdirname)
        convert_to_onnx_and_check(resnet_graph, flow_weight_dir=tmpdirname, onnx_model_path="/tmp", print_outlier=True)
    ipt_dict, onnx_res = run_onnx("/tmp/model.onnx", get_onnx_provider("cpu"))
    trt_res = run_tensorrt("/tmp/model.onnx", ipt_dict[list(ipt_dict.keys())[0]])
    compare_result(onnx_res, trt_res, atol=1e-4, print_outlier=True)

test_resnet()
```
The code first wraps the rewritten model (an nn.Module) in OneFlow's nn.Graph to turn it into a static graph (for material on nn.Graph, see the OneFlow documentation: https://docs.oneflow.org). Why the nn.Graph step? Because OneFlow's ONNX conversion tool operates on static graphs, hence this extra step; if you don't want to dig into it, that's fine, since the code above already shows the complete usage.

Converting to ONNX requires the following dependencies:

```
onnx>=1.8.0
onnxruntime>=1.6.0
oneflow>=0.5.0
```

Then install the conversion tool:

```
pip install oneflow_onnx
```
The conversion itself uses the convert_to_onnx_and_check API from oneflow_onnx to turn the quantization-trained model into ONNX; you can then inspect what the quantization-aware-trained ResNet18 looks like as an ONNX graph. Running TensorRT inference additionally requires:

```
onnxruntime-gpu>=1.8.0
opencv-python
pytest
nvidia-tensorrt==8.0.0.3
pycuda
flake8
```
Finally we run the ONNX model in TensorRT and compare against ONNX Runtime:

```python
trt_res = run_tensorrt("/tmp/model.onnx", ipt_dict[list(ipt_dict.keys())[0]])
compare_result(onnx_res, trt_res, atol=1e-4, print_outlier=True)
```

The outputs agree to within the tolerance (left: ONNX Runtime, right: TensorRT):

```
-5.438802    -5.4388037
 3.5198674    3.5198674
 2.409646     2.4096458
 4.5826764    4.5826764
 0.019911028  0.019910894
 6.6347113    6.634712
-3.5996702   -3.5996711
-1.3407612   -1.340761
-3.8473191   -3.847319
```
I haven't profiled speed here yet: to do it properly one has to exclude the cost of building the TensorRT engine and time only the inference part, and I haven't changed that code yet; interested readers can measure the time themselves first. I may write a dedicated follow-up article comparing accuracy and speed before and after deployment. Also, the current implementation may still have holes that need more careful checking.
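For readers who want a rough number right away, here is a minimal sketch of the warm-up-then-average timing idea using ONNX Runtime on the exported model (the same principle applies to the TensorRT path once the engine is built; the model path and input shape follow the example above):

```python
import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("/tmp/model.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
x = np.random.randn(1, 3, 32, 32).astype(np.float32)

for _ in range(10):            # warm-up runs, excluded from the measurement
    sess.run(None, {input_name: x})

start = time.time()
for _ in range(100):
    sess.run(None, {input_name: x})
print((time.time() - start) / 100 * 1000, "ms per inference")
```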
## 0x7. Summary

This article shows that, starting from an nn.Module built with OneFlow, only a small amount of code needs to change to complete the full pipeline from nn.Module quantization-aware training to deploying the trained model on GPU with TensorRT. The complete chain is: OneFlow dynamic-graph model (nn.Module) -> OneFlow quantization-aware trained model (nn.Module) -> OneFlow static graph (nn.Graph) -> ONNX -> TensorRT. Quantization-aware training is built on the FX module, which supports writing Passes in Eager mode (FX was pioneered by PyTorch; I ported its infrastructure to OneFlow). If you want to try the feature, follow the steps in this article, and feel free to contact me with any usage problems.

References:

- https://docs.oneflow.org
- https://github.com/Oneflow-Inc/oneflow
- https://github.com/Oneflow-Inc/oneflow_convert
- https://github.com/BBuf/oneflow-cifar
- 神经网络量化入门--Folding BN ReLU代码实现 (An introduction to neural network quantization: folding BN and ReLU, with code)
- 基于OneFlow实现量化感知训练 (Implementing quantization-aware training with OneFlow)