A Tour of the Interpolation Methods in Deep Learning Frameworks, Taking OneFlow as an Example
Written by | BBuf
First published on the WeChat official account GiantPandaCV
This article grew out of my work developing the interpolate Op for the OneFlow framework. OneFlow's interpolate Op is functionally identical to PyTorch's: both implement interpolation-based upsampling and downsampling. While implementing this Op I also fixed a bug in PyTorch that was merged into the main repository, see: https://github.com/pytorch/pytorch/commit/6ab3a210983b7eee417e7cd92a8ad2677065e470. The interpolate operator in OneFlow is therefore fully equivalent to the interpolate operator in PyTorch. Using the OneFlow implementation of this operator as an example, this article surveys the interpolation algorithms found in deep learning frameworks.
1
doc && interface
To understand the interpolation algorithms behind the interpolate operator, let's start from the documentation and the Python frontend interface. The interface documentation is at https://oneflow.readthedocs.io/en/master/functional.html?highlight=interpolate . It shows that OneFlow's interpolate operator implements interpolation-based upsampling and downsampling, supports 3-D, 4-D and 5-D input Tensors, and provides several interpolation modes for inputs of different shapes. Next, let's look at the parameter list:
input: the input Tensor.
size: the spatial size of the output Tensor after interpolation. The spatial size is what remains after removing the Batch and Channel dimensions; for an NCHW input, the spatial size is HW.
scale_factor (float or Tuple[float]): multiplier for the spatial size. If it is a tuple, its length must match the number of spatial dimensions of the input.
mode (str): the upsampling mode, one of 'nearest' | 'linear' | 'bilinear' | 'bicubic' | 'trilinear' | 'area'. Default: 'nearest'.
align_corners (bool): geometrically, we regard the pixels of the input and output as squares rather than points. If set to True, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels. If set to False, the input and output tensors are aligned by the corner points of their corner pixels, and the interpolation uses edge-value padding for out-of-boundary values, making this operation independent of the input size when scale_factor is held fixed. This only takes effect when mode is 'linear' | 'bilinear' | 'bicubic' | 'trilinear'. Default: False. (Don't worry if this is hard to follow; a later section is devoted to it.)
recompute_scale_factor (bool): recompute the scale_factor for use in the interpolation computation. When scale_factor is passed as a parameter, it is used to compute the output_size. If recompute_scale_factor is False or not specified, the passed-in scale_factor is used directly in the interpolation computation. Otherwise, a new scale_factor is computed from the output and input sizes used for the interpolation (i.e., equivalent to explicitly passing output_size). Note that when scale_factor is a float, the recomputed scale_factor may differ from the passed-in one due to rounding and precision issues. (The sketch after the example below illustrates this.)
>>> import numpy as np
>>> import oneflow as flow
>>> input = flow.Tensor(np.arange(1, 5).reshape((1, 1, 4)), dtype=flow.float32)
>>> output = flow.nn.functional.interpolate(input, scale_factor=2.0, mode="linear")
>>> output
tensor([[[1.0000, 1.2500, 1.7500, 2.2500, 2.7500, 3.2500, 3.7500, 4.0000]]],
dtype=oneflow.float32)
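To make the relationship between size, scale_factor and recompute_scale_factor concrete, here is a minimal sketch in plain Python. The helper name is mine, and it assumes the usual floor rule for deriving an output spatial size from scale_factor:
import math

def output_size_from_scale(in_size, scale_factor):
    # output spatial size = floor(input spatial size * scale_factor)
    return int(math.floor(in_size * scale_factor))

in_size = 4
scale_factor = 2.3
out_size = output_size_from_scale(in_size, scale_factor)  # 9
recomputed = out_size / in_size                           # 2.25, not 2.3
print(out_size, recomputed)
With recompute_scale_factor=True, the interpolation would effectively use 2.25 rather than the 2.3 that was passed in, which is exactly the rounding effect warned about above.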
The Python frontend implementation lives at https://github.com/Oneflow-Inc/oneflow/blob/master/python/oneflow/nn/modules/interpolate.py#L25-L193 . The main logic there is to recompute the value of scale_factor depending on whether recompute_scale_factor was passed, and then, once scale_factor is available, dispatch to a different interpolation kernel according to the given mode. See:
if len(x.shape) == 3 and self.mode == "nearest":
    return flow._C.upsample_nearest_1d(
        x, scale_factor=scale_factors[0], data_format="channels_first"
    )
if len(x.shape) == 4 and self.mode == "nearest":
    return flow._C.upsample_nearest_2d(
        x,
        height_scale=scale_factors[0],
        width_scale=scale_factors[1],
        data_format="channels_first",
    )
if len(x.shape) == 5 and self.mode == "nearest":
    return flow._C.upsample_nearest_3d(
        x,
        depth_scale=scale_factors[0],
        height_scale=scale_factors[1],
        width_scale=scale_factors[2],
        data_format="channels_first",
    )
if len(x.shape) == 3 and self.mode == "area":
    assert output_size is not None
    return flow._C.adaptive_avg_pool1d(x, output_size)
if len(x.shape) == 4 and self.mode == "area":
    assert output_size is not None
    return flow._C.adaptive_avg_pool2d(x, output_size)
if len(x.shape) == 5 and self.mode == "area":
    assert output_size is not None
    return flow._C.adaptive_avg_pool3d(x, output_size)
if len(x.shape) == 3 and self.mode == "linear":
    assert self.align_corners is not None
    return flow._C.upsample_linear_1d(
        x,
        scale_factor=scale_factors[0],
        align_corners=self.align_corners,
        data_format="channels_first",
    )
if len(x.shape) == 4 and self.mode == "bilinear":
    assert self.align_corners is not None
    return flow._C.upsample_bilinear_2d(
        x,
        height_scale=scale_factors[0],
        width_scale=scale_factors[1],
        align_corners=self.align_corners,
        data_format="channels_first",
    )
if len(x.shape) == 4 and self.mode == "bicubic":
    assert self.align_corners is not None
    return flow._C.upsample_bicubic_2d(
        x,
        height_scale=scale_factors[0],
        width_scale=scale_factors[1],
        align_corners=self.align_corners,
        data_format="channels_first",
    )
if len(x.shape) == 5 and self.mode == "trilinear":
    assert self.align_corners is not None
    return flow._C.upsample_trilinear_3d(
        x,
        depth_scale=scale_factors[0],
        height_scale=scale_factors[1],
        width_scale=scale_factors[2],
        align_corners=self.align_corners,
        data_format="channels_first",
    )
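As a quick usage sketch of this dispatch (based only on the interface documented above; the expected output shapes assume the floor rule for deriving sizes), the same interpolate call is routed to different kernels purely by the rank of the input and the mode string:
import numpy as np
import oneflow as flow

x3d = flow.Tensor(np.random.randn(1, 1, 4), dtype=flow.float32)     # NCW input
x4d = flow.Tensor(np.random.randn(1, 1, 4, 4), dtype=flow.float32)  # NCHW input

y_nearest = flow.nn.functional.interpolate(x3d, scale_factor=2.0, mode="nearest")
y_linear = flow.nn.functional.interpolate(x3d, scale_factor=2.0, mode="linear")
y_bilinear = flow.nn.functional.interpolate(x4d, scale_factor=2.0, mode="bilinear")
print(y_nearest.shape, y_linear.shape, y_bilinear.shape)  # expect (1, 1, 8) (1, 1, 8) (1, 1, 8, 8)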
2
Explanation of align_corners
align_corners is a very important parameter, so before walking through each kernel implementation we first explain what it means, taking nearest interpolation of an image as the example. When an output index j is mapped back to an input coordinate, with input length n and output length b, the simplest mapping just rescales the index:
int y = j * n / b;
while the mapping that aligns pixel centers shifts by half a pixel:
int y = (j + 0.5) * n / b - 0.5;
When align_corners=True, the mapping instead becomes y = j * (n - 1) / (b - 1), which makes the centers of the corner pixels of the input and output coincide, so the corner values are preserved. The interpolate Op exposes the align_corners parameter to let the user choose whether to align the geometric centers of the input and output in this way; a small numeric sketch of these mappings follows.
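Here is a tiny numeric sketch of these three mappings in plain Python, for an input of n = 4 pixels upsampled to b = 8 pixels; only the last mapping sends output index 0 exactly to input 0 and output index b-1 exactly to input n-1:
n, b = 4, 8

corner_mapping = [j * n / b for j in range(b)]                   # y = j*n/b
center_mapping = [(j + 0.5) * n / b - 0.5 for j in range(b)]     # y = (j+0.5)*n/b - 0.5
align_corners_true = [j * (n - 1) / (b - 1) for j in range(b)]   # y = j*(n-1)/(b-1)

print(corner_mapping)       # [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
print(center_mapping)       # [-0.25, 0.25, 0.75, 1.25, 1.75, 2.25, 2.75, 3.25]
print(align_corners_true)   # starts at 0.0 and ends at exactly 3.0, so the corners coincide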
3
Linear interpolation
The linear interpolation kernel lives at https://github.com/Oneflow-Inc/oneflow/blob/master/oneflow/user/kernels/upsample_linear_1d_kernel.cpp ; we only look at the forward pass here. Writing the source coordinate as h1r with integer part h1 and fractional part lambda = h1r - h1, 1-D linear interpolation computes out = (1 - lambda) * in[h1] + lambda * in[h1 + 1]; the h1lambda in the code corresponds to this lambda.
OF_DEVICE_FUNC T GetLinearInputIndex(const int64_t out_dim_idx, const T scale, bool align_corners) {
if (align_corners) {
return static_cast<T>(scale * out_dim_idx);
} else {
T src_idx = scale * (out_dim_idx + 0.5) - 0.5;
return static_cast<T>(src_idx < 0 ? 0 : src_idx);
}
}
template<typename T>
static void UpsampleLinear1DForward(const int64_t elem_cnt, const T* in_dptr,
NdIndexOffsetHelper<int64_t, 3> in_helper,
NdIndexOffsetHelper<int64_t, 3> out_helper, const int in_height,
const float scale_factor, bool align_corners, T* out_dptr) {
for (int64_t index = 0; index < elem_cnt; ++index) {
int64_t n, c, h;
out_helper.OffsetToNdIndex(index, n, c, h);
const T h1r = GetLinearInputIndex(h, scale_factor, align_corners);
const int64_t h1 = h1r;
const int64_t h1p = (h1 < in_height - 1) ? 1 : 0;
const T h1lambda = h1r - h1;
const T h0lambda = static_cast<T>(1.) - h1lambda;
out_dptr[index] = h0lambda * in_dptr[in_helper.NdIndexToOffset(n, c, h1)]
+ h1lambda * in_dptr[in_helper.NdIndexToOffset(n, c, h1 + h1p)];
}
}
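To make the kernel above concrete, here is a minimal NumPy re-implementation of the align_corners=False path. The function name is mine, not OneFlow's, and the scale is taken as in/out, which for this example equals 1/scale_factor. It reproduces the docstring example from Section 1:
import numpy as np

def upsample_linear_1d(x, scale_factor, align_corners=False):
    in_len = x.shape[-1]
    out_len = int(in_len * scale_factor)
    out = np.empty(x.shape[:-1] + (out_len,), dtype=x.dtype)
    for h in range(out_len):
        if align_corners:
            h1r = h * (in_len - 1) / (out_len - 1)
        else:
            h1r = max((h + 0.5) * in_len / out_len - 0.5, 0.0)
        h1 = int(h1r)                          # integer part: left neighbour
        h1p = 1 if h1 < in_len - 1 else 0      # offset to right neighbour, 0 at the border
        lam = h1r - h1                         # h1lambda in the kernel
        out[..., h] = (1 - lam) * x[..., h1] + lam * x[..., h1 + h1p]
    return out

x = np.arange(1, 5, dtype=np.float32).reshape(1, 1, 4)
print(upsample_linear_1d(x, 2.0))
# [[[1.   1.25 1.75 2.25 2.75 3.25 3.75 4.  ]]]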
4
Nearest interpolation
The nearest interpolation kernel lives at https://github.com/Oneflow-Inc/oneflow/blob/master/oneflow/user/kernels/upsample_nearest_kernel.cpp . Taking an NCW input Tensor as the example, the code is as follows:
OF_DEVICE_FUNC int64_t GetNearestInputIndex(const int64_t out_dim_idx, const float scale,
                                            const int64_t in_dim_size) {
int64_t index = static_cast<int64_t>(std::floor((static_cast<float>(out_dim_idx) * scale)));
index = index > in_dim_size - 1 ? in_dim_size - 1 : index;
index = index < static_cast<int64_t>(0) ? static_cast<int64_t>(0) : index;
return index;
}
template<typename T>
static void UpsampleNearest1DForward(const int64_t elem_cnt, const T* in_dptr,
NdIndexOffsetHelper<int64_t, 3> in_helper,
NdIndexOffsetHelper<int64_t, 3> out_helper,
const int64_t in_height, const float scale_factor,
T* out_dptr) {
for (int64_t index = 0; index < elem_cnt; ++index) {
int64_t n, c, h;
out_helper.OffsetToNdIndex(index, n, c, h);
const int64_t in_h = GetNearestInputIndex(h, scale_factor, in_height);
out_dptr[index] = in_dptr[in_helper.NdIndexToOffset(n, c, in_h)];
}
}
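The same index mapping in a few lines of NumPy (the function name is mine): every output index is mapped to floor(out_idx * (in/out)) and clamped to the valid range, which is what GetNearestInputIndex does:
import numpy as np

def upsample_nearest_1d(x, scale_factor):
    in_len = x.shape[-1]
    out_len = int(in_len * scale_factor)
    scale = in_len / out_len  # reciprocal of the user-facing scale_factor
    idx = np.floor(np.arange(out_len) * scale).astype(np.int64)
    idx = np.clip(idx, 0, in_len - 1)
    return x[..., idx]

x = np.arange(1, 5, dtype=np.float32).reshape(1, 1, 4)
print(upsample_nearest_1d(x, 2.0))
# [[[1. 1. 2. 2. 3. 3. 4. 4.]]]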
5
Bilinear interpolation
The bilinear interpolation kernel lives at https://github.com/Oneflow-Inc/oneflow/blob/master/oneflow/user/kernels/upsample_bilinear_2d_kernel.cpp ; again, we only look at the forward pass:
OF_DEVICE_FUNC void GetBilinearParam(const bool align_corners, const int64_t h, const int64_t w,
const int64_t in_height, const int64_t in_width,
const T scale_h, const T scale_w, BilinearParam<T>* params) {
T h1r;
if (align_corners) {
h1r = scale_h * static_cast<T>(h);
} else {
h1r = (static_cast<T>(h) + 0.5f) * scale_h - 0.5f;
h1r = h1r < 0 ? 0 : h1r;
}
const int64_t h1 = h1r;
const int64_t h1p = (h1 < in_height - 1) ? 1 : 0;
T w1r;
if (align_corners) {
w1r = scale_w * static_cast<T>(w);
} else {
w1r = (static_cast<T>(w) + 0.5f) * scale_w - 0.5f;
w1r = w1r < 0 ? 0 : w1r;
}
const int64_t w1 = w1r;
const int64_t w1p = (w1 < in_width - 1) ? 1 : 0;
params->top_h_index = h1;
params->bottom_h_index = h1 + h1p;
params->h_lerp = h1r - h1;
params->left_w_index = w1;
params->right_w_index = w1 + w1p;
params->w_lerp = w1r - w1;
}
template<typename T>
static void UpsampleBilinear2DForward(const int64_t elem_cnt, const T* in_dptr,
NdIndexOffsetHelper<int64_t, 4> in_helper,
NdIndexOffsetHelper<int64_t, 4> out_helper,
const int64_t in_height, const int64_t in_width,
const T scale_h, const T scale_w, const bool align_corners,
T* out_dptr) {
for (int64_t index = 0; index < elem_cnt; ++index) {
int64_t n, c, h, w;
out_helper.OffsetToNdIndex(index, n, c, h, w);
BilinearParam<T> params;
GetBilinearParam(align_corners, h, w, in_height, in_width, scale_h, scale_w, &params);
const int64_t top_offset = in_helper.NdIndexToOffset(n, c, params.top_h_index, 0);
const int64_t bottom_offset = in_helper.NdIndexToOffset(n, c, params.bottom_h_index, 0);
const T top_left = in_dptr[top_offset + params.left_w_index];
const T top_right = in_dptr[top_offset + params.right_w_index];
const T bottom_left = in_dptr[bottom_offset + params.left_w_index];
const T bottom_right = in_dptr[bottom_offset + params.right_w_index];
const T top = top_left + (top_right - top_left) * params.w_lerp;
const T bottom = bottom_left + (bottom_right - bottom_left) * params.w_lerp;
out_dptr[index] = top + (bottom - top) * params.h_lerp;
}
}
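Here is a compact NumPy sketch of the align_corners=False path above (the function name is mine; the per-axis scales are taken as in/out). For each output pixel it computes the two lerp weights from GetBilinearParam and blends the four neighbouring input pixels:
import numpy as np

def upsample_bilinear_2d(x, scale_h, scale_w):
    in_h, in_w = x.shape[-2:]
    out_h, out_w = int(in_h * scale_h), int(in_w * scale_w)
    out = np.empty(x.shape[:-2] + (out_h, out_w), dtype=x.dtype)
    for oh in range(out_h):
        h1r = max((oh + 0.5) * in_h / out_h - 0.5, 0.0)
        h1 = int(h1r)
        h1p = 1 if h1 < in_h - 1 else 0
        h_lerp = h1r - h1
        for ow in range(out_w):
            w1r = max((ow + 0.5) * in_w / out_w - 0.5, 0.0)
            w1 = int(w1r)
            w1p = 1 if w1 < in_w - 1 else 0
            w_lerp = w1r - w1
            top = (1 - w_lerp) * x[..., h1, w1] + w_lerp * x[..., h1, w1 + w1p]
            bottom = (1 - w_lerp) * x[..., h1 + h1p, w1] + w_lerp * x[..., h1 + h1p, w1 + w1p]
            out[..., oh, ow] = (1 - h_lerp) * top + h_lerp * bottom
    return out

x = np.arange(1, 5, dtype=np.float32).reshape(1, 1, 2, 2)
print(upsample_bilinear_2d(x, 2.0, 2.0))
# first output row: [1. 1.25 1.75 2.], last output row: [3. 3.25 3.75 4.]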
6
Bicubic interpolation
Wikipedia: in the mathematical field of numerical analysis, bicubic interpolation is the most commonly used interpolation method in two-dimensional space. Here, the value of the function f at a point (x, y) is obtained as a weighted average of the nearest sixteen sample points on a rectangular grid, using cubic polynomial interpolation in each of the two directions.
Bicubic is the most complex interpolation mode in the interpolate operator; the computation proceeds as follows:
// https://en.wikipedia.org/wiki/Bicubic_interpolation#Bicubic_convolution_algorithm
template<typename T>
OF_DEVICE_FUNC T cubic_convolution1(const T x, const T A) {
return ((A + 2.0) * x - (A + 3.0)) * x * x + 1.0;
}
template<typename T>
OF_DEVICE_FUNC T cubic_convolution2(const T x, const T A) {
return ((A * x - 5.0 * A) * x + 8.0 * A) * x - 4.0 * A;
}
template<typename T>
OF_DEVICE_FUNC void get_cubic_upsample_coefficients(T coeffs[4], const T t) {
T A = -0.75;
T x1 = t;
coeffs[0] = cubic_convolution2<T>(x1 + 1.0, A);
coeffs[1] = cubic_convolution1<T>(x1, A);
// opposite coefficients
T x2 = 1.0 - t;
coeffs[2] = cubic_convolution1<T>(x2, A);
coeffs[3] = cubic_convolution2<T>(x2 + 1.0, A);
}
template<typename T>
OF_DEVICE_FUNC T cubic_interp1d(const T x0, const T x1, const T x2, const T x3, const T t) {
T coeffs[4];
get_cubic_upsample_coefficients<T>(coeffs, t);
return x0 * coeffs[0] * 1.0 + x1 * coeffs[1] * 1.0 + x2 * coeffs[2] * 1.0 + x3 * coeffs[3] * 1.0;
}
In the forward kernel's Compute function, cubic_interp1d is then applied four times along the x direction and once along the y direction for every output pixel:
const user_op::Tensor* x_tensor = ctx->Tensor4ArgNameAndIndex("x", 0);
user_op::Tensor* y_tensor = ctx->Tensor4ArgNameAndIndex("y", 0);
const T* in_ptr = x_tensor->dptr<T>();
T* out_ptr = y_tensor->mut_dptr<T>();
const float height_scale = ctx->Attr<float>("height_scale");
const float width_scale = ctx->Attr<float>("width_scale");
const bool align_corners = ctx->Attr<bool>("align_corners");
const int nbatch = x_tensor->shape().At(0);
const int channels = x_tensor->shape().At(1);
const int64_t in_height = x_tensor->shape().At(2);
const int64_t in_width = x_tensor->shape().At(3);
const int64_t out_height = y_tensor->shape().At(2);
const int64_t out_width = y_tensor->shape().At(3);
if (in_height == out_height && in_width == out_width) {
memcpy(out_ptr, in_ptr, sizeof(T) * nbatch * channels * in_height * in_width);
} else {
const T scale_height = GetAreaPixelScale(in_height, out_height, align_corners, height_scale);
const T scale_width = GetAreaPixelScale(in_width, out_width, align_corners, width_scale);
for (int64_t output_y = 0; output_y < out_height; output_y++) {
for (int64_t output_x = 0; output_x < out_width; output_x++) {
const T* in = in_ptr;
T* out = out_ptr;
const T real_x = GetAreaPixel(scale_width, output_x, align_corners, /*cubic=*/true);
int64_t input_x = std::floor(real_x);
const T t_x = real_x - input_x;
const T real_y = GetAreaPixel(scale_height, output_y, align_corners, /*cubic=*/true);
int64_t input_y = std::floor(real_y);
const T t_y = real_y - input_y;
for (int64_t c = 0; c < channels * nbatch; c++) {
T coefficients[4];
// Interpolate 4 times in the x direction
for (int64_t i = 0; i < 4; i++) {
coefficients[i] =
cubic_interp1d<T>(upsample_get_value_bounded<T>(in, in_width, in_height,
input_x - 1, input_y - 1 + i),
upsample_get_value_bounded<T>(in, in_width, in_height,
input_x + 0, input_y - 1 + i),
upsample_get_value_bounded<T>(in, in_width, in_height,
input_x + 1, input_y - 1 + i),
upsample_get_value_bounded<T>(in, in_width, in_height,
input_x + 2, input_y - 1 + i),
t_x);
}
// Interpolate in the y direction using x interpolations
out[output_y * out_width + output_x] = cubic_interp1d<T>(
coefficients[0], coefficients[1], coefficients[2], coefficients[3], t_y);
// Move to next channel
in += in_width * in_height;
out += out_width * out_height;
}
}
}
}
}
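The cubic convolution weights above use A = -0.75. Here is a short plain-Python sketch (the helper names are mine) of how the four weights are built, plus two sanity checks: the weights always sum to 1, and at t = 0 the interpolation returns exactly the second sample:
def cubic_conv1(x, A):
    return ((A + 2.0) * x - (A + 3.0)) * x * x + 1.0

def cubic_conv2(x, A):
    return ((A * x - 5.0 * A) * x + 8.0 * A) * x - 4.0 * A

def cubic_weights(t, A=-0.75):
    return [cubic_conv2(t + 1.0, A), cubic_conv1(t, A),
            cubic_conv1(1.0 - t, A), cubic_conv2(2.0 - t, A)]

def cubic_interp1d(x0, x1, x2, x3, t):
    w = cubic_weights(t)
    return x0 * w[0] + x1 * w[1] + x2 * w[2] + x3 * w[3]

print(cubic_weights(0.0))                        # [0.0, 1.0, 0.0, 0.0]
print(sum(cubic_weights(0.3)))                   # 1.0 (up to floating-point error)
print(cubic_interp1d(1.0, 2.0, 3.0, 4.0, 0.5))   # 2.5 for equally spaced samples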
7
Trilinear interpolation
The trilinear interpolation kernel lives at https://github.com/Oneflow-Inc/oneflow/blob/master/oneflow/user/kernels/upsample_trilinear_3d_kernel.cpp#L25-L69 ; again, we only look at the forward pass:
template<typename T>
static void UpsampleTrilinear3DForward(const int64_t elem_cnt, const T* in_dptr,
NdIndexOffsetHelper<int64_t, 5> in_helper,
NdIndexOffsetHelper<int64_t, 5> out_helper,
const int64_t in_depth, const int64_t in_height,
const int64_t in_width, const T rdepth, const T rheight,
const T rwidth, const bool align_corners, T* out_dptr) {
for (int64_t index = 0; index < elem_cnt; ++index) {
int64_t n, c, d, h, w;
out_helper.OffsetToNdIndex(index, n, c, d, h, w);
const T t1r = GetAreaPixel(rdepth, d, align_corners);
const int64_t t1 = t1r;
const int64_t t1p = (t1 < in_depth - 1) ? 1 : 0;
const T t1lambda = t1r - t1;
const T t0lambda = static_cast<T>(1.) - t1lambda;
const T h1r = GetAreaPixel(rheight, h, align_corners);
const int64_t h1 = h1r;
const int64_t h1p = (h1 < in_height - 1) ? 1 : 0;
const T h1lambda = h1r - h1;
const T h0lambda = static_cast<T>(1.) - h1lambda;
const T w1r = GetAreaPixel(rwidth, w, align_corners);
const int64_t w1 = w1r;
const int64_t w1p = (w1 < in_width - 1) ? 1 : 0;
const T w1lambda = w1r - w1;
const T w0lambda = static_cast<T>(1.) - w1lambda;
const T* pos1 = &in_dptr[in_helper.NdIndexToOffset(n, c, t1, h1, w1)];
out_dptr[index] =
t0lambda
* (h0lambda * (w0lambda * pos1[0] + w1lambda * pos1[w1p])
+ h1lambda
* (w0lambda * pos1[h1p * in_width] + w1lambda * pos1[h1p * in_width + w1p]))
+ t1lambda
* (h0lambda
* (w0lambda * pos1[t1p * in_height * in_width]
+ w1lambda * pos1[t1p * in_height * in_width + w1p])
+ h1lambda
* (w0lambda * pos1[t1p * in_height * in_width + h1p * in_width]
+ w1lambda * pos1[t1p * in_height * in_width + h1p * in_width + w1p]));
}
}
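Trilinear interpolation is separable: applying the same 1-D linear rule along W, then H, then D reproduces the eight-corner weighted sum computed in the kernel above. A self-contained NumPy sketch of the align_corners=False path (all names are mine):
import numpy as np

def linear_1d_lastaxis(x, scale_factor):
    # 1-D linear interpolation along the last axis, align_corners=False
    in_len = x.shape[-1]
    out_len = int(in_len * scale_factor)
    out = np.empty(x.shape[:-1] + (out_len,), dtype=x.dtype)
    for i in range(out_len):
        r = max((i + 0.5) * in_len / out_len - 0.5, 0.0)
        lo = int(r)
        hi = min(lo + 1, in_len - 1)
        lam = r - lo
        out[..., i] = (1 - lam) * x[..., lo] + lam * x[..., hi]
    return out

def upsample_trilinear_3d(x, scale_d, scale_h, scale_w):
    out = linear_1d_lastaxis(x, scale_w)                                              # along W
    out = np.moveaxis(linear_1d_lastaxis(np.moveaxis(out, -2, -1), scale_h), -1, -2)  # along H
    out = np.moveaxis(linear_1d_lastaxis(np.moveaxis(out, -3, -1), scale_d), -1, -3)  # along D
    return out

x = np.arange(8, dtype=np.float32).reshape(1, 1, 2, 2, 2)
print(upsample_trilinear_3d(x, 2.0, 2.0, 2.0).shape)  # (1, 1, 4, 4, 4)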
8
Area interpolation
The interpolate operator provides one more interpolation method, area interpolation. As the dispatch code in Section 1 shows, the 'area' mode simply calls adaptive_avg_pool, and thinking of area interpolation as adaptive average pooling is the easiest way to understand it. I won't go into the details of adaptive_avg_pool here; the idea is to enumerate the output pixels, find the corresponding region of input pixels for each one, and then sum and average those pixels. If you are interested, see OneFlow's concrete implementation: https://github.com/Oneflow-Inc/oneflow/blob/master/oneflow/user/kernels/adaptive_pool_cpu_kernel.cpp
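For intuition, here is a minimal NumPy sketch of what 'area' mode amounts to in 1-D, i.e. adaptive average pooling: each output element averages the input region that maps onto it. The function name is mine, not OneFlow's:
import numpy as np

def adaptive_avg_pool1d(x, output_size):
    in_len = x.shape[-1]
    out = np.empty(x.shape[:-1] + (output_size,), dtype=x.dtype)
    for i in range(output_size):
        start = (i * in_len) // output_size                          # floor(i * in / out)
        end = ((i + 1) * in_len + output_size - 1) // output_size    # ceil((i+1) * in / out)
        out[..., i] = x[..., start:end].mean(axis=-1)
    return out

x = np.arange(1, 9, dtype=np.float32).reshape(1, 1, 8)
print(adaptive_avg_pool1d(x, 4))
# [[[1.5 3.5 5.5 7.5]]]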
9
Comparison of interpolation methods
Comparing the interpolation algorithms in the interpolate Op: going from nearest to bilinear to bicubic, the results become progressively smoother, but the computational cost grows accordingly. OneFlow, like PyTorch, builds its various Upsample Modules on top of this implementation. It is also worth noting that, besides the methods covered by interpolate, upsampling can be done with deconvolution (transposed convolution), which was covered in an earlier article and is not repeated here. Taking the development of the interpolate operator as the running example, this article has walked through essentially all of the interpolation methods found in deep learning frameworks; I hope it is helpful to readers.