【源头活水】The Convolutions Hidden in MLP-Mixer
"Ask how the channel stays so clear: because living water keeps flowing in from its source." Learning about cutting-edge work and drawing inspiration from other research fields gives us a clearer grasp of the essence of our own research problems, and is an inexhaustible source of self-improvement. To that end, we curate selected paper-reading notes in the "源头活水" (Source of Living Water) column, to help you read the research literature both broadly and deeply. Stay tuned.
If a convolution kernel is large enough to cover the entire input, so that it can no longer slide across it, the convolution becomes a fully connected layer. Conversely, if a fully connected layer is sparse enough, with each neuron in the later layer connected only to a few neurons near the corresponding position in the previous layer, and with these connection weights shared across spatial positions, then the fully connected layer becomes a convolution.
For some more concrete examples, see the explanation in the CS231n notes:
https://cs231n.github.io/convolutional-networks/
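To make the first direction concrete, here is a minimal sketch of my own (not from the original post): a convolution whose kernel is exactly as large as its input cannot slide, and gives the same result as a linear layer applied to the flattened input.

import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 8, 8)                 # [batch, in_channels, height, width]
w = torch.randn(5, 3, 8, 8)                 # kernel the same size as the input: it cannot slide
b = torch.randn(5)

conv_out = F.conv2d(x, w, b)                # output spatial size collapses to 1x1
fc_out = x.flatten(1) @ w.flatten(1).T + b  # the same weights used as a fully connected layer
print(torch.allclose(conv_out.view(2, 5), fc_out, atol=1e-4))  # True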
import torch
import torch.nn.functional as F
# i) non-overlapping patch projection
batch_size, height, width, in_channels = 32, 224, 224, 3
out_channels, patch_size = 8, 16
x = torch.randn(batch_size, in_channels, height, width)
w1 = torch.randn(out_channels, in_channels, patch_size, patch_size)
b1 = torch.randn(out_channels)
conv_out1 = F.conv2d(x, w1, b1, stride=(patch_size, patch_size))
print(conv_out1.size()) # [batch_size, out_channels, num_patches_per_column, num_patches_per_row]
# Rearrange the image into a [batch, num_patches, patch_dim] table: one flattened patch per row
x_mlp = x.view(batch_size, in_channels, height // patch_size, patch_size, width // patch_size, patch_size) \
    .permute(0, 2, 4, 1, 3, 5).reshape(batch_size, -1, in_channels * patch_size ** 2)
mlp_out1 = x_mlp @ w1.view(out_channels, -1).T + b1
print(mlp_out1.size()) # [batch_size, num_patches, out_channels]
print(torch.allclose(conv_out1.view(-1), mlp_out1.transpose(1, 2).reshape(-1), atol=1e-4))
As you can see, after rearranging the results (a tedious but not very illuminating step that I won't expand on), conv_out1 and mlp_out1 are identical.
torch.Size([32, 8, 14, 14])
torch.Size([32, 196, 8])
True
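As an aside of mine (not from the original post): in typical ViT/Mixer implementations this patch projection is written directly as a strided nn.Conv2d whose kernel size equals its stride; flattening and transposing its output yields the [B, S, C] token table. Reusing the tensors defined above:

import torch.nn as nn

patch_embed = nn.Conv2d(in_channels, out_channels, kernel_size=patch_size, stride=patch_size)
with torch.no_grad():
    patch_embed.weight.copy_(w1)   # reuse the same weights as above
    patch_embed.bias.copy_(b1)
tokens = patch_embed(x).flatten(2).transpose(1, 2)   # [batch_size, num_patches, out_channels]
print(torch.allclose(tokens, mlp_out1, atol=1e-4))    # True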
# ii) cross-location/token-mixing step
in_channels = out_channels # Use previous outputs as current inputs
out_hidden_dim = 7 # token-mixing hidden width, `D_S` in the paper
x = torch.randn(batch_size, in_channels, height // patch_size, width // patch_size)
w2 = torch.randn(out_hidden_dim, 1, height // patch_size, width // patch_size)
b2 = torch.randn(out_hidden_dim)
# This is a depthwise conv with shared parameters
conv_out2 = F.conv2d(x, w2.repeat(in_channels, 1, 1, 1),
                     b2.repeat(in_channels), groups=in_channels)
print(conv_out2.size()) # [batch_size, in_channels * out_hidden_dim, 1, 1]
mlp_out2 = x.view(batch_size, in_channels, -1) @ w2.view(out_hidden_dim, -1).T + b2
print(mlp_out2.size()) # [batch_size, in_channels, out_hidden_dim], i.e. [B, C, D_S] in the paper's notation
print(torch.allclose(conv_out2.view(-1), mlp_out2.view(-1), atol=1e-4))
torch.Size([32, 56, 1, 1])
torch.Size([32, 8, 7])
True
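For reference, the same token-mixing step can also be written the way Mixer implementations usually express it, as an nn.Linear applied along the patch dimension. This is a small sketch of mine reusing the tensors defined above:

import torch.nn as nn

num_patches = (height // patch_size) * (width // patch_size)   # 196 patches, `S` in the paper
token_mix = nn.Linear(num_patches, out_hidden_dim)              # maps S -> D_S
with torch.no_grad():
    token_mix.weight.copy_(w2.view(out_hidden_dim, -1))
    token_mix.bias.copy_(b2)
linear_out2 = token_mix(x.view(batch_size, in_channels, -1))    # [B, C, D_S]
print(torch.allclose(linear_out2, mlp_out2, atol=1e-4))          # True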
# iii) channel-mixing step
out_channels = 28
x = torch.randn(batch_size, in_channels, height // patch_size, width // patch_size)
w3 = torch.randn(out_channels, in_channels, 1, 1)
b3 = torch.randn(out_channels)
# This is a pointwise conv
conv_out3 = F.conv2d(x, w3, b3)
print(conv_out3.size()) # [batch_size, out_channels, num_patches_per_column, num_patches_per_row]
mlp_out3 = x.permute(0, 2, 3, 1).reshape(-1, in_channels) @ w3.view(out_channels, -1).T + b3
print(mlp_out3.size()) # [batch_size * num_patches, out_channels], i.e. [B*S, D_C] in the paper's notation
print(torch.allclose(conv_out3.permute(0, 2, 3, 1).reshape(-1), mlp_out3.view(-1), atol=1e-4))
torch.Size([32, 28, 14, 14])
torch.Size([6272, 28])
True
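Likewise, the channel-mixing step is what Mixer implementations write as an nn.Linear over the last (channel) dimension. Another small sketch of mine, again reusing the tensors above:

import torch.nn as nn

channel_mix = nn.Linear(in_channels, out_channels)               # maps C -> D_C
with torch.no_grad():
    channel_mix.weight.copy_(w3.view(out_channels, -1))
    channel_mix.bias.copy_(b3)
linear_out3 = channel_mix(x.permute(0, 2, 3, 1))                  # [B, 14, 14, D_C]
print(torch.allclose(linear_out3.reshape(-1, out_channels), mlp_out3, atol=1e-4))  # True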
Having written all of this, we have essentially unpacked @Captain Jack's one-line characterization of MLP-Mixer as a "parameter-shared depth-wise separable convolution".
https://www.zhihu.com/question/457926000/answer/1871444516
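To tie the three pieces together, here is a minimal end-to-end sketch (my own assembly, not from the original answer or the paper) that chains the three convolutions above into a single forward pass. It keeps only the first linear layer of each MLP and drops the LayerNorm, GELU, skip connections, and the projections back to the original widths that a real MLP-Mixer block uses:

import torch
import torch.nn.functional as F

def mixerish_forward(img, w_patch, b_patch, w_token, b_token, w_chan, b_chan, patch_size=16):
    # i) patch projection: strided conv with kernel size equal to stride
    h = F.conv2d(img, w_patch, b_patch, stride=patch_size)                   # [B, C, 14, 14]
    B, C = h.shape[:2]
    d_s = w_token.shape[0]
    # ii) token mixing: a depthwise conv whose single kernel is shared by every channel
    h = F.conv2d(h, w_token.repeat(C, 1, 1, 1), b_token.repeat(C), groups=C)  # [B, C * D_S, 1, 1]
    h = h.view(B, C, d_s)                                                     # [B, C, D_S]
    # iii) channel mixing: a pointwise (1x1) conv over the channel dimension
    h = F.conv2d(h.unsqueeze(-1), w_chan, b_chan)                             # [B, D_C, D_S, 1]
    return h.squeeze(-1).transpose(1, 2)                                      # [B, D_S, D_C]

out = mixerish_forward(torch.randn(2, 3, 224, 224),
                       torch.randn(8, 3, 16, 16), torch.randn(8),
                       torch.randn(7, 1, 14, 14), torch.randn(7),
                       torch.randn(28, 8, 1, 1), torch.randn(28))
print(out.size())  # torch.Size([2, 7, 28])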