Beating everything: a round-up of the tricks in YOLOv4
Source | AI算法与图像处理 (ID: AI_study)
Even though object detection has started to mature over the past few years, the competition remains fierce. As shown below, YOLOv4 claims state-of-the-art accuracy while maintaining a high processing frame rate: on the MS COCO dataset it achieves 43.5% AP (65.7% AP₅₀) at an inference speed of roughly 65 FPS on a Tesla V100. In object detection, high accuracy is no longer the only benchmark; we also want the model to run smoothly on edge devices, and processing input video in real time on low-cost hardware has become just as important.
Source: https://medium.com/@jonathan_hui/yolov4-c9901eaa8e61
Backbone
Cross-Stage-Partial-connections (CSP)
CSPDarknet53
Neck
Feature Pyramid Networks (FPN)
SPP (spatial pyramid pooling layer)
YOLO with SPP
Path Aggregation Network (PAN)
Spatial Attention Module (SAM)
Bag of Freebies (BoF) for backbone
CutMix and Mosaic data augmentation, DropBlock regularization, and Class label smoothing
import tensorflow.compat.v1 as tf  # TF1-style graph code
tf.disable_v2_behavior()

# Smoothed one-hot label: use 0.9 instead of 1.0 for the true class.
p = tf.placeholder(tf.float32, shape=[None, 10])
feed_dict = {
    p: [[0, 0, 0, 0.9, 0, 0, 0, 0, 0, 0]]  # image with label "3"
}

# logits_real_image are the logits calculated by the
# discriminator for real images (a placeholder here so the snippet is self-contained).
logits_real_image = tf.placeholder(tf.float32, shape=[None, 10])

d_real_loss = tf.nn.sigmoid_cross_entropy_with_logits(
    labels=p, logits=logits_real_image)
Bag of Specials (BoS) for backbone
Mish activation, Cross-stage partial connections (CSP), and Multi-input weighted residual connections (MiWRC)
(Figure: Swish activation function with different values of β)
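For reference, Swish is defined as x · sigmoid(βx) and Mish as x · tanh(softplus(x)). Below is a minimal NumPy sketch of both activations; the function names and the sample range are illustrative, not taken from the YOLOv4 code.

import numpy as np

def swish(x, beta=1.0):
    # Swish: x * sigmoid(beta * x); beta = 1 recovers SiLU.
    return x / (1.0 + np.exp(-beta * x))

def mish(x):
    # Mish: x * tanh(softplus(x)) = x * tanh(ln(1 + e^x)).
    return x * np.tanh(np.log1p(np.exp(x)))

x = np.linspace(-5, 5, 11)
print(swish(x, beta=1.0))
print(mish(x))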
The first layer is called a depthwise convolution; it performs lightweight filtering by applying a single convolutional filter per input channel. The second layer is a 1 × 1 convolution, called a pointwise convolution, which is responsible for building new features through computing linear combinations of the input channels.
It is important to remove non-linearities in the narrow layers in order to maintain representational power.
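The sketch below shows how these two ideas fit together in a MobileNetV2-style inverted residual block, built with tf.keras layers: an expansion 1×1 convolution, a 3×3 depthwise convolution, and a linear 1×1 pointwise projection with no activation on the narrow output. The helper name and layer sizes are assumptions for illustration, not YOLOv4's actual backbone code.

import tensorflow as tf
from tensorflow.keras import layers

def inverted_residual(x, out_channels, expansion=6, stride=1):
    in_channels = x.shape[-1]
    # 1x1 expansion convolution with non-linearity.
    h = layers.Conv2D(expansion * in_channels, 1, padding='same', use_bias=False)(x)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU(6.0)(h)
    # 3x3 depthwise convolution: one filter per input channel.
    h = layers.DepthwiseConv2D(3, strides=stride, padding='same', use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU(6.0)(h)
    # 1x1 pointwise (linear) projection: no activation here, to keep
    # representational power in the narrow bottleneck.
    h = layers.Conv2D(out_channels, 1, padding='same', use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    # Residual connection when spatial size and channel count match.
    if stride == 1 and in_channels == out_channels:
        h = layers.Add()([x, h])
    return h

inputs = tf.keras.Input(shape=(64, 64, 32))
outputs = inverted_residual(inputs, out_channels=32)
model = tf.keras.Model(inputs, outputs)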
Bag of Freebies (BoF) for detector
CIoU-loss, CmBN, DropBlock regularization, Mosaic data augmentation, Self-Adversarial Training, Eliminate grid sensitivity, Using multiple anchors for a single ground truth, Cosine annealing scheduler, Optimal hyperparameters, and Random training shapes
CIoU loss encourages the predicted box to increase its overlap area with the ground-truth box, to minimize the distance between the two center points, and to keep the aspect ratios of the two boxes consistent.
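A minimal NumPy sketch of the CIoU loss for one predicted box and one ground-truth box in [x1, y1, x2, y2] format, following the Distance-IoU paper's three terms (IoU, normalized center distance, and aspect-ratio consistency); the function name and box format are assumptions for illustration.

import numpy as np

def ciou_loss(box, gt):
    # Boxes as [x1, y1, x2, y2].
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Overlap area (IoU term).
    iw = max(0.0, min(x2, gx2) - max(x1, gx1))
    ih = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = iw * ih
    union = w * h + gw * gh - inter
    iou = inter / union

    # Normalized distance between the two center points.
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    gcx, gcy = (gx1 + gx2) / 2, (gy1 + gy2) / 2
    rho2 = (cx - gcx) ** 2 + (cy - gcy) ** 2
    cw = max(x2, gx2) - min(x1, gx1)   # enclosing box width
    ch = max(y2, gy2) - min(y1, gy1)   # enclosing box height
    c2 = cw ** 2 + ch ** 2

    # Aspect-ratio consistency term.
    v = (4 / np.pi ** 2) * (np.arctan(gw / gh) - np.arctan(w / h)) ** 2
    alpha = v / (1 - iou + v + 1e-9)

    return 1 - iou + rho2 / c2 + alpha * v

print(ciou_loss([50, 50, 150, 150], [60, 60, 170, 160]))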
Genetic algorithm used YOLOv3-SPP to train with GIoU loss and search 300 epochs for min-val 5k sets. We adopt searched learning rate 0.00261, momentum 0.949, IoU threshold for assigning ground truth 0.213, and loss normalizer 0.07 for genetic algorithm experiments.
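The cosine annealing scheduler listed in the BoF items above lowers the learning rate along half a cosine curve over training. A minimal sketch follows, using the searched initial learning rate 0.00261 as lr_max and omitting SGDR's warm restarts; the function name and the step counts are illustrative.

import math

def cosine_annealing_lr(step, total_steps, lr_max=0.00261, lr_min=0.0):
    # Cosine annealing (SGDR without restarts): the learning rate
    # decays from lr_max to lr_min over half a cosine period.
    cos = math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + cos)

# Example: decay over 300 epochs from the searched initial learning rate.
for epoch in (0, 75, 150, 225, 300):
    print(epoch, round(cosine_annealing_lr(epoch, 300), 6))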
Bag of Specials (BoS) for detector
Mish activation, modified SPP-block, modified SAM-block, modified PAN path-aggregation block & DIoU-NMS
References
1. YOLOv4: Optimal Speed and Accuracy of Object Detection
2. GitHub for YOLOv4: https://github.com/AlexeyAB/darknet
3. Densely Connected Convolutional Networks
4. CSPNet: A New Backbone that can Enhance Learning Capability of CNN
5. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
6. Path Aggregation Network for Instance Segmentation
7. Mish: A Self Regularized Non-Monotonic Neural Activation Function
8. Searching for Activation Functions (Swish)
9. DC-SPP-YOLO: Dense Connection and Spatial Pyramid Pooling Based YOLO for Object Detection
10. Path Aggregation Network for Instance Segmentation
11. CBAM: Convolutional Block Attention Module (SAM)
12. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression
13. Cross-Iteration Batch Normalization
14. CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
15. DropBlock: A Regularization Method for Convolutional Networks
16. Rethinking the Inception Architecture for Computer Vision (Class label smoothing)
17. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression
18. SGDR: Stochastic Gradient Descent with Warm Restarts (Cosine annealing scheduler)
19. Bag of Freebies for Training Object Detection Neural Networks
20. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
21. EfficientDet: Scalable and Efficient Object Detection
22. MobileNetV2: Inverted Residuals and Linear Bottlenecks