Delving into the Pre-training Paradigm of Monocular 3D Object Detection

Paper Title

Delving into the Pre-training Paradigm of Monocular 3D Object Detection

Authors

Li, Zhuoling; Zhang, Chuanrui; Yu, En; Wang, Haoqian

Abstract

Labels for monocular 3D object detection (M3OD) are expensive to obtain. Meanwhile, practical applications usually have abundant unlabeled data, and pre-training is an efficient way to exploit the knowledge it contains. However, the pre-training paradigm for M3OD has hardly been studied, and we aim to bridge this gap in this work. To this end, we first make two observations: (1) the guideline for devising pre-training tasks is to imitate the representation of the target task; (2) combining depth estimation and 2D object detection is a promising M3OD pre-training baseline. Then, following this guideline, we propose several strategies to further improve this baseline, mainly target-guided semi-dense depth estimation, keypoint-aware 2D object detection, and class-level loss adjustment. Combining all the developed techniques, the resulting pre-training framework produces pre-trained backbones that significantly improve M3OD performance on both the KITTI-3D and nuScenes benchmarks. For example, when a DLA34 backbone pre-trained with this framework is applied to a naive center-based M3OD detector, the moderate ${\rm AP}_{3D}70$ score for Car on the KITTI-3D test set is boosted by 18.71%, and the NDS score on the nuScenes validation set is improved by 40.41%, in relative terms.
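The abstract names its pre-training components but does not give their loss formulations. As a rough, non-authoritative illustration of what "combining depth estimation and 2D object detection" as one joint pre-training objective can look like, here is a minimal NumPy sketch. Everything in it is an assumption: the function names (`masked_depth_l1`, `class_weighted_det_loss`, `pretrain_loss`) are hypothetical, the depth term is a plain L1 loss averaged only over pixels that carry a depth label (the mask just marks supervised pixels; how the paper selects them for its target-guided semi-dense scheme is not described here), and the per-class weights stand in for a generic class-level loss adjustment. It is not the authors' method.

```python
import numpy as np


def masked_depth_l1(pred_depth, gt_depth, valid_mask):
    """Mean absolute depth error over supervised pixels only (hypothetical helper).

    valid_mask marks pixels that carry a depth label; all other pixels are
    ignored, so the supervision can be sparse or semi-dense.
    """
    valid = np.asarray(valid_mask, dtype=bool)
    if not valid.any():
        return 0.0  # no supervised pixels in this image
    diff = np.abs(np.asarray(pred_depth)[valid] - np.asarray(gt_depth)[valid])
    return float(diff.mean())


def class_weighted_det_loss(per_class_losses, class_weights):
    """Sum per-class 2D detection losses, each scaled by an assumed class weight.

    Classes missing from class_weights default to a weight of 1.0.
    """
    return float(sum(class_weights.get(cls, 1.0) * loss
                     for cls, loss in per_class_losses.items()))


def pretrain_loss(pred_depth, gt_depth, valid_mask,
                  per_class_losses, class_weights,
                  depth_weight=1.0, det_weight=1.0):
    """Assumed joint objective: weighted depth term plus weighted 2D detection term."""
    depth_term = masked_depth_l1(pred_depth, gt_depth, valid_mask)
    det_term = class_weighted_det_loss(per_class_losses, class_weights)
    return depth_weight * depth_term + det_weight * det_term


# Toy example: a 2x2 depth map in which one pixel has no depth label.
pred = np.array([[1.0, 2.0], [3.0, 4.0]])
gt = np.array([[1.5, 2.0], [0.0, 5.0]])
mask = np.array([[1, 1], [0, 1]])
det_losses = {"Car": 0.4, "Pedestrian": 0.2}  # made-up per-class detection losses
weights = {"Pedestrian": 2.0}  # hypothetical up-weighting of a rarer class

print(pretrain_loss(pred, gt, mask, det_losses, weights))
```

Here the unlabeled pixel contributes nothing to the depth term, and up-weighting one class shifts the detection term toward it; the relative weights between the two tasks are free hyperparameters in this sketch.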

Paper Title