Paper Title

Delving into the Pre-training Paradigm of Monocular 3D Object Detection

Authors

Zhuoling Li, Chuanrui Zhang, En Yu, Haoqian Wang

Abstract

The labels of monocular 3D object detection (M3OD) are expensive to obtain. Meanwhile, large amounts of unlabeled data usually exist in practical applications, and pre-training is an efficient way of exploiting the knowledge in unlabeled data. However, the pre-training paradigm for M3OD has hardly been studied. We aim to bridge this gap in this work. To this end, we first make two observations: (1) the guideline for devising pre-training tasks is to imitate the representation of the target task; (2) combining depth estimation and 2D object detection is a promising M3OD pre-training baseline. Afterwards, following the guideline, we propose several strategies to further improve this baseline, mainly including target-guided semi-dense depth estimation, keypoint-aware 2D object detection, and class-level loss adjustment. Combining all the developed techniques, the resulting pre-training framework produces pre-trained backbones that significantly improve M3OD performance on both the KITTI-3D and nuScenes benchmarks. For example, when a DLA34 backbone is applied to a naive center-based M3OD detector, the moderate ${\rm AP}_{3D}70$ score of Car on the KITTI-3D testing set is boosted by 18.71% and the NDS score on the nuScenes validation set is improved by 40.41% relatively.
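As a rough illustration of the baseline named in the abstract (depth estimation combined with 2D object detection), the following sketch writes a pre-training objective as a weighted sum of a semi-dense depth loss and a class-weighted 2D detection loss. This is not the authors' implementation: the function names (`masked_depth_l1`, `class_weighted_loss`, `pretrain_loss`), the L1 depth loss, the use of `None` to mark pixels without a depth label, and the per-class multiplicative weights standing in for "class-level loss adjustment" are all hypothetical choices made for illustration. The abstract does not say how the semi-dense depth targets are selected or how the class-level adjustment is computed.

```python
from typing import Dict, Optional, Sequence


def masked_depth_l1(pred: Sequence[float], target: Sequence[Optional[float]]) -> float:
    """Mean absolute depth error over labeled pixels only (hypothetical).

    Semi-dense supervision is modeled by skipping pixels whose target is None,
    instead of treating missing labels as zero depth.
    """
    errors = [abs(p - t) for p, t in zip(pred, target) if t is not None]
    return sum(errors) / len(errors) if errors else 0.0


def class_weighted_loss(
    per_object_loss: Sequence[float],
    labels: Sequence[str],
    class_weight: Dict[str, float],
) -> float:
    """Average per-object 2D detection loss, each scaled by a per-class weight.

    Classes missing from `class_weight` default to a weight of 1.0.
    This per-class scaling is an assumed stand-in for class-level loss adjustment.
    """
    if not per_object_loss:
        return 0.0
    total = sum(class_weight.get(c, 1.0) * loss for loss, c in zip(per_object_loss, labels))
    return total / len(per_object_loss)


def pretrain_loss(
    depth_pred: Sequence[float],
    depth_target: Sequence[Optional[float]],
    det_losses: Sequence[float],
    det_labels: Sequence[str],
    class_weight: Dict[str, float],
    depth_weight: float = 1.0,
    det_weight: float = 1.0,
) -> float:
    """Hypothetical multi-task objective: depth term plus 2D detection term."""
    depth_term = masked_depth_l1(depth_pred, depth_target)
    det_term = class_weighted_loss(det_losses, det_labels, class_weight)
    return depth_weight * depth_term + det_weight * det_term


if __name__ == "__main__":
    # Two of four pixels carry a depth label; errors are 2.0 and 3.0 -> 2.5.
    # Detection: (1.0 * 0.4 + 2.0 * 0.8) / 2 = 1.0, so the total is 3.5.
    loss = pretrain_loss(
        depth_pred=[10.0, 20.0, 30.0, 40.0],
        depth_target=[12.0, None, 27.0, None],
        det_losses=[0.4, 0.8],
        det_labels=["Car", "Pedestrian"],
        class_weight={"Pedestrian": 2.0},
    )
    print(loss)
```

Keeping the depth and detection terms as separate functions with scalar weights makes it easy to ablate each pre-training task on its own, which mirrors how the abstract treats them as components of one combined baseline.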