Paper Title

AutoAlignV2: Deformable Feature Aggregation for Dynamic Multi-Modal 3D Object Detection

Authors

Zehui Chen, Zhenyu Li, Shiquan Zhang, Liangji Fang, Qinhong Jiang, Feng Zhao

Abstract

Point clouds and RGB images are two common perception sources in autonomous driving. The former provides accurate object localization, while the latter is denser and richer in semantic information. Recently, AutoAlign presented a learnable paradigm for combining these two modalities for 3D object detection. However, it suffers from the high computational cost introduced by global-wise attention. To address this, we propose the Cross-Domain DeformCAFA module. It attends to sparse learnable sampling points for cross-modal relational modeling, which enhances tolerance to calibration error and greatly speeds up feature aggregation across modalities. To overcome the complexity of GT-AUG under multi-modal settings, we design a simple yet effective cross-modal augmentation strategy based on the convex combination of image patches, given their depth information. Moreover, through a novel image-level dropout training scheme, our model can run inference in a dynamic manner. Putting these together, we propose AutoAlignV2, a faster and stronger multi-modal 3D detection framework built on top of AutoAlign. Extensive experiments on the nuScenes benchmark demonstrate the effectiveness and efficiency of AutoAlignV2. Notably, our best model reaches 72.4 NDS on the nuScenes test leaderboard, setting a new state of the art among all published multi-modal 3D object detectors. Code will be available at https://github.com/zehuichen123/AutoAlignV2.
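The core mechanism the abstract describes, sparse learnable sampling points for cross-modal aggregation, can be sketched roughly as a deformable cross-attention step: each 3D query predicts a small set of 2D offsets around its projected image reference point, gathers image features there by bilinear sampling, and fuses them with learned weights. The module name, tensor shapes, and number of sampling points below are illustrative assumptions, not the authors' exact implementation.

```python
# A minimal sketch of deformable cross-modal feature aggregation
# (the DeformCAFA idea), assuming queries are per-voxel features and
# reference points are already projected into normalized image
# coordinates in [-1, 1]. All shapes and names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableCrossModalAggregation(nn.Module):
    def __init__(self, dim=256, num_points=8):
        super().__init__()
        self.num_points = num_points
        self.offsets = nn.Linear(dim, num_points * 2)  # per-query 2D sampling offsets
        self.weights = nn.Linear(dim, num_points)      # per-point attention weights
        self.proj = nn.Linear(dim, dim)

    def forward(self, query, img_feat, ref_pts):
        # query:    (B, N, C)    3D (voxel) query features
        # img_feat: (B, C, H, W) image feature map
        # ref_pts:  (B, N, 2)    projected reference points in [-1, 1]
        B, N, C = query.shape
        P = self.num_points
        offsets = self.offsets(query).view(B, N, P, 2)       # learnable offsets
        weights = self.weights(query).softmax(dim=-1)        # (B, N, P)
        loc = (ref_pts.unsqueeze(2) + offsets).clamp(-1, 1)  # sparse sampling locations
        # Bilinearly sample image features at the P sparse locations per query.
        sampled = F.grid_sample(img_feat, loc, align_corners=False)  # (B, C, N, P)
        fused = (sampled * weights.unsqueeze(1)).sum(dim=-1)          # (B, C, N)
        return self.proj(fused.transpose(1, 2))                       # (B, N, C)
```

Because only a handful of points are sampled per query instead of attending over the whole image, this kind of sparse sampling is what makes the aggregation cheaper than global-wise attention and more tolerant of small calibration errors.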

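The image-level dropout training scheme can likewise be sketched: image features are randomly zeroed per sample during training, so the detector learns to work both with and without the camera branch and can infer dynamically when images are unavailable. The dropout probability and function name below are assumptions for illustration.

```python
# A minimal sketch of image-level dropout, assuming the fusion module
# treats an all-zero image feature map as "no image available".
# The probability p and the per-sample granularity are hypothetical.
import torch

def image_level_dropout(img_feat, p=0.3, training=True):
    # img_feat: (B, C, H, W) features from the image branch
    if not training:
        return img_feat
    B = img_feat.shape[0]
    keep = (torch.rand(B, device=img_feat.device) > p).float()
    return img_feat * keep.view(B, 1, 1, 1)  # zero out dropped samples
```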