Title

AutoAlign: Pixel-Instance Feature Aggregation for Multi-Modal 3D Object Detection

Authors

Zehui Chen, Zhenyu Li, Shiquan Zhang, Liangji Fang, Qinghong Jiang, Feng Zhao, Bolei Zhou, Hang Zhao

Abstract

Object detection through either RGB images or LiDAR point clouds has been extensively explored in autonomous driving. However, it remains challenging to make these two data sources complementary and beneficial to each other. In this paper, we propose \textit{AutoAlign}, an automatic feature fusion strategy for 3D object detection. Instead of establishing a deterministic correspondence with the camera projection matrix, we model the mapping relationship between the image and point clouds with a learnable alignment map. This map enables our model to automate the alignment of non-homogeneous features in a dynamic and data-driven manner. Specifically, a cross-attention feature alignment module is devised to adaptively aggregate \textit{pixel-level} image features for each voxel. To enhance the semantic consistency during feature alignment, we also design a self-supervised cross-modal feature interaction module, through which the model can learn feature aggregation with \textit{instance-level} feature guidance. Extensive experimental results show that our approach can lead to 2.3 mAP and 7.0 mAP improvements on the KITTI and nuScenes datasets, respectively. Notably, our best model reaches 70.9 NDS on the nuScenes testing leaderboard, achieving competitive performance among various state-of-the-art methods.
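
To make the described pixel-level aggregation more concrete, the following is a minimal PyTorch sketch of a cross-attention alignment step in which each voxel feature acts as a query over a flattened image feature map. The module name, feature dimensions, and residual fusion shown here are illustrative assumptions, not the paper's exact AutoAlign implementation.

```python
import torch
import torch.nn as nn


class CrossAttentionFeatureAlignment(nn.Module):
    """Sketch of cross-attention feature alignment: each voxel feature
    (query) adaptively aggregates pixel-level image features (keys/values).
    Dimensions and the residual fusion are illustrative assumptions."""

    def __init__(self, voxel_dim=128, img_dim=256, embed_dim=128, num_heads=4):
        super().__init__()
        self.q_proj = nn.Linear(voxel_dim, embed_dim)   # project voxel features to queries
        self.kv_proj = nn.Linear(img_dim, embed_dim)    # project image features to keys/values
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, voxel_feats, img_feats):
        # voxel_feats: (B, N_voxels, voxel_dim)
        # img_feats:   (B, H*W, img_dim), a flattened 2D feature map
        q = self.q_proj(voxel_feats)
        kv = self.kv_proj(img_feats)
        # The attention weights play the role of a learnable, data-driven alignment map.
        aligned, _ = self.attn(q, kv, kv)
        # Fuse the aggregated image context back into the voxel features.
        return voxel_feats + aligned


if __name__ == "__main__":
    # Toy usage with random tensors standing in for voxel and image features.
    voxels = torch.randn(2, 100, 128)
    pixels = torch.randn(2, 64 * 176, 256)
    fused = CrossAttentionFeatureAlignment()(voxels, pixels)
    print(fused.shape)  # torch.Size([2, 100, 128])
```

In this sketch, the attention weights serve as the learnable alignment map described in the abstract: rather than projecting each voxel to a fixed pixel via the camera matrix, every voxel can attend to any image location and learn where to draw complementary appearance features from.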
