Paper Title
Weakly Aligned Feature Fusion for Multimodal Object Detection
Paper Authors
Paper Abstract
To achieve accurate and robust object detection in real-world scenarios, various forms of images are incorporated, such as color, thermal, and depth. However, multimodal data often suffer from the position shift problem: the image pair is not strictly aligned, so one object can have different positions in the different modalities. For deep learning methods, this problem makes it difficult to fuse multimodal features and confuses the training of convolutional neural networks (CNNs). In this article, we propose a general multimodal detector named aligned region CNN (AR-CNN) to tackle the position shift problem. First, a region feature (RF) alignment module with an adjacent similarity constraint is designed to consistently predict the position shift between the two modalities and adaptively align the cross-modal RFs. Second, we propose a novel region of interest (RoI) jitter strategy to improve robustness to unexpected shift patterns. Third, we present a new multimodal feature fusion method that selects the more reliable feature and suppresses the less useful one via feature reweighting. In addition, by locating bounding boxes in both modalities and building their relationships, we provide a novel multimodal labeling named KAIST-Paired. Extensive experiments on 2-D and 3-D object detection with RGB-T and RGB-D datasets demonstrate the effectiveness and robustness of our method.
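To make the two training-time ideas in the abstract concrete, below is a minimal PyTorch sketch, not the paper's actual AR-CNN implementation. It assumes a simple per-modality gating design for the feature reweighting fusion and a uniform box perturbation for RoI jitter; the names `ReweightedFusion`, `roi_jitter`, and `max_shift` are illustrative, and the RF alignment module is omitted.

```python
import torch
import torch.nn as nn


class ReweightedFusion(nn.Module):
    """Hypothetical feature-reweighting fusion: each modality's feature map
    is scaled by a learned scalar gate before element-wise summation, so the
    more reliable modality dominates and the less useful one is suppressed."""

    def __init__(self, channels: int):
        super().__init__()
        # One small gating branch per modality (design is an assumption).
        self.gate_rgb = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, 1, 1), nn.Sigmoid()
        )
        self.gate_thermal = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, 1, 1), nn.Sigmoid()
        )

    def forward(self, feat_rgb: torch.Tensor, feat_thermal: torch.Tensor) -> torch.Tensor:
        w_rgb = self.gate_rgb(feat_rgb)            # (N, 1, 1, 1) reliability weight
        w_thermal = self.gate_thermal(feat_thermal)
        # Weighted sum: broadcasting scales every channel and location.
        return w_rgb * feat_rgb + w_thermal * feat_thermal


def roi_jitter(boxes: torch.Tensor, max_shift: float = 0.05) -> torch.Tensor:
    """Hypothetical RoI jitter: perturb (x1, y1, x2, y2) boxes by a random
    fraction of their width/height during training, simulating unexpected
    position shift between modalities."""
    w = (boxes[:, 2] - boxes[:, 0]).unsqueeze(1)   # (N, 1) box widths
    h = (boxes[:, 3] - boxes[:, 1]).unsqueeze(1)   # (N, 1) box heights
    # Uniform noise in [-max_shift, max_shift], scaled per coordinate.
    noise = (torch.rand_like(boxes) * 2 - 1) * max_shift
    return boxes + noise * torch.cat([w, h, w, h], dim=1)


if __name__ == "__main__":
    fusion = ReweightedFusion(channels=256)
    fused = fusion(torch.randn(2, 256, 32, 32), torch.randn(2, 256, 32, 32))
    jittered = roi_jitter(torch.tensor([[10.0, 20.0, 50.0, 80.0]]))
    print(fused.shape, jittered)
```

The scalar gate is the simplest choice that matches the "select the more reliable feature" description; a per-channel or spatial gate would be a straightforward variant under the same reweighting idea.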