Paper Title

RGB-T Semantic Segmentation with Location, Activation, and Sharpening

Authors

Gongyang Li, Yike Wang, Zhi Liu, Xinpeng Zhang, Dan Zeng

Abstract

Semantic segmentation is important for scene understanding. To address scenes with adverse illumination conditions in natural images, thermal infrared (TIR) images are introduced. Most existing RGB-T semantic segmentation methods follow three cross-modal fusion paradigms, i.e., encoder fusion, decoder fusion, and feature fusion. Unfortunately, some methods ignore the properties of RGB and TIR features, or the properties of features at different levels. In this paper, we propose a novel feature fusion-based network for RGB-T semantic segmentation, named LASNet, which follows three steps of location, activation, and sharpening. The highlight of LASNet is that we fully consider the characteristics of cross-modal features at different levels, and accordingly propose three specific modules for better segmentation. Concretely, we propose a Collaborative Location Module (CLM) for high-level semantic features, aiming to locate all potential objects. We propose a Complementary Activation Module (CAM) for middle-level features, aiming to activate exact regions of different objects. We propose an Edge Sharpening Module (ESM) for low-level texture features, aiming to sharpen the edges of objects. Furthermore, in the training phase, we attach a location supervision and an edge supervision after the CLM and the ESM, respectively, and impose two semantic supervisions in the decoder part to facilitate network convergence. Experimental results on two public datasets demonstrate the superiority of our LASNet over relevant state-of-the-art methods. The code and results of our method are available at https://github.com/MathLee/LASNet.
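
The abstract names the three fusion steps but does not specify how CLM, CAM, or ESM work internally. The PyTorch sketch below is therefore only a plausible reading of the location-activation-sharpening pipeline: all module internals, the gating scheme in CAM, and the channel sizes and resolutions are assumptions for illustration, not the paper's implementation (see the linked repository for the actual code).

```python
import torch
import torch.nn as nn


def conv_bn_relu(in_ch, out_ch, k=3):
    """Conv -> BN -> ReLU block shared by all three fusion modules."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class CLM(nn.Module):
    """Collaborative Location Module (high-level features): fuse RGB and TIR
    semantics and predict a coarse map locating all potential objects; the
    map would receive the location supervision during training (assumed)."""

    def __init__(self, ch):
        super().__init__()
        self.fuse = conv_bn_relu(2 * ch, ch)
        self.loc_head = nn.Conv2d(ch, 1, 1)  # auxiliary location prediction

    def forward(self, rgb, tir):
        f = self.fuse(torch.cat([rgb, tir], dim=1))
        return f, torch.sigmoid(self.loc_head(f))


class CAM(nn.Module):
    """Complementary Activation Module (middle-level features): one assumed
    reading of "complementary" as a learned per-pixel gate that blends the
    two modalities to activate exact object regions."""

    def __init__(self, ch):
        super().__init__()
        self.fuse = conv_bn_relu(2 * ch, ch)
        self.gate = nn.Conv2d(ch, ch, 1)

    def forward(self, rgb, tir):
        g = torch.sigmoid(self.gate(self.fuse(torch.cat([rgb, tir], dim=1))))
        return g * rgb + (1.0 - g) * tir  # complementary modality weighting


class ESM(nn.Module):
    """Edge Sharpening Module (low-level features): fuse textures and predict
    an edge map for the auxiliary edge supervision (assumed)."""

    def __init__(self, ch):
        super().__init__()
        self.fuse = conv_bn_relu(2 * ch, ch)
        self.edge_head = nn.Conv2d(ch, 1, 1)  # auxiliary edge prediction

    def forward(self, rgb, tir):
        f = self.fuse(torch.cat([rgb, tir], dim=1))
        return f, torch.sigmoid(self.edge_head(f))


if __name__ == "__main__":
    # Toy multi-level features standing in for RGB/TIR encoder outputs
    # (channel sizes and resolutions are illustrative assumptions).
    rgb_hi, tir_hi = torch.randn(1, 512, 15, 20), torch.randn(1, 512, 15, 20)
    rgb_mid, tir_mid = torch.randn(1, 256, 30, 40), torch.randn(1, 256, 30, 40)
    rgb_lo, tir_lo = torch.randn(1, 64, 120, 160), torch.randn(1, 64, 120, 160)

    clm, cam, esm = CLM(512), CAM(256), ESM(64)
    f_hi, loc_map = clm(rgb_hi, tir_hi)   # step 1: locate potential objects
    f_mid = cam(rgb_mid, tir_mid)         # step 2: activate object regions
    f_lo, edge_map = esm(rgb_lo, tir_lo)  # step 3: sharpen object edges
    print(f_hi.shape, f_mid.shape, f_lo.shape, loc_map.shape, edge_map.shape)
```

Under this sketch, the training setup described in the abstract would correspond to a binary cross-entropy loss on `loc_map` and `edge_map` (the location and edge supervisions) plus two cross-entropy semantic losses attached in the decoder that merges `f_hi`, `f_mid`, and `f_lo`; a decoder and backbone are omitted here.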
