Densely connected multidilated convolutional networks for dense prediction tasks

Paper title

Densely connected multidilated convolutional networks for dense prediction tasks

Authors

Naoya Takahashi, Yuki Mitsufuji

Abstract

Tasks that involve high-resolution dense prediction require modeling both local and global patterns in a large input field. Although local and global structures often depend on each other and modeling them simultaneously is important, many convolutional neural network (CNN)-based approaches exchange representations across different resolutions only a few times. In this paper, we argue for the importance of dense, simultaneous modeling of multiresolution representations and propose a novel CNN architecture called densely connected multidilated DenseNet (D3Net). D3Net involves a novel multidilated convolution that has different dilation factors in a single layer to model different resolutions simultaneously. By combining the multidilated convolution with the DenseNet architecture, D3Net incorporates multiresolution learning with an exponentially growing receptive field in almost all layers, while avoiding the aliasing problem that occurs when dilated convolution is naively incorporated into DenseNet. Experiments on the image semantic segmentation task using Cityscapes and the audio source separation task using MUSDB18 show that the proposed method achieves superior performance over state-of-the-art methods.
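
The idea of a single layer with several dilation factors can be illustrated with a minimal 1D sketch in plain Python. This is not the authors' implementation: the function names, the one-dilation-per-input-channel assignment, the power-of-two dilations, and the all-ones kernels are assumptions chosen for illustration only. It shows how one layer can combine inputs sampled at different spacings, and how the receptive field of a stack of stride-1 convolutions grows with the dilation factors.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D dilated convolution (cross-correlation)."""
    half = len(kernel) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + (k - half) * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def multidilated_conv1d(channels, kernels, dilations):
    """One output channel of a hypothetical multidilated layer.

    Assumption: each input channel gets its own dilation factor, and the
    per-channel results are summed, as in an ordinary convolution.
    """
    out = [0.0] * len(channels[0])
    for signal, kernel, dilation in zip(channels, kernels, dilations):
        filtered = dilated_conv1d(signal, kernel, dilation)
        out = [a + b for a, b in zip(out, filtered)]
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 convolutions with given dilations."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Three input channels, each a unit impulse at position 8 of 17 samples.
impulse = [0.0] * 17
impulse[8] = 1.0
ones_kernel = [1.0, 1.0, 1.0]  # illustrative weights, not learned values

response = multidilated_conv1d(
    [impulse, impulse, impulse],
    [ones_kernel, ones_kernel, ones_kernel],
    [1, 2, 4],  # assumed power-of-two dilations, one per channel
)
nonzero_positions = [i for i, v in enumerate(response) if v != 0.0]
print(nonzero_positions)              # [4, 6, 7, 8, 9, 10, 12]
print(receptive_field(3, [1, 2, 4]))  # 15
print(receptive_field(3, [1, 1, 1]))  # 7
```

In this sketch the single layer responds at offsets of 1, 2, and 4 samples from the impulse at once, which is the sense in which several resolutions are modeled simultaneously. The receptive-field helper shows that three stacked kernel-size-3 layers cover 15 samples with dilations 1, 2, 4 but only 7 samples without dilation.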
