Semi-Supervised Building Footprint Generation with Feature and Output Consistency Training

Paper title

Semi-Supervised Building Footprint Generation with Feature and Output Consistency Training

Authors

Li, Qingyu; Shi, Yilei; Zhu, Xiao Xiang

Abstract

Accurate and reliable building footprint maps are vital to urban planning and monitoring, and most existing approaches rely on convolutional neural networks (CNNs) for building footprint generation. However, one limitation of these methods is that they require strong supervision from a massive number of annotated samples for network learning. State-of-the-art semi-supervised semantic segmentation networks with consistency training can help address this issue by leveraging a large amount of unlabeled data: they encourage the model output to remain consistent under data perturbation. Considering that rich information is also encoded in feature maps, we propose to integrate the consistency of both features and outputs into the end-to-end network training on unlabeled samples, which imposes additional constraints. Prior semi-supervised semantic segmentation networks build on the cluster assumption, under which the decision boundary should lie in regions of low sample density. In this work, we observe that for building footprint generation, the low-density regions are more apparent at the intermediate feature representations within the encoder than at the encoder's input or output. Therefore, we propose a guideline for assigning the perturbation to the intermediate feature representations within the encoder, which takes into account the spatial resolution of the input remote sensing imagery and the mean size of individual buildings in the study area. The proposed method is evaluated on three datasets with different resolutions: the Planet dataset (3 m/pixel), the Massachusetts dataset (1 m/pixel), and the Inria dataset (0.3 m/pixel). Experimental results show that the proposed approach extracts more complete building structures and alleviates omission errors.
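
The idea of combining feature and output consistency on an unlabeled sample can be illustrated with a toy example. The following is a minimal sketch in pure Python, not the authors' implementation: the tiny dense "encoder", the Gaussian noise perturbation, the mean-squared-error terms, the loss weights, and all function names (`forward`, `consistency_loss`, `perturb_at`, and so on) are assumptions made for illustration only. The abstract does not specify the network, the perturbation type, or the loss functions.

```python
import math
import random


def linear(x, w, b):
    """Dense layer: y_j = sum_i x_i * w[i][j] + b_j."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j]
            for j in range(len(b))]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (toy initialisation)."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]
    return w, [0.0] * n_out


def forward(x, layers, perturb_at=None, noise_std=0.1, rng=None):
    """Run the encoder layers followed by the prediction head.

    If perturb_at is an encoder depth (0-based), Gaussian noise is added to
    the features produced at that depth. This noise is an assumed stand-in
    for whatever perturbation a real method would use.
    Returns (encoder output features, prediction).
    """
    *encoder, head = layers
    h = x
    for depth, (w, b) in enumerate(encoder):
        h = relu(linear(h, w, b))
        if perturb_at == depth:
            h = [v + rng.gauss(0.0, noise_std) for v in h]
    w, b = head
    return h, sigmoid(linear(h, w, b))


def consistency_loss(x, layers, perturb_at, rng,
                     feature_weight=1.0, output_weight=1.0):
    """Unsupervised loss for one unlabeled sample.

    Compares a clean pass with a pass perturbed at an intermediate encoder
    depth, on both the encoder features and the final prediction.
    The equal default weights are an assumption.
    """
    clean_feat, clean_out = forward(x, layers)
    pert_feat, pert_out = forward(x, layers, perturb_at=perturb_at, rng=rng)
    feature_term = mse(clean_feat, pert_feat)
    output_term = mse(clean_out, pert_out)
    total = feature_weight * feature_term + output_weight * output_term
    return total, feature_term, output_term


# Usage: three encoder layers plus a two-class head, perturbed at depth 1.
rng = random.Random(0)
layers = [init_layer(4, 8, rng), init_layer(8, 8, rng),
          init_layer(8, 8, rng), init_layer(8, 2, rng)]
unlabeled_sample = [0.2, -0.1, 0.5, 0.9]
total, feature_term, output_term = consistency_loss(
    unlabeled_sample, layers, perturb_at=1, rng=rng)
print(total, feature_term, output_term)
```

In this sketch, `perturb_at` plays the role of the choice the abstract describes: which intermediate encoder depth receives the perturbation. How that depth should depend on image resolution and mean building size is not stated in the abstract, so it is left as a free parameter here.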
