Title
SODAR: Segmenting Objects by Dynamically Aggregating Neighboring Mask Representations
Authors
Abstract
The recent state-of-the-art one-stage instance segmentation model SOLO divides the input image into a grid and directly predicts an object mask for each grid cell with fully convolutional networks, achieving performance comparable to the traditional two-stage Mask R-CNN while enjoying a much simpler architecture and higher efficiency. We observe that SOLO generates similar masks for an object at nearby grid cells, and these neighboring predictions can complement each other, as some may better segment certain object parts; most of them, however, are directly discarded by non-maximum suppression. Motivated by this gap, we develop a novel learning-based aggregation method that improves upon SOLO by leveraging the rich neighboring information while maintaining architectural efficiency. The resulting model is named SODAR. Unlike the original per-grid-cell object masks, SODAR is implicitly supervised to learn mask representations that encode the geometric structure of nearby objects and complement adjacent representations with context. The aggregation method further includes two novel designs: 1) a mask interpolation mechanism that enables the model to generate far fewer mask representations by sharing neighboring representations among nearby grid cells, thus saving computation and memory; 2) a deformable neighbor sampling mechanism that allows the model to adaptively adjust neighbor sampling locations, thus gathering mask representations with more relevant context and achieving higher performance. SODAR significantly improves instance segmentation performance; e.g., it outperforms a SOLO model with a ResNet-101 backbone by 2.2 AP on the COCO test set, with only about 3% additional computation. We further show consistent performance gains with the SOLOv2 model.
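The aggregation idea described above can be illustrated with a small NumPy sketch. It is a hypothetical illustration under stated assumptions, not SODAR's actual implementation: the function names `sample_rep` and `aggregate_mask`, the 3x3 neighborhood, the fixed aggregation weights (which the model would learn), the corner-aligned mapping from the cell grid to a coarser representation grid, and the sigmoid output are all assumptions. A coarse grid of representations shared by finer grid cells through bilinear interpolation stands in for mask interpolation, and optional per-neighbor offsets stand in for deformable neighbor sampling.

```python
import numpy as np

def sample_rep(reps, y, x):
    """Bilinearly sample a (G, G, H, W) grid of mask representations at
    continuous grid coordinates (y, x); coordinates are clamped to the grid."""
    g = reps.shape[0]
    y = float(np.clip(y, 0, g - 1))
    x = float(np.clip(x, 0, g - 1))
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, g - 1), min(x0 + 1, g - 1)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * reps[y0, x0] + (1 - dy) * dx * reps[y0, x1]
            + dy * (1 - dx) * reps[y1, x0] + dy * dx * reps[y1, x1])

# Assumed 3x3 neighborhood in row-major order; index 4 is the cell itself.
NEIGHBORS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def aggregate_mask(reps, cell, fine_size, weights, offsets=None):
    """Hypothetical aggregation for one cell of a fine_size x fine_size grid.

    reps:    (G, G, H, W) coarse representation grid (G <= fine_size), so
             fine cells share interpolated representations (mask interpolation).
    weights: (9,) aggregation weights; fixed here, learned in a real model.
    offsets: optional (9, 2) extra (dy, dx) shifts in fine-grid units
             (a stand-in for deformable neighbor sampling).
    Requires fine_size >= 2.
    """
    g = reps.shape[0]
    scale = (g - 1) / (fine_size - 1)  # fine-grid units -> coarse-grid units
    if offsets is None:
        offsets = np.zeros((len(NEIGHBORS), 2))
    i, j = cell
    logits = np.zeros(reps.shape[2:])
    for w, (dy, dx), (oy, ox) in zip(weights, NEIGHBORS, offsets):
        y = (i + dy + oy) * scale
        x = (j + dx + ox) * scale
        logits += w * sample_rep(reps, y, x)
    return 1.0 / (1.0 + np.exp(-logits))  # per-pixel mask probabilities

rng = np.random.default_rng(0)
reps = rng.standard_normal((4, 4, 8, 8))  # 4x4 grid of 8x8 representations
w = np.full(9, 1 / 9)                     # uniform weights stand in for learned ones
mask = aggregate_mask(reps, cell=(5, 7), fine_size=12, weights=w)
print(mask.shape)  # (8, 8)
```

Here a 4x4 representation grid serves a 12x12 cell grid, so each cell reads interpolated representations instead of storing its own, which is the kind of saving the mask interpolation mechanism targets. In a trained model, the weights and offsets would be predicted by the network rather than fixed by hand.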