Paper title
CATs++: Boosting Cost Aggregation with Convolutions and Transformers
Paper authors
Paper abstract
Cost aggregation is a critical step in image matching tasks that aims to disambiguate noisy matching scores. Existing methods generally tackle it with hand-crafted or CNN-based techniques, which either lack robustness to severe deformations or inherit the limitations of CNNs, which fail to discriminate incorrect matches because of their limited receptive fields and inadaptability. In this paper, we introduce Cost Aggregation with Transformers (CATs), which explores global consensus among the initial correlation map with the help of architectural designs that let us fully exploit the global receptive field of the self-attention mechanism. To alleviate some of the limitations that CATs may face, namely the high computational cost of a standard transformer, whose complexity grows with the size of the spatial and feature dimensions and which therefore restricts applicability to limited resolutions and results in rather limited performance, we propose CATs++, an extension of CATs. Our proposed methods outperform the previous state-of-the-art methods by large margins, setting a new state of the art on all the benchmarks, including PF-WILLOW, PF-PASCAL, and SPair-71k. We further provide extensive ablation studies and analyses.
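
To make the terms in the abstract concrete, the following is a minimal NumPy sketch of a correlation map between two feature grids, refined by one round of single-head self-attention. It is not the authors' architecture, which the abstract does not specify. The function names, the choice to treat each row of the correlation map as a token, the residual connection, and the random projection weights are all assumptions made for illustration.

```python
import numpy as np

def correlation_map(feat_src, feat_tgt):
    """Cosine similarity between every source and every target position.

    feat_src, feat_tgt: (N, C) arrays holding N = H * W flattened feature vectors.
    Returns an (N, N) array of noisy matching scores in [-1, 1].
    """
    src = feat_src / (np.linalg.norm(feat_src, axis=1, keepdims=True) + 1e-8)
    tgt = feat_tgt / (np.linalg.norm(feat_tgt, axis=1, keepdims=True) + 1e-8)
    return src @ tgt.T

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_aggregate(cost, w_q, w_k, w_v):
    """One single-head self-attention pass over the rows of `cost`.

    Assumption: each row (one source position's scores over all targets)
    is a token, so every token can attend to every other token.
    """
    q, k, v = cost @ w_q, cost @ w_k, cost @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]))  # (N, N) attention weights
    return cost + attn @ v                         # residual connection (assumed)

rng = np.random.default_rng(0)
h, w, c, d = 8, 8, 16, 32      # hypothetical grid size, channels, projection width
n = h * w

feat_src = rng.standard_normal((n, c))
feat_tgt = rng.standard_normal((n, c))
cost = correlation_map(feat_src, feat_tgt)

# Random projections stand in for learned weights.
w_q = rng.standard_normal((n, d)) * 0.1
w_k = rng.standard_normal((n, d)) * 0.1
w_v = rng.standard_normal((n, n)) * 0.1

refined = self_attention_aggregate(cost, w_q, w_k, w_v)
matches = refined.argmax(axis=1)   # best target index for each source position
```

The attention matrix in this sketch has N x N entries with N = H * W, so doubling the grid side length multiplies its size by 16. This illustrates the kind of growth in cost with spatial resolution that the abstract cites as the motivation for CATs++.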