Paper Title
Semantic-aligned Fusion Transformer for One-shot Object Detection
Paper Authors
Paper Abstract
One-shot object detection aims at detecting novel objects according to merely one given instance. With extreme data scarcity, current approaches explore various feature fusions to obtain directly transferable meta-knowledge. Yet, their performance is often unsatisfactory. In this paper, we attribute this to inappropriate correlation methods that misalign query-support semantics by overlooking spatial structures and scale variances. Upon analysis, we leverage the attention mechanism and propose a simple but effective architecture named Semantic-aligned Fusion Transformer (SaFT) to resolve these issues. Specifically, we equip SaFT with a vertical fusion module (VFM) for cross-scale semantic enhancement and a horizontal fusion module (HFM) for cross-sample feature fusion. Together, they broaden the vision of each feature point from the support image to a whole augmented feature pyramid from the query, facilitating semantic-aligned associations. Extensive experiments on multiple benchmarks demonstrate the superiority of our framework. Without fine-tuning on novel classes, it brings significant performance gains to one-stage baselines, lifting state-of-the-art results to a higher level.
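To make the cross-sample fusion idea concrete, here is a minimal sketch of scaled dot-product cross-attention between query and support features, the general mechanism the abstract's HFM builds on. This is an illustrative toy in numpy, not the paper's actual module; the function name `cross_attention` and all shapes are hypothetical, and real implementations add learned projections, multiple heads, and positional information.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, support_feats):
    """Fuse support features into query features via attention.

    query_feats: (Nq, d) query-image feature points
    support_feats: (Ns, d) support-image feature points
    Returns: (Nq, d) fused features, each a convex combination
    of support features weighted by query-support similarity.
    """
    d = query_feats.shape[-1]
    scores = query_feats @ support_feats.T / np.sqrt(d)  # (Nq, Ns)
    weights = softmax(scores, axis=-1)                   # rows sum to 1
    return weights @ support_feats                       # (Nq, d)

rng = np.random.default_rng(0)
q = rng.standard_normal((16, 8))  # 16 query feature points, dim 8
s = rng.standard_normal((4, 8))   # 4 support feature points, dim 8
fused = cross_attention(q, s)
print(fused.shape)  # (16, 8)
```

In this toy form, every query point attends over all support points, which is the "broadened vision" the abstract describes; the paper's SaFT additionally aligns semantics across scales via the VFM before such associations are formed.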