Paper Title

RPT: Learning Point Set Representation for Siamese Visual Tracking

Paper Authors

Ziang Ma, Linyuan Wang, Haitao Zhang, Wei Lu, Jun Yin

Paper Abstract

While remarkable progress has been made in robust visual tracking, accurate target state estimation remains a highly challenging problem. In this paper, we argue that this issue is closely related to the prevalent bounding box representation, which provides only a coarse spatial extent of the object. Thus an efficient visual tracking framework is proposed to accurately estimate the target state with a finer representation as a set of representative points. The point set is trained to indicate the semantically and geometrically significant positions of the target region, enabling more fine-grained localization and modeling of object appearance. We further propose a multi-level aggregation strategy to obtain detailed structural information by fusing hierarchical convolutional layers. Extensive experiments on several challenging benchmarks including OTB2015, VOT2018, VOT2019 and GOT-10k demonstrate that our method achieves new state-of-the-art performance while running at over 20 FPS.
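As a minimal illustration of the core idea (not the authors' exact implementation), a learned point set can be converted back to an axis-aligned bounding box with a simple min-max function, one common choice in point-set representations; the function name and the sample points below are hypothetical:

```python
import numpy as np

def points_to_bbox(points):
    """Convert a set of representative (x, y) points to a pseudo
    bounding box via a min-max function over the point coordinates.
    (A sketch of one common conversion; the paper's exact choice
    may differ.)"""
    points = np.asarray(points, dtype=float)  # shape (N, 2)
    x_min, y_min = points.min(axis=0)
    x_max, y_max = points.max(axis=0)
    return (float(x_min), float(y_min), float(x_max), float(y_max))

# Example: nine hypothetical points sampled on a target region
pts = [(10, 20), (15, 18), (30, 40), (25, 35), (12, 38),
       (28, 22), (20, 30), (18, 25), (22, 33)]
print(points_to_bbox(pts))  # -> (10.0, 18.0, 30.0, 40.0)
```

Because the points mark semantically significant positions rather than just box corners, they can localize the target more tightly than a directly regressed bounding box while still being reducible to one for evaluation.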
