Paper Title

RPT: Learning Point Set Representation for Siamese Visual Tracking

Paper Authors

Ziang Ma, Linyuan Wang, Haitao Zhang, Wei Lu, Jun Yin

Paper Abstract

While remarkable progress has been made in robust visual tracking, accurate target state estimation remains a highly challenging problem. In this paper, we argue that this issue is closely related to the prevalent bounding box representation, which provides only a coarse spatial extent of the object. Thus an efficient visual tracking framework is proposed to accurately estimate the target state with a finer representation as a set of representative points. The point set is trained to indicate the semantically and geometrically significant positions of the target region, enabling more fine-grained localization and modeling of object appearance. We further propose a multi-level aggregation strategy to obtain detailed structural information by fusing hierarchical convolutional layers. Extensive experiments on several challenging benchmarks including OTB2015, VOT2018, VOT2019 and GOT-10k demonstrate that our method achieves new state-of-the-art performance while running at over 20 FPS.
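As a minimal illustration of the core idea (not the authors' exact implementation), a learned point set can be converted back to an axis-aligned bounding box with a simple min-max function, one common choice in point-set representations; the function name and the sample points below are hypothetical:

```python
import numpy as np

def points_to_bbox(points):
    """Convert a set of representative (x, y) points to a pseudo
    bounding box via a min-max function over the point coordinates.
    (A sketch of one common conversion; the paper's exact choice
    may differ.)"""
    points = np.asarray(points, dtype=float)  # shape (N, 2)
    x_min, y_min = points.min(axis=0)
    x_max, y_max = points.max(axis=0)
    return (float(x_min), float(y_min), float(x_max), float(y_max))

# Example: nine hypothetical points sampled on a target region
pts = [(10, 20), (15, 18), (30, 40), (25, 35), (12, 38),
       (28, 22), (20, 30), (18, 25), (22, 33)]
print(points_to_bbox(pts))  # -> (10.0, 18.0, 30.0, 40.0)
```

Because the points mark semantically significant positions rather than just box corners, they can localize the target more tightly than a directly regressed bounding box while still being reducible to one for evaluation.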
