Paper Title

Consistent 3D Hand Reconstruction in Video via Self-Supervised Learning

Paper Authors

Zhigang Tu, Zhisheng Huang, Yujin Chen, Di Kang, Linchao Bao, Bisheng Yang, Junsong Yuan

Paper Abstract

We present a method for reconstructing accurate and consistent 3D hands from a monocular video. We observe that detected 2D hand keypoints and the image texture provide important cues about the geometry and texture of the 3D hand, which can reduce or even eliminate the need for 3D hand annotations. Thus we propose ${\rm {S}^{2}HAND}$, a self-supervised 3D hand reconstruction model that can jointly estimate pose, shape, texture, and the camera viewpoint from a single RGB input, supervised only by easily accessible 2D keypoints from an off-the-shelf detector. We further leverage the continuous hand motion information contained in unlabeled video data and propose ${\rm {S}^{2}HAND(V)}$, which uses ${\rm {S}^{2}HAND}$ with a shared set of weights to process each frame and exploits additional motion, texture, and shape consistency constraints to promote more accurate hand poses and more consistent shapes and textures. Experiments on benchmark datasets demonstrate that our self-supervised approach achieves hand reconstruction performance comparable to recent fully-supervised methods in the single-frame input setup, and notably improves reconstruction accuracy and consistency when trained on video data.
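To make the two supervision signals in the abstract concrete, below is a minimal PyTorch-style sketch: a 2D keypoint reprojection loss (supervising 3D predictions with detector outputs instead of 3D labels) and a cross-frame consistency term (keeping shape and texture stable between neighboring video frames). All names here (`project_fn`, `kp_conf`, the dictionary keys) are illustrative assumptions, not the authors' implementation; the actual ${\rm {S}^{2}HAND}$ training objective also includes rendering-based texture terms and motion constraints not shown.

```python
import torch
import torch.nn.functional as F

def keypoint_reprojection_loss(pred_joints_3d, project_fn, detected_kp_2d, kp_conf):
    """Self-supervision from 2D detections (hypothetical form).

    pred_joints_3d: (B, J, 3) predicted 3D hand joints
    project_fn:     callable projecting 3D joints to image space with the
                    estimated camera, returning (B, J, 2)
    detected_kp_2d: (B, J, 2) keypoints from an off-the-shelf 2D detector
    kp_conf:        (B, J) per-joint detector confidence, used as weights
    """
    pred_kp_2d = project_fn(pred_joints_3d)                      # (B, J, 2)
    per_joint_err = (pred_kp_2d - detected_kp_2d).norm(dim=-1)   # (B, J)
    # Down-weight joints the detector is unsure about; no 3D labels needed.
    return (kp_conf * per_joint_err).mean()

def video_consistency_loss(frame_t, frame_t1):
    """Cross-frame consistency (hypothetical form): the same hand's shape
    and texture should stay stable across neighboring frames, even though
    the pose changes."""
    shape_term = F.mse_loss(frame_t["shape"], frame_t1["shape"])
    texture_term = F.mse_loss(frame_t["texture"], frame_t1["texture"])
    return shape_term + texture_term
```

Because both losses depend only on detector outputs and agreement between frames, a single network with shared weights can be trained on unlabeled video by summing these terms over consecutive frame pairs.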
