We are More than Our Joints: Predicting how 3D Bodies Move

Paper Title

We are More than Our Joints: Predicting how 3D Bodies Move

Authors

Yan Zhang, Michael J. Black, Siyu Tang

Abstract

A key step towards understanding human behavior is the prediction of 3D human motion. Successful solutions have many applications in human tracking, HCI, and graphics. Most previous work focuses on predicting a time series of future 3D joint locations given a sequence of 3D joints from the past. This Euclidean formulation generally works better than predicting pose in terms of joint rotations. Body joint locations, however, do not fully constrain 3D human pose: they leave some degrees of freedom undefined, which makes it hard to animate a realistic human from the joints alone. Note that the 3D joints can be viewed as a sparse point cloud, so the problem of human motion prediction can be seen as point cloud prediction. With this observation, we instead predict a sparse set of locations on the body surface that correspond to motion capture markers. Given such markers, we fit a parametric body model to recover the 3D shape and pose of the person. These sparse surface markers also carry detailed information about human movement that is not present in the joints, increasing the naturalness of the predicted motions. Using the AMASS dataset, we train MOJO, a novel variational autoencoder that generates motions from latent frequencies. MOJO preserves the full temporal resolution of the input motion, and sampling from the latent frequencies explicitly introduces high-frequency components into the generated motion. We note that motion prediction methods accumulate errors over time, resulting in joints or markers that diverge from true human bodies. To address this, we fit SMPL-X to the predictions at each time step, projecting the solution back onto the space of valid bodies. These valid markers are then propagated in time. Experiments show that our method produces state-of-the-art results and realistic 3D body animations. The code for research purposes is at https://yz-cnsdqz.github.io/MOJO/MOJO.html
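The predict-then-project loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: SMPL-X fitting is replaced here by a least-squares projection onto a random linear subspace, and the learned predictor by constant-velocity extrapolation with noise. All function names (`make_body_basis`, `project_to_valid`, `toy_predictor`, `rollout`, `off_manifold_error`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def make_body_basis(n_markers, n_params, seed=0):
    """Hypothetical stand-in for a parametric body model.

    Returns an orthonormal basis whose span plays the role of the
    'space of valid bodies' (assumption: a linear subspace, unlike SMPL-X).
    """
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((n_markers * 3, n_params)))
    return q  # shape: (3 * n_markers, n_params)


def project_to_valid(markers, basis):
    """Fit the toy model parameters, then regenerate markers from them."""
    params = basis.T @ markers.ravel()  # least-squares fit (basis is orthonormal)
    return (basis @ params).reshape(markers.shape)


def toy_predictor(prev, curr, rng, noise=0.05):
    """Hypothetical one-step predictor: constant velocity plus noise.

    The noise stands in for the prediction error of a learned model.
    """
    return curr + (curr - prev) + noise * rng.standard_normal(curr.shape)


def rollout(prev, curr, basis, steps, project=True, seed=1):
    """Predict `steps` future frames, optionally projecting at every step."""
    rng = np.random.default_rng(seed)
    frames = []
    for _ in range(steps):
        nxt = toy_predictor(prev, curr, rng)
        if project:
            # Pull the prediction back onto the toy valid-body space
            # before it is fed to the next prediction step.
            nxt = project_to_valid(nxt, basis)
        frames.append(nxt)
        prev, curr = curr, nxt
    return np.stack(frames)


def off_manifold_error(frames, basis):
    """Distance of each frame from the toy valid-body subspace."""
    return np.array(
        [np.linalg.norm(f - project_to_valid(f, basis)) for f in frames]
    )
```

In this toy setting, calling `rollout(..., project=False)` lets the markers drift away from the subspace as noise accumulates, whereas `project=True` keeps every propagated frame on it. The real method replaces the linear projection with an SMPL-X fit, which is a nonlinear optimization.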
