Title
AirPose: Multi-View Fusion Network for Aerial 3D Human Pose and Shape Estimation
Authors
Abstract
In this letter, we present a novel markerless 3D human motion capture (MoCap) system for unstructured, outdoor environments that uses a team of autonomous unmanned aerial vehicles (UAVs) with on-board RGB cameras and computation. Existing methods are limited by calibrated cameras and offline processing. Thus, we present the first method (AirPose) to estimate human pose and shape using images captured by multiple extrinsically uncalibrated flying cameras. AirPose itself calibrates the cameras relative to the person instead of relying on any pre-calibration. It uses distributed neural networks running on each UAV that communicate viewpoint-independent information about the person (i.e., their 3D shape and articulated pose) with each other. The person's shape and pose are parameterized using the SMPL-X body model, resulting in a compact representation that minimizes communication between the UAVs. The network is trained using synthetic images of realistic virtual environments, and fine-tuned on a small set of real images. We also introduce an optimization-based post-processing method (AirPose$^{+}$) for offline applications that require higher MoCap quality. We make our method's code and data available for research at https://github.com/robot-perception-group/AirPose. A video describing the approach and results is available at https://youtu.be/xLYe1TNHsfs.
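To make the compactness claim concrete, the sketch below (not the authors' code; the exact message layout is an assumption) compares the size of a viewpoint-independent SMPL-X message (shape betas plus articulated body pose, using the public SMPL-X dimensionalities of 10 shape coefficients and 21 body joints in axis-angle form) against transmitting a single small RGB image between UAVs:

```python
# Hedged sketch: why exchanging SMPL-X parameters between UAVs is far
# cheaper than exchanging raw images. Dimensionalities follow the public
# SMPL-X body model; the message layout itself is an illustrative assumption.

SHAPE_BETAS = 10       # body shape coefficients (viewpoint-independent)
BODY_POSE = 21 * 3     # articulated body pose, axis-angle per joint

params = SHAPE_BETAS + BODY_POSE
message_bytes = params * 4          # float32 per parameter

# A single 224x224 RGB image, for comparison (uncompressed):
image_bytes = 224 * 224 * 3

print(params)          # 73 parameters
print(message_bytes)   # 292 bytes
print(image_bytes)     # 150528 bytes (~500x larger)
```

Even before compression, the parametric message is hundreds of times smaller than an image, which is what makes per-frame communication between flying cameras practical.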