Paper Title
Fast and Robust Video-Based Exercise Classification via Body Pose Tracking and Scalable Multivariate Time Series Classifiers
Paper Authors
Paper Abstract
Technological advancements have spurred the use of machine learning-based applications in sports science. Physiotherapists, sports coaches and athletes actively look to incorporate the latest technologies in order to further improve performance and avoid injuries. While wearable sensors are very popular, their use is hindered by constraints on battery power and sensor calibration, especially for use cases which require multiple sensors to be placed on the body. Hence, there is renewed interest in video-based data capture and analysis for sports science. In this paper, we present the application of classifying strength and conditioning (S\&C) exercises using video. We focus on the popular Military Press exercise, where the execution is captured with a video camera using a mobile device, such as a mobile phone, and the goal is to classify the execution into different types. Since video recordings need a lot of storage and computation, this use case requires data reduction while preserving classification accuracy and enabling fast prediction. To this end, we propose an approach named BodyMTS that turns video into time series by employing body pose tracking, followed by training and prediction using multivariate time series classifiers. We analyze the accuracy and robustness of BodyMTS and show that it is robust to different types of noise caused by either video quality or pose estimation factors. We compare BodyMTS to state-of-the-art deep learning methods that classify human activity directly from video and show that BodyMTS achieves similar accuracy, but with reduced running time and model engineering effort. Finally, we discuss some of the practical aspects of employing BodyMTS in this application in terms of accuracy and robustness under reduced data quality and size. We show that BodyMTS achieves an average accuracy of 87\%, which is significantly higher than the accuracy of human domain experts.
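The pipeline described in the abstract (pose tracking turns each video clip into a multivariate time series of joint coordinates, and a time series classifier then predicts the execution type) can be sketched roughly as follows. This is a minimal illustration only: the synthetic keypoint arrays stand in for the output of a real pose estimator, the array sizes and number of classes are invented for the example, and a simple 1-nearest-neighbor rule replaces the multivariate time series classifiers the paper actually evaluates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not from the paper): 60 clips, 50 frames each,
# 16 channels, e.g. (x, y) coordinates for 8 tracked body joints.
n_clips, n_frames, n_channels = 60, 50, 16

# Synthetic stand-in for the per-clip keypoint series a pose estimator
# (such as OpenPose) would produce from the exercise videos.
X = rng.normal(size=(n_clips, n_frames, n_channels))
y = rng.integers(0, 3, size=n_clips)  # 3 hypothetical execution types

# Simple train/test split over clips.
Xtr, ytr, Xte, yte = X[:40], y[:40], X[40:], y[40:]

def predict_1nn(series):
    # Euclidean distance over the whole multivariate series; a toy
    # substitute for the paper's multivariate time series classifiers.
    d = np.linalg.norm(Xtr - series, axis=(1, 2))
    return ytr[np.argmin(d)]

preds = np.array([predict_1nn(s) for s in Xte])
acc = (preds == yte).mean()
print(f"held-out accuracy: {acc:.2f}")
```

Because keypoint series are tiny compared with raw video frames, this representation is what enables the storage reduction and fast prediction the abstract emphasizes.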