Paper title
LIFI: Towards Linguistically Informed Frame Interpolation
Paper authors
Paper abstract
In this work, we explore a new problem: frame interpolation for speech videos. Today, such content is the major form of online communication. We approach the problem by using several deep learning video generation algorithms to generate the missing frames. We also provide examples in which computer vision models, despite showing high performance on conventional non-linguistic metrics, fail to produce faithful interpolations of speech. Motivated by this, we provide a new set of linguistically informed metrics specifically targeted at the problem of speech video interpolation. We also release several datasets for testing the speech understanding of computer vision video generation models.