Paper Title
Unsupervised Multimodal Video-to-Video Translation via Self-Supervised Learning
Paper Authors
Paper Abstract
Existing unsupervised video-to-video translation methods fail to produce translated videos that are frame-wise realistic, preserve semantic information, and remain consistent at the video level. In this work, we propose UVIT, a novel unsupervised video-to-video translation model. Our model decomposes style and content, uses a specialized encoder-decoder structure, and propagates inter-frame information through bidirectional recurrent neural network (RNN) units. The style-content decomposition mechanism enables us to achieve style-consistent video translation results and provides a good interface for modality-flexible translation. In addition, by varying the input frames and style codes incorporated in the translation, we propose a video interpolation loss that captures temporal information within the sequence, allowing us to train our building blocks in a self-supervised manner. Our model can produce photo-realistic, spatio-temporally consistent translated videos in a multimodal way. Subjective and objective experimental results validate the superiority of our model over existing methods. More details can be found on our project website: https://uvit.netlify.com
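The abstract names three reusable building blocks: a style/content decomposition, an encoder-decoder that propagates inter-frame information through bidirectional RNN units, and a self-supervised video interpolation loss. The snippet below is a minimal PyTorch-style sketch of how such pieces could fit together. It is not the authors' released implementation: the module sizes, the plain GRU standing in for the paper's recurrent unit, the mean-fusion of the two temporal directions, and all names (`ContentEncoder`, `StyleEncoder`, `BiRNN`, `video_interpolation_loss`) are illustrative assumptions.

```python
# Hypothetical sketch of UVIT-style building blocks; shapes and names are assumptions.
import torch
import torch.nn as nn

FEAT = 128  # assumed feature width

class ContentEncoder(nn.Module):
    """Maps a frame to a spatial content feature map."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, FEAT, 4, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, frame):                # (B, 3, H, W) -> (B, FEAT, H/4, W/4)
        return self.net(frame)

class StyleEncoder(nn.Module):
    """Maps a frame to a global style code (the decomposed 'style')."""
    def __init__(self, style_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, style_dim),
        )
    def forward(self, frame):                # (B, 3, H, W) -> (B, style_dim)
        return self.net(frame)

class Decoder(nn.Module):
    """Reassembles a frame from a content feature map and a style code."""
    def __init__(self, style_dim=8):
        super().__init__()
        self.fuse = nn.Conv2d(FEAT + style_dim, FEAT, 1)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(FEAT, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )
    def forward(self, content, style):
        # Broadcast the style code over the spatial grid, then fuse and decode.
        s = style[:, :, None, None].expand(-1, -1, *content.shape[2:])
        return self.net(self.fuse(torch.cat([content, s], dim=1)))

class BiRNN(nn.Module):
    """Propagates inter-frame information in both temporal directions
    (a plain GRU per spatial location, standing in for the paper's unit)."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(FEAT, FEAT, batch_first=True, bidirectional=True)
        self.merge = nn.Linear(2 * FEAT, FEAT)
    def forward(self, feats):                # (B, T, FEAT, h, w)
        B, T, C, h, w = feats.shape
        x = feats.permute(0, 3, 4, 1, 2).reshape(B * h * w, T, C)
        out, _ = self.rnn(x)                 # (B*h*w, T, 2*FEAT)
        out = self.merge(out)                # back to FEAT channels
        return out.reshape(B, h, w, T, C).permute(0, 3, 4, 1, 2)

def video_interpolation_loss(enc_c, birnn, dec, enc_s, clip):
    """Self-supervised objective in the spirit of the abstract: reconstruct
    the middle frame of a triplet from its two neighbours, reusing the
    same encoders/decoder and the frame's own style code."""
    prev_f, mid_f, next_f = clip[:, 0], clip[:, 1], clip[:, 2]
    feats = torch.stack([enc_c(prev_f), enc_c(next_f)], dim=1)  # (B, 2, FEAT, h, w)
    prop = birnn(feats)                      # bidirectional temporal propagation
    fused = prop.mean(dim=1)                 # assumed fusion of both directions
    pred = dec(fused, enc_s(mid_f))
    return nn.functional.l1_loss(pred, mid_f)

# Usage sketch on random data:
enc_c, enc_s, dec, birnn = ContentEncoder(), StyleEncoder(), Decoder(), BiRNN()
clip = torch.randn(2, 3, 3, 64, 64)          # (batch, 3 frames, C, H, W)
loss = video_interpolation_loss(enc_c, birnn, dec, enc_s, clip)
loss.backward()
```

Because the interpolation target is a real frame from the same video, this loss needs no paired cross-domain data, which is what makes the training self-supervised; swapping the style code at inference time is what would give the modality-flexible, multimodal outputs the abstract describes.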