Paper Title

Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding

Authors

Sun, Yidan; Chao, Qin; Ji, Yangfeng; Li, Boyang

Abstract

Despite recent advances of AI, story understanding remains an open and under-investigated problem. We collect, preprocess, and publicly release a video-language story dataset, Synopses of Movie Narratives (SyMoN), containing 5,193 video summaries of popular movies and TV series with a total length of 869 hours. SyMoN captures naturalistic storytelling videos made by human creators and intended for a human audience. As a prototypical and naturalistic story dataset, SyMoN features high coverage of multimodal story events and abundant mental-state descriptions. Its use of storytelling techniques causes cross-domain semantic gaps that provide appropriate challenges to existing models. We establish benchmarks on video-text retrieval and zero-shot alignment on movie summary videos, which showcase the importance of in-domain data and long-term memory in story understanding. With SyMoN, we hope to lay the groundwork for progress in multimodal story understanding.
