Building a Manga Dataset "Manga109" with Annotations for Multimedia Applications

Paper Title

Building a Manga Dataset "Manga109" with Annotations for Multimedia Applications

Authors

Aizawa, Kiyoharu; Fujimoto, Azuma; Otsubo, Atsushi; Ogawa, Toru; Matsui, Yusuke; Tsubota, Koki; Ikuta, Hikaru

Abstract

Manga, or comics, which are a type of multimodal artwork, have been left behind in the recent trend of deep learning applications because of the lack of a proper dataset. Hence, we built Manga109, a dataset consisting of a variety of 109 Japanese comic books (94 authors and 21,142 pages) and made it publicly available by obtaining author permissions for academic use. We carefully annotated the frames, speech texts, character faces, and character bodies; the total number of annotations exceeds 500k. This dataset provides numerous manga images and annotations, which will be beneficial for use in machine learning algorithms and their evaluation. In addition to academic use, we obtained further permission for a subset of the dataset for industrial use. In this article, we describe the details of the dataset and present a few examples of multimedia processing applications (detection, retrieval, and generation) that apply existing deep learning methods and are made possible by the dataset.
