Paper Title

Character-Aware Models Improve Visual Text Rendering

Paper Authors

Rosanne Liu, Dan Garrette, Chitwan Saharia, William Chan, Adam Roberts, Sharan Narang, Irina Blok, RJ Mical, Mohammad Norouzi, Noah Constant

Paper Abstract

Current image generation models struggle to reliably produce well-formed visual text. In this paper, we investigate a key contributing factor: popular text-to-image models lack character-level input features, making it much harder to predict a word's visual makeup as a series of glyphs. To quantify this effect, we conduct a series of experiments comparing character-aware vs. character-blind text encoders. In the text-only domain, we find that character-aware models provide large gains on a novel spelling task (WikiSpell). Applying our learnings to the visual domain, we train a suite of image generation models, and show that character-aware variants outperform their character-blind counterparts across a range of novel text rendering tasks (our DrawText benchmark). Our models set a much higher state-of-the-art on visual spelling, with 30+ point accuracy gains over competitors on rare words, despite training on far fewer examples.
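
The contrast the abstract draws between character-blind and character-aware encoders comes down to input tokenization: subword models like T5 see a word as a few opaque vocabulary pieces, while byte/character-level models like ByT5 see its full glyph sequence. The sketch below is a minimal illustration of that difference, assuming the Hugging Face `transformers` package and the public `t5-small` and `google/byt5-small` checkpoints; the example word and printed pieces are illustrative, not taken from the paper.

```python
# Minimal sketch: how a character-blind subword tokenizer vs. a
# character-aware byte-level tokenizer segments the same word.
# Assumes: pip install transformers (checkpoints "t5-small", "google/byt5-small").
from transformers import AutoTokenizer

word = "sesquipedalian"  # a rare word chosen for illustration

subword_tok = AutoTokenizer.from_pretrained("t5-small")        # character-blind (SentencePiece)
byte_tok = AutoTokenizer.from_pretrained("google/byt5-small")  # character-aware (UTF-8 bytes)

# Subword pieces: the word is split into a handful of opaque tokens,
# so the model never directly observes its letter-by-letter composition.
print(subword_tok.tokenize(word))   # exact pieces depend on the vocabulary

# Byte-level IDs: every character is an explicit input feature,
# so the spelling is directly visible to the encoder.
print(byte_tok(word)["input_ids"])
```

This mirrors the paper's experimental setup only at a high level; the actual WikiSpell and DrawText evaluations compare full text encoders and image generation models, not just tokenizer outputs.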
