Paper Title
A Human Eye-based Text Color Scheme Generation Method for Image Synthesis
Paper Authors
Abstract
Synthetic data has proven effective for scene text detection and recognition tasks. However, two problems remain. First, the color schemes used for text coloring in existing methods are relatively fixed color key-value pairs learned from real datasets. Dirty data in these real datasets can make the colors of text and background too similar to be distinguished from each other. Second, the generated text is uniformly confined to a single depth plane of a picture, whereas in the real world text may appear across depths. To address these problems, in this paper we design a novel method for generating color schemes that is consistent with how the human eye perceives things. The advantages of our method are as follows: (1) it overcomes the color confusion between text and background caused by dirty data; (2) it allows the generated text to appear in most locations of any image, even across depths; (3) it avoids analyzing the depth of the background, so that its performance exceeds that of state-of-the-art methods; (4) it generates images quickly, at nearly one image every three milliseconds. The effectiveness of our method is verified on several public datasets.
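The paper does not specify its color-scheme algorithm in the abstract, but the core idea — picking text colors the human eye can reliably distinguish from the background — can be illustrated with a standard perceptual-contrast check. The sketch below uses the sRGB relative-luminance and contrast-ratio formulas from WCAG 2.x; the function names, the candidate-color selection strategy, and the `min_ratio` threshold are illustrative assumptions, not the authors' actual method.

```python
def relative_luminance(rgb):
    """sRGB relative luminance per WCAG 2.x (ITU-R BT.709 coefficients)."""
    def linearize(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """WCAG contrast ratio between two colors, in [1, 21]."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)

def pick_text_color(bg, candidates, min_ratio=4.5):
    """Choose the candidate text color most distinguishable from bg.

    Candidates below min_ratio (4.5 is WCAG's normal-text threshold,
    used here as an illustrative cutoff) are rejected outright.
    """
    readable = [c for c in candidates if contrast_ratio(c, bg) >= min_ratio]
    if not readable:
        return None  # no candidate is safely readable on this background
    return max(readable, key=lambda c: contrast_ratio(c, bg))
```

For example, against a dark background `(30, 30, 30)`, `pick_text_color` would reject a near-black candidate and return a light one, mimicking how a perception-based scheme avoids the text/background confusion that fixed key-value pairs learned from dirty data can produce.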