EASTER: Efficient and Scalable Text Recognizer

Paper title

EASTER: Efficient and Scalable Text Recognizer

Authors

Chaudhary, Kartik; Bali, Raghav

Abstract

Recent progress in deep learning has led to the development of Optical Character Recognition (OCR) systems that perform remarkably well. Most research has centred on recurrent networks and complex gated layers, which make the overall solution complex and difficult to scale. In this paper, we present an Efficient And Scalable TExt Recognizer (EASTER) to perform optical character recognition on both machine-printed and handwritten text. Our model utilises 1-D convolutional layers without any recurrence, which enables parallel training with considerably less data. We experimented with multiple variants of our architecture, and one of the smallest (in terms of depth and number of parameters) performs comparably to complex RNN-based alternatives. Our deepest, 20-layer variant outperforms RNN architectures by a good margin on benchmark datasets such as IIIT-5k and SVT. We also show improvements over the current best results on the offline handwritten text recognition task. Finally, we present data generation pipelines with an augmentation setup for producing synthetic datasets of both handwritten and machine-printed text.
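The abstract's central design choice is replacing recurrence with stacked 1-D convolutions over the text line. The NumPy sketch below illustrates why that allows parallel computation: every output step reads a fixed window of inputs and never waits on a previous step's output. It is a generic conv + ReLU stack with random weights and a per-step softmax, not the EASTER architecture from the paper. The helper names (`conv1d_same`, `conv_recognizer`), the layer sizes, the class count and the output head are all hypothetical, and no training loss is shown.

```python
import numpy as np

def conv1d_same(x, w, b):
    """1-D convolution along the sequence axis with zero 'same' padding.

    x: (T, C_in) feature sequence, e.g. one vector per image column.
    w: (K, C_in, C_out) kernel, K odd.  b: (C_out,) bias.  Returns (T, C_out).
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad the time axis only
    # Each output step reads a fixed input window and nothing from earlier
    # outputs, so all T steps are independent and could run in parallel.
    windows = np.stack([xp[t:t + k] for t in range(x.shape[0])])  # (T, K, C_in)
    return np.einsum("tkc,kcd->td", windows, w) + b

def conv_recognizer(x, conv_layers, w_out, b_out):
    """Hypothetical stack of 1-D conv + ReLU layers, then a per-step softmax."""
    h = x
    for w, b in conv_layers:
        h = np.maximum(conv1d_same(h, w, b), 0.0)
    logits = h @ w_out + b_out                   # (T, n_classes)
    logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Hypothetical sizes: 40 columns, 16 input features, 4 layers of width 32,
# kernel width 3, 27 output classes.
rng = np.random.default_rng(0)
T, C_IN, WIDTH, K, N_LAYERS, N_CLASSES = 40, 16, 32, 3, 4, 27
conv_layers = []
c = C_IN
for _ in range(N_LAYERS):
    conv_layers.append((rng.normal(0, 0.2, (K, c, WIDTH)), np.zeros(WIDTH)))
    c = WIDTH
w_out, b_out = rng.normal(0, 0.2, (WIDTH, N_CLASSES)), np.zeros(N_CLASSES)

x = rng.normal(size=(T, C_IN))
probs = conv_recognizer(x, conv_layers, w_out, b_out)

# Without recurrence the context is finite: each output sees
# N_LAYERS * (K - 1) + 1 input columns, so editing column 0 cannot
# change any output beyond step N_LAYERS * (K // 2).
x_edit = x.copy()
x_edit[0] += 5.0
probs_edit = conv_recognizer(x_edit, conv_layers, w_out, b_out)
reach = N_LAYERS * (K // 2)
print(probs.shape)                                             # (40, 27)
print(np.allclose(probs[reach + 1:], probs_edit[reach + 1:]))  # True
```

The final check makes the trade-off concrete: a convolutional model's context grows only with depth and kernel width, which is one plausible reason deeper variants (the abstract mentions 20 layers) can perform better.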
