Table Structure Recognition using Top-Down and Bottom-Up Cues

Paper Title

Table Structure Recognition using Top-Down and Bottom-Up Cues

Authors

Sachin Raja, Ajoy Mondal, C. V. Jawahar

Abstract

Tables are information-rich structured objects in document images. While significant work has been done in localizing tables as graphic objects in document images, only limited attempts exist on table structure recognition. Most existing literature on structure recognition depends on extraction of meta-features from the PDF document or on optical character recognition (OCR) models to extract low-level layout features from the image. However, these methods fail to generalize well because of the absence of meta-features or errors made by the OCR when there is a significant variance in table layouts and text organization. In our work, we focus on tables that have complex structures, dense content, and varying layouts with no dependency on meta-features and/or OCR. We present an approach for table structure recognition that combines cell detection and interaction modules to localize the cells and predict their row and column associations with other detected cells. We incorporate structural constraints as additional differential components to the loss function for cell detection. We empirically validate our method on the publicly available real-world datasets: ICDAR-2013, ICDAR-2019 (cTDaR) archival, UNLV, SciTSR, SciTSR-COMP, TableBank, and PubTabNet. Our attempt opens up a new direction for table structure recognition by combining top-down (table cells detection) and bottom-up (structure recognition) cues in visually understanding the tables.
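
The abstract does not give implementation details, so the following is only an illustrative sketch and not the authors' method. It assumes that pairwise "same row" and "same column" decisions between detected cells are already available as index pairs (a hypothetical input format), and shows one simple way such associations could be turned into row and column groups by taking connected components. The function name `group_cells` and the toy 2x2 table are made up for this example.

```python
from collections import defaultdict


def group_cells(num_cells, same_group_pairs):
    """Group cell indices into connected components of pairwise associations.

    same_group_pairs is an assumed input: (i, j) means cells i and j were
    predicted to share a row (or a column).
    """
    parent = list(range(num_cells))

    def find(i):
        # Follow parent links to the component root, halving paths as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in same_group_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for i in range(num_cells):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Hypothetical 2x2 table: cells 0 and 1 form the top row, 2 and 3 the bottom row.
same_row = [(0, 1), (2, 3)]
same_col = [(0, 2), (1, 3)]

rows = group_cells(4, same_row)
cols = group_cells(4, same_col)
print(rows)
print(cols)
```

Connected components are a deliberately simple choice here: a cell that spans several rows or columns would merge those groups, so a real system would need additional handling for spanning cells.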
