Region-based Layout Analysis of Music Score Images

Title

Region-based Layout Analysis of Music Score Images

Authors

Castellanos, Francisco J., Garrido-Munoz, Carlos, Ríos-Vila, Antonio, Calvo-Zaragoza, Jorge

Abstract

The Layout Analysis (LA) stage is of vital importance to the correct performance of an Optical Music Recognition (OMR) system. It identifies the regions of interest, such as staves or lyrics, which must then be processed in order to transcribe their content. Despite the existence of modern approaches based on deep learning, no exhaustive study of LA in OMR has yet been carried out with regard to the precision of different models, their generalization to different domains or, more importantly, their impact on subsequent stages of the pipeline. This work aims to fill this gap in the literature by means of an experimental study of different neural architectures, music document types and evaluation scenarios. The need for training data has also led to the proposal of a new semi-synthetic data generation technique that enables the efficient application of LA approaches in real scenarios. Our results show that: (i) the choice of the model and its performance are crucial for the entire transcription process; (ii) the metrics commonly used to evaluate the LA stage do not always correlate with the final performance of the OMR system; and (iii) the proposed data generation technique enables state-of-the-art results to be achieved with a limited set of labeled data.
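To make finding (ii) more concrete, the sketch below shows one region-detection metric of the kind often used for layout analysis: an IoU-thresholded F1 score over axis-aligned bounding boxes. This is an illustrative assumption, not the paper's evaluation code. The abstract does not say which metrics were used, and the box format `(x_min, y_min, x_max, y_max)`, the greedy matching, the 0.5 threshold and the helper names `iou` and `region_f1` are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# Hypothetical box format (assumption): (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def region_f1(predicted: List[Box], ground_truth: List[Box],
              threshold: float = 0.5) -> float:
    """F1 score in which a prediction counts as a true positive if it matches a
    not-yet-matched ground-truth region with IoU >= threshold (greedy matching)."""
    if not predicted or not ground_truth:
        return 0.0
    matched = set()
    true_positives = 0
    for p in predicted:
        best_j, best_iou = -1, threshold
        for j, g in enumerate(ground_truth):
            if j in matched:
                continue
            score = iou(p, g)
            if score >= best_iou:
                best_j, best_iou = j, score
        if best_j >= 0:
            matched.add(best_j)
            true_positives += 1
    precision = true_positives / len(predicted)
    recall = true_positives / len(ground_truth)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gt_staff = (0, 0, 100, 20)
    pred_staff = (0, 0, 100, 16)  # crops off the bottom 4 pixels of the staff
    print(iou(gt_staff, pred_staff))  # 0.8
    print(region_f1([pred_staff], [gt_staff]))  # 1.0
```

In this toy case, the cropped staff region still counts as a perfect detection (F1 = 1.0), even though the missing strip could contain notes that a downstream recognizer would then fail to transcribe. This is one illustrative way a detection-level score and end-to-end OMR performance can diverge. It is not taken from the paper's experiments.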