Adversarial Grammatical Error Correction

Paper Title

Adversarial Grammatical Error Correction

Authors

Raheja, Vipul; Alikaniotis, Dimitrios

Abstract

Recent works in Grammatical Error Correction (GEC) have leveraged the progress in Neural Machine Translation (NMT), to learn rewrites from parallel corpora of grammatically incorrect and corrected sentences, achieving state-of-the-art results. At the same time, Generative Adversarial Networks (GANs) have been successful in generating realistic texts across many different tasks by learning to directly minimize the difference between human-generated and synthetic text. In this work, we present an adversarial learning approach to GEC, using the generator-discriminator framework. The generator is a Transformer model, trained to produce grammatically correct sentences given grammatically incorrect ones. The discriminator is a sentence-pair classification model, trained to judge a given pair of grammatically incorrect-correct sentences on the quality of grammatical correction. We pre-train both the discriminator and the generator on parallel texts and then fine-tune them further using a policy gradient method that assigns high rewards to sentences which could be true corrections of the grammatically incorrect text. Experimental results on FCE, CoNLL-14, and BEA-19 datasets show that Adversarial-GEC can achieve competitive GEC quality compared to NMT-based baselines.
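
The fine-tuning step in the abstract, in which a policy gradient method rewards outputs the discriminator judges to be plausible corrections, can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a generator reduced to a categorical distribution over three hypothetical candidate corrections and a discriminator replaced by fixed, made-up scores (`toy_rewards`), and it uses plain REINFORCE with no baseline. All names and values here are illustrative assumptions.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, rewards, lr, rng):
    """One REINFORCE update for a categorical policy over candidates."""
    probs = softmax(logits)
    # Sample one candidate from the generator's current distribution.
    idx = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    reward = rewards[idx]
    # d/d logit_j of log p(idx) is (1[j == idx] - probs[j]).
    return [
        logit + lr * reward * ((1.0 if j == idx else 0.0) - probs[j])
        for j, logit in enumerate(logits)
    ]


# Hypothetical candidate corrections for the source "She go to school."
candidates = [
    "She go to school.",
    "She goes to school.",
    "She going to school.",
]
# Stand-in discriminator scores (assumed values, not taken from the paper).
toy_rewards = [0.1, 0.9, 0.2]


def train(steps=2000, lr=0.5, seed=0):
    """Run repeated REINFORCE updates and return the final probabilities."""
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]
    for _ in range(steps):
        logits = reinforce_step(logits, toy_rewards, lr, rng)
    return softmax(logits)


final_probs = train()
best = candidates[final_probs.index(max(final_probs))]
print(best, final_probs)
```

In the setting the abstract describes, the generator is a Transformer producing full sentences and the reward comes from a trained sentence-pair classifier. The sketch only shows how a reward signal shifts probability mass toward outputs that score well.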
