Paper Title

Focus Is What You Need For Chinese Grammatical Error Correction

Authors

Jingheng Ye, Yinghui Li, Shirong Ma, Rui Xie, Wei Wu, Hai-Tao Zheng

Abstract

Chinese Grammatical Error Correction (CGEC) aims to automatically detect and correct grammatical errors in Chinese text. Researchers have long regarded CGEC as a task with a certain degree of uncertainty: an ungrammatical sentence often has multiple valid references. However, we argue that although this hypothesis is very reasonable, it is too demanding for the capabilities of today's mainstream models. In this paper, we first find that multiple references do not actually bring positive gains to model training. On the contrary, a CGEC model benefits when it can pay attention to small but essential data during training. Furthermore, we propose a simple yet effective training strategy called OneTarget to improve the focus ability of CGEC models and thus improve CGEC performance. Extensive experiments and detailed analyses demonstrate the correctness of our finding and the effectiveness of the proposed method.
