Title
Noisy Text Data: Achilles' Heel of BERT
Authors
Abstract
Owing to the phenomenal success of BERT on various NLP tasks and benchmark datasets, industry practitioners are actively experimenting with fine-tuning BERT to build NLP applications for solving industry use cases. For most datasets that practitioners use to build industrial NLP applications, it is hard to guarantee the absence of noise in the data. While BERT has performed exceedingly well at transferring learning from one use case to another, it remains unclear how BERT performs when fine-tuned on noisy text. In this work, we explore the sensitivity of BERT to noise in the data. We work with the most commonly occurring types of noise (spelling mistakes, typos) and show that this noise results in significant degradation in the performance of BERT. We present experimental results showing that BERT's performance on fundamental NLP tasks like sentiment analysis and textual similarity drops significantly in the presence of (simulated) noise on benchmark datasets, viz. IMDB Movie Review, STS-B, and SST-2. Further, we identify shortcomings in the existing BERT pipeline that are responsible for this drop in performance. Our findings suggest that practitioners need to be wary of the presence of noise in their datasets while fine-tuning BERT to solve industry use cases.
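For readers who want to reproduce this kind of perturbation, the sketch below shows one plausible way to simulate typo-style noise at a configurable rate. The `add_typos` helper, the edit operations, and the noise rate are illustrative assumptions, not the authors' implementation; the paper's exact noise model is not described in this abstract.

```python
import random

def add_typos(text: str, noise_rate: float = 0.1, seed: int = 0) -> str:
    """Corrupt a fraction of words with one random character-level edit.

    A minimal sketch of simulated spelling/typo noise: each sufficiently
    long word is, with probability `noise_rate`, hit by a transposition,
    a deletion, or a duplicated character.
    """
    rng = random.Random(seed)
    noisy_words = []
    for word in text.split():
        if len(word) > 3 and rng.random() < noise_rate:
            i = rng.randrange(len(word) - 1)
            op = rng.choice(["swap", "drop", "repeat"])
            if op == "swap":
                # transpose two adjacent characters: "movie" -> "moive"
                word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
            elif op == "drop":
                # delete one character: "wonderful" -> "wonderfl"
                word = word[:i] + word[i + 1:]
            else:
                # duplicate one character: "great" -> "grreat"
                word = word[:i] + word[i] + word[i:]
        noisy_words.append(word)
    return " ".join(noisy_words)

print(add_typos("this movie was absolutely wonderful", noise_rate=0.5))
# prints a corrupted variant, e.g. "this moive was absolutley wonderful"
# (the exact output depends on the seed and sampled edits)
```

A corrupted copy of a benchmark dataset produced this way can then be fed through a standard BERT fine-tuning pipeline and compared against the clean baseline, which is the style of experiment the abstract describes.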