Paper Title
Time Will Change Things: An Empirical Study on Dynamic Language Understanding in Social Media Classification
Paper Authors
Paper Abstract
Language features are constantly evolving in real-world social media environments. Many trained natural language understanding (NLU) models are ineffective at semantic inference over unseen features and may consequently suffer deteriorating performance as language changes over time. To address this challenge, we empirically study social media NLU in a dynamic setup, where models are trained on past data and tested on future data. This setup better reflects realistic practice than the commonly adopted static setup based on a random data split. To further analyze model adaptation to dynamicity, we explore the usefulness of leveraging unlabeled data created after a model is trained. The experiments examine the performance of unsupervised domain adaptation baselines based on auto-encoding and pseudo-labeling, as well as a joint framework coupling the two. Extensive results on four social media tasks indicate a universally negative effect of evolving environments on classification accuracy, while auto-encoding and pseudo-labeling together show the best robustness to dynamicity.
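The difference between the dynamic setup and the static random split described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the field names (`text`, `label`, `date`), the toy posts, the cutoff date, and the helper names `temporal_split` and `random_split` are all assumptions made for illustration.

```python
import random
from datetime import date

# Toy posts; the records and field names are hypothetical.
posts = [
    {"text": "post a", "label": 0, "date": date(2019, 3, 1)},
    {"text": "post b", "label": 1, "date": date(2019, 11, 15)},
    {"text": "post c", "label": 0, "date": date(2020, 2, 10)},
    {"text": "post d", "label": 1, "date": date(2020, 8, 30)},
    {"text": "post e", "label": 0, "date": date(2021, 1, 20)},
]


def temporal_split(examples, cutoff):
    """Dynamic setup: train on posts before `cutoff`, test on posts from `cutoff` on."""
    train = [ex for ex in examples if ex["date"] < cutoff]
    test = [ex for ex in examples if ex["date"] >= cutoff]
    return train, test


def random_split(examples, test_fraction=0.2, seed=0):
    """Static setup: shuffle and split, ignoring timestamps entirely."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


dyn_train, dyn_test = temporal_split(posts, cutoff=date(2020, 1, 1))
sta_train, sta_test = random_split(posts)
```

In the temporal split every test post is newer than every training post, so test-time language may contain features the model never saw. A random split mixes time periods and can hide that gap.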