Paper Title

FastKASSIM: A Fast Tree Kernel-Based Syntactic Similarity Metric

Authors

Maximillian Chen, Caitlyn Chen, Xiao Yu, Zhou Yu

Abstract

Syntax is a fundamental component of language, yet few metrics have been employed to capture syntactic similarity or coherence at the utterance and document level. The existing standard document-level syntactic similarity metric is computationally expensive and performs inconsistently when faced with syntactically dissimilar documents. To address these challenges, we present FastKASSIM, a metric for utterance- and document-level syntactic similarity that uses tree kernels to pair and average the most similar constituency parse trees between a pair of documents. FastKASSIM is more robust to syntactic dissimilarities and runs up to 5.32 times faster than its predecessor over documents in the r/ChangeMyView corpus. FastKASSIM's improvements allow us to examine hypotheses in two settings with large documents. We find that syntactically similar arguments on r/ChangeMyView tend to be more persuasive, and that syntax is predictive of authorship attribution in the Australian High Court Judgment corpus.
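
The pair-and-average idea described in the abstract can be illustrated with a minimal sketch. This is not the FastKASSIM implementation: the parse trees are hand-written nested tuples, the kernel is a simple bag-of-productions dot product standing in for the tree kernel the paper uses, and the symmetric averaging of best matches is an assumption. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def productions(tree):
    """Count grammar productions in a nested-tuple parse tree.

    A tree is (label, child, child, ...). Preterminals such as
    ("DT", "the") have string children and are skipped, so only
    syntactic structure is counted, not words.
    """
    label, *children = tree
    counts = Counter()
    if children and all(isinstance(c, tuple) for c in children):
        counts[(label, tuple(c[0] for c in children))] += 1
        for child in children:
            counts += productions(child)
    return counts


def kernel(a, b):
    """Stand-in kernel (assumption): dot product of production counts."""
    pa, pb = productions(a), productions(b)
    return sum(pa[p] * pb[p] for p in pa if p in pb)


def normalized_kernel(a, b):
    """Scale the kernel to [0, 1] so trees of different sizes compare fairly."""
    denom = sqrt(kernel(a, a) * kernel(b, b))
    return kernel(a, b) / denom if denom else 0.0


def document_similarity(doc_a, doc_b):
    """Pair each tree with its most similar tree in the other document,
    then average the best-match scores over both directions (assumption)."""
    if not doc_a or not doc_b:
        return 0.0
    best_a = [max(normalized_kernel(t, u) for u in doc_b) for t in doc_a]
    best_b = [max(normalized_kernel(t, u) for t in doc_a) for u in doc_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


# Two toy "documents", each a list of constituency parse trees.
t1 = ("S", ("NP", ("DT", "the"), ("NN", "dog")), ("VP", ("VBD", "barked")))
t2 = ("S", ("VP", ("VB", "run")))
print(document_similarity([t1], [t1, t2]))
```

In practice the trees would come from a constituency parser, and the kernel would be replaced by whichever tree kernel the metric specifies; the all-pairs loop here is quadratic in the number of sentences and is meant only to show the pairing step.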
